My First Data Medium Article along with Kaggle Dataset and a complete End-to-End Data Pipeline Capstone project!!
It all began with the 6-week live DataExpert.io bootcamp by Zach Wilson this June. He needs little introduction: he is one of the most experienced and talented data experts in the industry.
During the bootcamp assignments and lecture discussions in the Discord channel, I happened to meet my amazing project partner, Meeta Pandit. Full of ideas, we structured our capstone and combined our expertise and learnings to get the best out of both of us. Despite being in slightly different time zones, we pulled it together beautifully.
For our capstone project (https://lnkd.in/dfgRHJaB), we chose Citibike data in both real-time and historical formats.
Part 1 - Medium Article - https://lnkd.in/dj4cevHi :
Covered real-time data ingestion using Kafka clusters, followed by data transformations in Databricks using Delta tables, with PySpark for data processing. Data validation is often the most overlooked part of a project, which is why we gave it the highest priority, building in failure recovery, idempotency and data contracts.
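To make the idea of data contracts and idempotency concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the field names in `REQUIRED_FIELDS`, the `ride_id` key and both helper functions are hypothetical, and an in-memory dict stands in for a Delta table. The point is only that records failing the contract are rejected, and that replaying the same batch leaves the target unchanged.

```python
# Hypothetical contract: these field names and types are illustrative only,
# not the schema used in the capstone.
REQUIRED_FIELDS = {
    "ride_id": str,
    "started_at": str,
    "ended_at": str,
    "start_station_id": str,
}


def contract_errors(record: dict) -> list:
    """Return a list of contract violations for one record (empty if valid)."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if record.get(name) is None:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    return errors


def idempotent_upsert(target: dict, batch: list) -> list:
    """Merge valid records into `target`, keyed by ride_id.

    Because each record overwrites the entry with the same key, replaying
    the same batch (for example after a failure) does not create duplicates.
    Returns the records that were rejected by the contract.
    """
    rejected = []
    for record in batch:
        if contract_errors(record):
            rejected.append(record)
            continue
        target[record["ride_id"]] = record
    return rejected
```

Running `idempotent_upsert` twice with the same batch should leave the target identical to a single run, which is the property that makes retries after a failure safe.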
Part 2 - Medium Article - https://lnkd.in/de6tpfkN :
Involved setting up roles and users in Snowflake, with permissions and role-based data governance policies, followed by DBT setup. We also used Streamlit on top of the data in Snowflake for further analysis. With DBT, the project showcased numerous transformations using macros and tests, ensuring a clean and structured flow of data for further analytics. Finally, we orchestrated DBT with Dagster for better dependency visualization and error handling.
But wait, it's not over yet. We also published the clean data on Kaggle (https://lnkd.in/dhzK-tZ3) so that data enthusiasts like yourself can explore it further and create your own mini projects. The dataset mainly contains station details in the Citibike_dim_station table and trip details in fct_citibike_data, which also includes the total cost of each trip based on its duration and the standard price set. This data is extracted exclusively from the DBT model created for the trip cost calculation.
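For anyone who wants to recompute a duration-based trip cost while exploring the dataset, here is a rough Python sketch. It is not the DBT model from the project: the `UNLOCK_FEE` and `PER_MINUTE_RATE` values are hypothetical placeholders rather than real Citibike prices, and rounding partial minutes up is an assumption.

```python
import math
from datetime import datetime

# Hypothetical placeholder prices; replace them with the price set you want to model.
UNLOCK_FEE = 1.00
PER_MINUTE_RATE = 0.25


def trip_cost(started_at: datetime, ended_at: datetime,
              unlock_fee: float = UNLOCK_FEE,
              per_minute_rate: float = PER_MINUTE_RATE) -> float:
    """Cost of one trip: a flat fee plus a per-minute charge.

    Partial minutes are rounded up (an assumption of this sketch).
    """
    if ended_at < started_at:
        raise ValueError("ended_at must not be earlier than started_at")
    minutes = math.ceil((ended_at - started_at).total_seconds() / 60)
    return round(unlock_fee + minutes * per_minute_rate, 2)
```

With these placeholder rates, a trip of 10 minutes 30 seconds is billed as 11 minutes.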
Kudos to our hard work and collective efforts in making the most of this bootcamp. A big thank you to the entire DataExpert.io team: Zach Wilson, Mitali Gupta and JulieAnn Scherer!
#dataengineering #datapipeline #datajobs #dataexpert