🚀 I am thrilled to share that I completed a stellar data engineering bootcamp at DataExpert.io with a successful capstone project and published my first-ever Medium article, which I am beyond excited to share with you all!
Here are some highlights of the journey and how I accomplished it:
⚖ The Dream Team: Through the amazing community of fellow data nerds at the bootcamp, I met my fantastic project partner, Aayushi Beniwal. She shares my passion for data and technology and, as a fellow nature lover, cares about making a positive impact on the planet. So we teamed up to architect an end-to-end data pipeline on NYC bike-share data, a project perfectly aligned with our goals. We outlined tasks and timelines and proactively shared updates to clear blockers and stay on top of our plan.
🎯 Objectives
The project aims to provide near real-time updates on NYC Citibike’s station status and bike locations to operational teams.
Additionally, the project analyzes historical trends using over 10 years (~2.2 GB) of trip data, focusing on seasonality, peak times, and the stations with the highest bike usage.
🔗 GitHub repo: https://bit.ly/3XZpvGe
🛠 Design and Execution
Part I - Streaming Data Pipeline (https://bit.ly/3W49MmI)
Set up Kafka topics to ingest data in real time from two NYC bike-share JSON API feeds, with ~289 MB of incoming data per day.
Processed the incoming stream using Spark Streaming and PySpark in Databricks.
Utilized a micro-batch architecture to stream transformed data to the analytics layer, reducing the load on executors and significantly cutting processing costs; a minimal sketch of the flow is below.
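Here is a minimal PySpark sketch of that flow. The topic name, broker address, schema, and table name are illustrative placeholders, not our actual configuration (which ran on Databricks):

```python
# Minimal sketch of the streaming flow: Kafka topic -> parsed JSON -> micro-batch writes.
# Topic name, broker address, schema, and table names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

spark = SparkSession.builder.appName("citibike-station-status").getOrCreate()

# Simplified schema for a station_status-style JSON payload
status_schema = StructType([
    StructField("station_id", StringType()),
    StructField("num_bikes_available", IntegerType()),
    StructField("num_docks_available", IntegerType()),
    StructField("last_reported", LongType()),
])

# Read raw JSON events from the Kafka topic
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "station_status")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), status_schema).alias("data"))
          .select("data.*"))

# Write each micro-batch to the analytics layer as a single batched write
def write_batch(batch_df, batch_id):
    batch_df.write.mode("append").saveAsTable("analytics.station_status")

query = (parsed.writeStream
         .foreachBatch(write_batch)
         .trigger(processingTime="1 minute")
         .option("checkpointLocation", "/tmp/checkpoints/station_status")
         .start())
```

Writing with foreachBatch lets each micro-batch land as one batched write instead of many per-record writes, which is where the savings on executor load and processing cost come from.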
Part II - Historical Trend Analysis (https://bit.ly/4ctSdnb)
Created a detailed dimensional model and processed ~93 MM (2.2 GB) of raw data into Snowflake using dbt transformations: built models such as dim_stations and fact_trips, added macros to calculate the cost of each trip taken, and created snapshot (SCD) tables to track price changes over time.
Orchestrated the dbt transformations using Dagster for dependency monitoring and error handling (sketch below).
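For a feel of the orchestration, here is a minimal sketch using the dagster-dbt integration; the project directory and asset names are hypothetical, and our actual setup may differ:

```python
# Minimal dagster-dbt sketch; paths and names are illustrative, not the
# project's actual configuration.
from dagster import AssetExecutionContext, Definitions
from dagster_dbt import DbtCliResource, dbt_assets

DBT_PROJECT_DIR = "citibike_dbt"  # hypothetical dbt project directory

@dbt_assets(manifest=f"{DBT_PROJECT_DIR}/target/manifest.json")
def citibike_dbt_models(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run `dbt build` (models, tests, snapshots) and stream events back to
    # Dagster so each model and test surfaces with its own status and lineage.
    yield from dbt.cli(["build"], context=context).stream()

defs = Definitions(
    assets=[citibike_dbt_models],
    resources={"dbt": DbtCliResource(project_dir=DBT_PROJECT_DIR)},
)
```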
Data quality was the highest priority for both pipelines: we used chispa (a Python library) for column- and DataFrame-level integration testing in Spark, and dbt_expectations with custom and generic tests for the analytics pipeline (example below).
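As an example of the kind of check chispa enables, here is a hypothetical pytest-style test of a simple trip-duration transformation; the function and column names are made up for illustration and are not taken from the project:

```python
# Hypothetical chispa test; the transformation and column names are illustrative.
from chispa import assert_df_equality
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-tests").getOrCreate()

def add_trip_minutes(df):
    # Example transformation under test: derive trip duration in minutes
    return df.withColumn("trip_minutes", (df.ended_at - df.started_at) / 60)

def test_add_trip_minutes():
    source = spark.createDataFrame(
        [(0, 600), (100, 400)], ["started_at", "ended_at"]
    )
    expected = spark.createDataFrame(
        [(0, 600, 10.0), (100, 400, 5.0)],
        ["started_at", "ended_at", "trip_minutes"],
    )
    # Fails with a readable column-by-column diff if the DataFrames differ
    assert_df_equality(add_trip_minutes(source), expected)
```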
Data integrity was especially critical in the real-time pipeline; we handled it with Spark Streaming checkpoints for failure recovery, watermarks for late-arriving events, and a schema registry for strict data contracts in Kafka (see the sketch below).
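To illustrate the late-event handling, here is a sketch continuing from the parsed stream in the Part I example above; the watermark threshold, window size, and field names are illustrative, and the schema-registry piece is not shown:

```python
# Illustrative watermarking sketch (not the project's exact logic): bound how
# late events may arrive and drop duplicate station updates, so aggregations
# stay correct when the stream is replayed from a checkpoint.
from pyspark.sql.functions import col, window

deduped = (parsed  # `parsed` comes from the Part I sketch above
           .withColumn("event_time", col("last_reported").cast("timestamp"))
           .withWatermark("event_time", "10 minutes")
           .dropDuplicates(["station_id", "event_time"]))

# Example aggregation that tolerates late data up to the watermark
availability = (deduped
                .groupBy(window(col("event_time"), "5 minutes"), col("station_id"))
                .agg({"num_bikes_available": "avg"}))
```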
A huge shout-out to everyone at DataExpert.io who contributed to the success of this bootcamp, and especially to Zach Wilson for always inspiring us to bring out the best in ourselves.