DataExpert.io

Education

San Francisco, California 18,903 followers

Data Engineering education, solutions, and evangelism

About us

EcZachly Inc is a company dedicated to inspiring and educating the next generation of data talent!

Website: https://www.dataexpert.io
Industry: Education
Company size: 2-10 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2023

Updates

  • DataExpert.io reposted this

    Billy Switzer

    Data Engineer and Architect | Musician

    I am beyond honored, humbled, and grateful that my capstone project for the DataExpert.io V4 cohort was chosen as one of the top 5 individual submissions. Thank you so much, Zach Wilson and staff for your consideration, and for a fantastic bootcamp! Special shoutout to Jo Hjersman for the extra TA session you held before the project deadline - I wouldn't have received this honor without your guidance! Congratulations to my fellow honorees Aayushi Beniwal, Erich Silva, Hilary S., Maria Vysochina, Meeta Pandit, and Yavuz Karabiyik! For those interested in checking out my project, it can be found at https://lnkd.in/gSsS4AU5 (subject to revision). The original submission is https://lnkd.in/gwqYUrWR. #dataengineering #stocks #spark #airflow #sql #superset #iceberg

  • DataExpert.io reposted this

    Donny Williams

    Data Solutions Engineer I @ HealthEquity

    I’m happy to share that I’ve obtained a new certification: DataEngineer.io Combined Excellence Certification from DataExpert.io! Thank you Zach Wilson, JulieAnn Scherer, Mitali Gupta, and a whole host of amazing engineers and professionals for creating a collaborative, challenging, and creative bootcamp! The community the team has built around the DataExpert.io platform has vastly improved my data career possibilities. The 6-week course was as exciting as it was exhausting, but I would 100% do it again. It's just that good. I look forward to engaging with the community in the future as my career grows!

  • DataExpert.io reposted this

    Monte Carlo

    27,325 followers

    Data Quality Day is coming up FAST! ⚡ Join us July 30th for Data Quality Day: Scaling Data Trust Across Business Domains. 🔥 You'll hear leading strategies and insights from:

    • Zach Wilson, Founder at DataExpert.io
    • Seth Y., Global Field CTO Snowflake
    • Zachary Lancaster, Data Engineering Manager at Warner Bros. Discovery
    • Bronte Baer, Data Platform & Analytics Manager at Earnest
    • Siva Veera, Data Engineering Manager at Riot Games
    • Barr Moses, CEO & Co-Founder, Monte Carlo

    Register here: https://lnkd.in/eceYmVsT #dataqualityday #dataengineering #dataanalytics

  • DataExpert.io reposted this

    Michael Marquez

    Analytics Engineer at PlayStation 🎮

    I’m happy to share that I’ve obtained a new certification: Data Engineering Boot Camp from DataExpert.io! This boot camp was a rollercoaster, and I felt like giving up plenty of times. Glad I didn't! The technical lessons were great, but I mostly appreciated Zach Wilson's insights on big tech culture and working with others. Special thanks to Bruno Souza de Lima and team DataExpert.io.

  • DataExpert.io reposted this

    Renee S. Liu

    Solutions Engineer at Scale AI

    As any production-grade ML model needs to be in a CI/CD pipeline to perform optimally, I've just upgraded my feature set and hyperparameters! Proud to receive my certification, Data Engineering from DataExpert.io, with the master Zach Wilson and his team! As a busy professional, I invest my scarce time only in something truly significant. https://lnkd.in/gKSU95Kz

  • DataExpert.io reposted this

    Monte Carlo

    27,325 followers

    📢 Calling all data leaders! 📢 Join us on July 30th for the latest webinar in our Data Quality Day – Scaling Data Trust Across Business Domains – a half-day virtual webinar featuring insights from leading data teams as they explore crucial strategies for ensuring data trust in a data-driven world. 🔥 You'll hear leading strategies from:

    • Zach Wilson, Founder at DataExpert.io
    • Seth Y., Global Field CTO Snowflake
    • Zachary Lancaster, Data Engineering Manager at Warner Bros. Discovery
    • Bronte Baer, Data Platform & Analytics Manager at Earnest
    • Siva Veera, Data Engineering Manager at Riot Games
    • Barr Moses, CEO & Co-Founder, Monte Carlo

    You don't want to miss this! Register here: https://lnkd.in/eceYmVsT #dataquality #datatrust #dataengineering #dataobservability #dataqualityday

  • DataExpert.io reposted this

    Meeta Pandit

    Data Engineer | Building optimized Data Pipelines | Cloud Computing

    🚀 I am thrilled to share that I completed a stellar data engineering bootcamp at DataExpert.io with a successful capstone project and published my first ever Medium article that I am beyond excited to share with you all! Here are some highlights of the journey and how I accomplished it:

    ⚖ The Dream Team
    Through the amazing community of fellow data nerds at the bootcamp, I met my fantastic project partner, Aayushi Beniwal. She shares my passion for data and technology and is also a nature lover keen on positively impacting the planet. Therefore, we teamed up to architect an end-to-end data pipeline using NYC bike-share data, perfectly aligned with our objectives. We outlined tasks and timelines and proactively shared updates to address any blockers and stay on top of our plan.

    🎯 Objectives
    • The project aims to provide near real-time updates on NYC Citibike’s station status and bike locations to operational teams.
    • Additionally, the project seeks to analyze historical trends using over 10 years (~2.2 GB) of data on trips taken, focusing on their seasonality, peak times, and stations with the highest bike usage.

    🔗 GitHub repo: https://bit.ly/3XZpvGe

    🛠 Design and Execution

    Part I - Streaming Data Pipeline (https://bit.ly/3W49MmI)
    • Set up Kafka topics to ingest data in real time from 2 NYC bike-share data feeds exposed as JSON APIs. The volume of incoming data was ~289 MB per day.
    • Processed this huge volume of data using Spark Streaming and PySpark in Databricks.
    • Utilized a micro-batch architecture to stream transformed data to the analytics layer, effectively reducing the load on executors and significantly cutting down processing costs.

    Part II - Historical Trend Analysis (https://bit.ly/4ctSdnb)
    • Created a detailed dimensional model and processed ~93 MM (2.2 GB) of raw data into Snowflake using DBT transformations: created models for dim_stations and fact_trips, added macros to calculate the cost of each trip taken, and created snapshot (SCD) tables for tracking changes in prices over time.
    • Orchestrated DBT transformations using Dagster for dependency monitoring and error handling.

    Data quality was the highest priority for both pipelines. We utilized frameworks like chispa (a Python library) for column and DataFrame integration testing in Spark, and dbt_expectations with custom and generic tests for the analytics pipeline. Data integrity was especially critical in the real-time data pipeline. It was handled by storing checkpoints and watermarks for late-arriving events in Spark Streaming for failure recovery, and by a schema registry for strict data contracts in Kafka.

    A huge shoutout to everyone at DataExpert.io who contributed to the success of this bootcamp, and especially Zach Wilson for always inspiring us to bring out the best in us.
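To illustrate the watermark idea mentioned in the post above, here is a minimal pure-Python sketch of how a micro-batch consumer might drop events that arrive later than an allowed lateness. It is not the authors' Spark code and does not use Spark's API. The class name `WatermarkFilter`, the `event_time` field, the sample station events, and the 10-minute lateness are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class WatermarkFilter:
    """Toy model of event-time watermarking across micro-batches (hypothetical helper)."""

    allowed_lateness: timedelta
    max_event_time: Optional[datetime] = None

    @property
    def watermark(self) -> Optional[datetime]:
        # The watermark trails the newest event time seen so far.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process_batch(self, events: list) -> list:
        # Filter this batch with the watermark computed from earlier batches.
        cutoff = self.watermark
        accepted = [e for e in events if cutoff is None or e["event_time"] >= cutoff]
        # Advance the newest event time so the next batch sees an updated watermark.
        for event in events:
            if self.max_event_time is None or event["event_time"] > self.max_event_time:
                self.max_event_time = event["event_time"]
        return accepted


# Example: two micro-batches of made-up station status events.
start = datetime(2024, 7, 1, 12, 0)
wf = WatermarkFilter(allowed_lateness=timedelta(minutes=10))
first = wf.process_batch([
    {"station_id": "A", "event_time": start},
    {"station_id": "B", "event_time": start + timedelta(minutes=30)},
])
second = wf.process_batch([
    {"station_id": "C", "event_time": start + timedelta(minutes=5)},   # older than the watermark
    {"station_id": "D", "event_time": start + timedelta(minutes=25)},  # within the allowed lateness
])
```

In this sketch the first batch is accepted in full because no watermark exists yet. The second batch keeps only station D, since station C's event is older than the watermark set by the first batch. A real streaming engine would also persist this state in a checkpoint so that processing can resume after a failure.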

    End to End Pyspark, Databricks, Kafka, DBT, Snowflake and Dagster ETL Pipeline — Part I

    medium.com

  • DataExpert.io

    18,903 followers

    The capstone projects by DataExpert.io students are coming out and they are truly amazing!

    Aayushi Beniwal

    Data Engineer | University of Waterloo | Masters in Electrical and Computer Engineering | Specialization in Artificial Intelligence and Machine Learning | DBT | SQL | Python | Snowflake | AWS | Azure

    🔔 My first data Medium article, along with a Kaggle dataset and a complete End-to-End Data Pipeline capstone project!! 🔔

    It all began with the 6-week live DataExpert.io bootcamp by Zach Wilson this June. Needless to say who he is: one of the most experienced and talented data experts in the industry. During the bootcamp assignments and lecture discussions in the Discord channel, I happened to meet my amazing project partner, Meeta Pandit. Full of ideas, we structured our capstone and combined our expertise and learnings to get the best out of us. Despite being in slightly different time zones, we pulled it together beautifully.

    For our capstone project (https://lnkd.in/dfgRHJaB), we picked Citibike data in both real-time and historical formats.

    Part 1 - Medium Article - https://lnkd.in/dj4cevHi: Included real-time data ingestion using Kafka clusters, followed by Databricks data transformations using Delta tables and PySpark for data processing. Data validation is the most overlooked part of a project, which is why we made it the highest priority, using failure recovery, idempotency, and data contracts.

    Part 2 - Medium Article - https://lnkd.in/de6tpfkN: Involved setting up roles and users with permissions and role-based data governance policies in Snowflake, followed by DBT setup. We also used Streamlit on the data in Snowflake to analyse it further. Using DBT, the project showcased numerous transformations with macros and tests, ensuring a clean and structured flow of data for further analytics. Finally, we orchestrated DBT using Dagster for better dependency visualization and error handling.

    But wait, it’s not over yet. We also published the clean data on Kaggle (https://lnkd.in/dhzK-tZ3) so that data enthusiasts like yourself can explore it more and create your own mini projects. This data mainly contains station details in the Citibike_dim_station table and fct_citibike_data, which includes not only the trip details but also the total cost of the trip based on the duration and the standard price set. This data is extracted exclusively from the DBT model created for the trip cost calculation.

    Kudos to our hard work and collective efforts in making the most out of this bootcamp. A big thank you to the entire DataExpert.io team: Zach Wilson, Mitali Gupta, JulieAnn Scherer! #dataengineering #datapipeline #datajobs #dataexpert
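As a rough illustration of a trip cost derived from duration and a standard price, here is a small Python sketch. It is a stand-in for the idea only, not the authors' DBT macro. The post does not state the pricing rules, so the unlock fee, the per-minute rate, and the choice to bill each started minute are all hypothetical assumptions, as is the function name `trip_cost`.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical pricing values; the post does not state the real ones.
UNLOCK_FEE = Decimal("1.00")
PER_MINUTE_RATE = Decimal("0.25")


def trip_cost(duration_seconds: int,
              unlock_fee: Decimal = UNLOCK_FEE,
              per_minute_rate: Decimal = PER_MINUTE_RATE) -> Decimal:
    """Return the cost of one trip, billing each started minute (assumed rule)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Round the duration up to whole minutes using integer arithmetic.
    billable_minutes = (duration_seconds + 59) // 60
    cost = unlock_fee + per_minute_rate * billable_minutes
    # Keep two decimal places, as for a currency amount.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Costs for trips of 0 seconds, exactly 10 minutes, and 10 minutes 1 second.
costs = [trip_cost(seconds) for seconds in (0, 600, 601)]
```

Using `Decimal` rather than floats avoids rounding drift when many trip costs are summed, which matters if the result feeds a fact table such as the one described above.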

    End to End Pyspark, Databricks, Kafka, DBT, Snowflake and Dagster ETL Pipeline Part II

    link.medium.com
