Boost your ML workflow with Tecton's powerful declarative framework! 🚀 This new blog post reveals how to streamline feature engineering and bridge the data science-engineering gap. 🌉 Key reasons to dive in: 🤝 Learn about unified feature definitions for seamless collaboration ⚙ Automate feature pipelines from dev to production effortlessly 📊 Tackle training/serving skew for robust model performance 💰 Cut infrastructure costs while maximizing efficiency Ready to level up your ML game? This practical guide from Sergio F. is packed with insights to transform feature engineering. #MachineLearning #FeatureEngineering #MLOps #DataScience https://lnkd.in/g5rzf-au
Tecton’s Post
📊 At the Data Engineering Summit 2024, Sachin Tripathi from the Bureau highlighted innovative tools that are revolutionizing ML pipelines and analytical workloads. By leveraging Metaflow and Superset, data engineers can streamline processes, enhance productivity, and ensure accuracy. Sachin spoke about how Metaflow simplifies ML development with flexible scaling, seamless AWS integration, and easy deployment, making it ideal for complex tasks. Superset, in turn, empowers users with powerful SQL querying and robust data visualization, enhancing real-time analytics. To learn more about the session, read the article below 👇 https://lnkd.in/gy-5bJaP #DES2024 #DataEngineering #MachineLearning #Analytics #Innovation
An Overview of ML Pipelines and Analytical Workloads
https://business.machinehack.com
Visionary technologist and lateral thinker driving market value in regulated, complex ecosystems. Open to leadership roles.
Application scenarios to consider in your AI and machine learning strategies.
Follow me to Learn about GenAI and Data Engineering Systems | Author of SwirlAI Newsletter | Public Speaker
What are the four 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗧𝘆𝗽𝗲𝘀? Even if you don't work with them day to day, these are the four ways to deploy an ML Model you should know and understand as an MLOps/ML Engineer.

➡️ 𝗕𝗮𝘁𝗰𝗵:
👉 You apply your trained models as part of an ETL/ELT process on a given schedule.
👉 You load the required Features from batch storage, apply inference, and save the results back to batch storage.
👉 It is sometimes mistakenly thought that this method can't serve Real Time Predictions.
👉 Inference results can be loaded into real-time storage and used by real-time applications.

➡️ 𝗘𝗺𝗯𝗲𝗱𝗱𝗲𝗱 𝗶𝗻 𝗮 𝗦𝘁𝗿𝗲𝗮𝗺 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻:
👉 You apply your trained models as part of a Stream Processing Pipeline.
👉 As data is continuously piped through your Streaming Data Pipelines, an application with a loaded model continuously applies inference and returns the results to the system - most likely another Streaming Storage.
👉 This deployment type is likely to involve a real-time Feature Store Serving API to retrieve additional Static Features for inference.
👉 Predictions can be consumed by multiple applications subscribing to the Inference Stream.

➡️ 𝗥𝗲𝗮𝗹 𝗧𝗶𝗺𝗲:
👉 You expose your model as a Backend Service (REST or gRPC).
👉 This ML Service retrieves the Features needed for inference from a Real Time Feature Store Serving API.
👉 Any application can request inference in real time, as long as it can form a request that conforms to the API Contract.

➡️ 𝗘𝗱𝗴𝗲:
👉 You embed your trained model directly into the application running on a user's device.
👉 This method provides the lowest latency and improves privacy.
👉 In most cases the data is generated and stays on the device, significantly improving security.

What types of deployments do you mostly work on? Let me know in the comments! 👇

--------

Follow me to upskill in #MLOps, #MachineLearning, #DataEngineering, #DataScience and the overall #Data space. 𝗗𝗼𝗻’𝘁 𝗳𝗼𝗿𝗴𝗲𝘁 𝘁𝗼 𝗹𝗶𝗸𝗲 👍, 𝘀𝗵𝗮𝗿𝗲 𝗮𝗻𝗱 𝗰𝗼𝗺𝗺𝗲𝗻𝘁!
Join a growing community of Data Professionals by subscribing to my 𝗡𝗲𝘄𝘀𝗹𝗲𝘁𝘁𝗲𝗿/𝗕𝗹𝗼𝗴.
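The batch pattern above is easy to picture in code: load features from batch storage, score them on a schedule, and write the results somewhere downstream (possibly a real-time store). A minimal stdlib-only sketch of that idea, with the model stubbed out as a simple weighted sum (the feature names and weights are illustrative, not from any real system):

```python
def predict(features: dict) -> float:
    """Stand-in for a trained model: a weighted sum of two features."""
    return 0.7 * features["recency"] + 0.3 * features["frequency"]

def run_batch_inference(batch_rows: list[dict]) -> dict[str, float]:
    """Score every row loaded from batch storage, keyed by entity id.

    In a real pipeline this function would be invoked by the scheduler
    (e.g. as an ETL/ELT step), and the returned scores would be written
    back to batch storage and/or pushed into a real-time key-value store.
    """
    return {row["user_id"]: predict(row["features"]) for row in batch_rows}

if __name__ == "__main__":
    rows = [
        {"user_id": "u1", "features": {"recency": 0.9, "frequency": 0.2}},
        {"user_id": "u2", "features": {"recency": 0.1, "frequency": 0.8}},
    ]
    print(run_batch_inference(rows))
```

Loading those precomputed scores into a low-latency store is exactly why batch deployment can still power real-time applications, as the post notes.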
Just finished reading the 2024 State of Data Engineering Report, and it's a must-read for anyone in the data field! The report is a goldmine of insights, with impressively deep analysis and clearly drawn trends. Highlights include the impact of Generative AI on data engineering, touching storage, computation engines, MLOps, and observability tools. The report also discusses the challenges the economic downturn poses to tech companies' growth, with a few leading the GenAI revolution. The competition between Databricks and Snowflake over open table formats and catalogs is also notable, with drama around Apache Iceberg community decisions. Finally, the report highlights the emergence of new players like XetHub and Oxen, emphasizing the importance of data version control as foundational infrastructure. #DataEngineering #TechTrends
The State of Data Engineering 2024
https://lakefs.io
Chief Technology Officer / Chief Data Officer / Fractional CTO | AI/ML, Data, IoT | "Data, Data, Data! I can't make bricks without clay!"
Great summary of AI/ML deployment types. #machinelearning #mlops
Earlier this week I advocated for monolithic data platforms, but they come with a big challenge: they put many teams and use cases in close contact. Close contact makes it easy for these teams to trample each other. Trampling takes many forms:

- **Starvation**: one data pipeline uses up resources needed by another. E.g. a low-priority experimental pipeline delays a business-critical report by occupying all available slots on the cluster.
- **Monolithic dependencies**: library version choices in one pipeline restrict options elsewhere. E.g. an ancient pipeline depends on an old version of Pandas, so nobody else can use more recent Pandas versions and the new features that come with them.
- **Coupled failures**: an error in one pipeline takes other pipelines down with it. E.g. a data engineer accidentally pushes a new pipeline with an invalid cron schedule, and every other pipeline on the platform fails to load.
- **Clutter**: another team's assets or pipelines make it hard to find what you're looking for. E.g. a massive dbt project is onboarded onto the platform, and whenever you search the catalog the results you want are buried under scores of "int_" tables.

The takeaway for data infra tools is that good fences make good neighbors: they require designing for multi-tenancy and isolation all the way through. In Dagster this is extra challenging, because we're in the business of running user code, which is generally what does the trampling. And we don't just run pipelines; we also run user-defined, event-based logic that decides when those pipelines should run. To credibly support monolithic data platforms while dealing with this, Dagster has needed to develop deep isolation abstractions, like code locations, which isolate both dependencies and the evaluation of scheduling logic.
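The "coupled failures" point is essentially an argument for process-level isolation: if each pipeline's user code runs in its own process, a crash in one is contained rather than taking the platform down. A generic stdlib-only sketch of that idea (not Dagster's actual code-location implementation):

```python
import subprocess
import sys

def run_isolated(pipeline_src: str) -> bool:
    """Run one pipeline's code in a separate interpreter process.

    A crash in the child is contained: the parent sees a non-zero
    exit code instead of going down with it.
    """
    result = subprocess.run(
        [sys.executable, "-c", pipeline_src],
        capture_output=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    # One pipeline is broken (it raises), the other is fine. Both are
    # attempted, and the platform loop itself never crashes.
    pipelines = {
        "reporting": "print('report built')",
        "broken_experiment": "raise ValueError('bad cron schedule')",
    }
    statuses = {name: run_isolated(src) for name, src in pipelines.items()}
    print(statuses)
```

The same separation also addresses the "monolithic dependencies" form of trampling: each child process can, in principle, run against its own virtualenv with its own Pandas version.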
CEO @ Epsilla | Y Combinator | ex-TigerGraph Senior Director | ex-Meta | Cornell Alum | Building the data and knowledge foundation for AI
Curious how Epsilla (YC S23)'s architecture empowers our customers to load hundreds of thousands of files when building their AI assistants? Dive into our latest blog on large-scale ETL for unstructured data in RAG systems! Whether you're dealing with hundreds of thousands of files, or large, text-intensive files exceeding 10,000 pages and hundreds of MBs in size, Epsilla's RAG-as-a-Service platform is designed to tackle these challenges head-on. Read more about how industries like legal, construction, and education are leveraging our technology for transformative results. Don't miss the opportunity to transform your data and AI strategy. Sign up and try these capabilities for FREE today at https://cloud.epsilla.com. #RAG #ETL #EpsillaCloud #BigData
Large Scale Smart ETL for Unstructured Data in RAG Systems with Epsilla
blog.epsilla.com
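The core step in any ETL pipeline for unstructured RAG data is splitting large, text-heavy documents into indexable pieces. This is a generic illustration of that step (not Epsilla's actual pipeline): chunking a long text into fixed-size, overlapping windows so no sentence is stranded at a chunk boundary.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a long document into overlapping chunks for embedding/indexing.

    `overlap` characters are repeated at the start of each subsequent
    chunk so context spanning a boundary appears in both chunks.
    """
    assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

if __name__ == "__main__":
    doc = "some very long extracted document text " * 200
    pieces = chunk_text(doc)
    print(len(pieces), "chunks, max length", max(len(p) for p in pieces))
```

At the scales the post describes (10,000+ pages per file), this step is typically parallelized across workers, with extraction (PDF/OCR) happening upstream of the chunker.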
Interested in some practical advice on what's coming to data and analytics in 2024? Check out Jake Watson's latest post on his Data Platform Blog. It cuts through the hype and points you to core thinking on what's important.
A dumb twist on a popular (and overdone?) blog format of predicting what will happen in the next 360-odd days: I present the Not So Bold Data Predictions of 2024! Plus, we have another great selection of data articles:

📈 How Analytics Can Make a Massive Impact on the Bottom Line by Ergest Xheblati
🗻 Understanding Parquet, Iceberg and Data Lakehouses by David Gomes
🗺 Four pitfalls of spatiotemporal data analysis and how to avoid them
💡 Data Explained: Idempotence by Matt Palmer
❓ Is it Time for Composable Data Systems? by Jordan Volz
📂 DevOps in Snowflake: How Git and Database Change Management enable a file-based object lifecycle by Vincent Raudszus

Have you spotted any other trends in data that I missed? Any new predictions? Feel free to comment below! Want to receive future newsletter updates? Follow, connect or subscribe! #dataengineering #dataanalysis #dataplatforms
Issue #32: Reviewing 2023 and Not-So-Bold 2024 Predictions
thedataplatform.substack.com