Boost your ML workflow with Tecton's powerful declarative framework! 🚀 This new blog post reveals how to streamline feature engineering and bridge the data science-engineering gap. 🌉 Key reasons to dive in: 🤝 Learn about unified feature definitions for seamless collaboration ⚙ Automate feature pipelines from dev to production effortlessly 📊 Tackle training/serving skew for robust model performance 💰 Cut infrastructure costs while maximizing efficiency Ready to level up your ML game? This practical guide from Sergio F. is packed with insights to transform feature engineering. #MachineLearning #FeatureEngineering #MLOps #DataScience https://lnkd.in/g5rzf-au
Tecton’s Post
📊 At the Data Engineering Summit 2024, Sachin Tripathi from the Bureau highlighted innovative tools that are revolutionizing ML pipelines and analytical workloads. By leveraging Metaflow and Superset, data engineers can streamline processes, enhance productivity, and ensure accuracy. Sachin spoke about how Metaflow simplifies ML development with flexible scaling, seamless AWS integration, and easy deployment, making it ideal for complex tasks. Superset, in turn, empowers users with powerful SQL querying and robust data visualization, enhancing real-time analytics. To learn more about the session, read the article below 👇 https://lnkd.in/gy-5bJaP #DES2024 #DataEngineering #MachineLearning #Analytics #Innovation
An Overview of ML Pipelines and Analytical Workloads
https://business.machinehack.com
Visionary technologist and lateral thinker driving market value in regulated, complex ecosystems. Open to leadership roles.
Application scenarios to consider in your AI and machine learning strategies.
Follow me to Learn about GenAI and Data Engineering Systems | Author of SwirlAI Newsletter | Public Speaker
What are the four 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗧𝘆𝗽𝗲𝘀? Even if you don't work with them day to day, these are the four ways to deploy an ML Model you should know and understand as an MLOps/ML Engineer.

➡️ 𝗕𝗮𝘁𝗰𝗵:
👉 You apply your trained models as part of an ETL/ELT process on a given schedule.
👉 You load the required Features from batch storage, apply inference, and save the results back to batch storage.
👉 It is sometimes mistakenly thought that this method can't serve Real Time Predictions.
👉 Inference results can be loaded into real-time storage and used by real-time applications.

➡️ 𝗘𝗺𝗯𝗲𝗱𝗱𝗲𝗱 𝗶𝗻 𝗮 𝗦𝘁𝗿𝗲𝗮𝗺 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻:
👉 You apply your trained models as part of a Stream Processing Pipeline.
👉 As data is continuously piped through your Streaming Data Pipelines, an application with a loaded model continuously applies inference and returns the results to the system - most likely another Streaming Storage.
👉 This deployment type is likely to involve a real-time Feature Store Serving API to retrieve additional Static Features for inference.
👉 Predictions can be consumed by multiple applications subscribing to the Inference Stream.

➡️ 𝗥𝗲𝗮𝗹 𝗧𝗶𝗺𝗲:
👉 You expose your model as a Backend Service (REST or gRPC).
👉 This ML Service retrieves the Features needed for inference from a Real Time Feature Store Serving API.
👉 Any application can request inference in real time, as long as it can form a request that conforms to the API Contract.

➡️ 𝗘𝗱𝗴𝗲:
👉 You embed your trained model directly into the application running on a user's device.
👉 This method provides the lowest latency and improves privacy.
👉 In most cases the data is generated and stays on the device, significantly improving security.

What types of deployments do you mostly work on? Let me know in the comments! 👇

--------

Follow me to upskill in #MLOps, #MachineLearning, #DataEngineering, #DataScience and the overall #Data space. 𝗗𝗼𝗻’𝘁 𝗳𝗼𝗿𝗴𝗲𝘁 𝘁𝗼 𝗹𝗶𝗸𝗲 👍, 𝘀𝗵𝗮𝗿𝗲 𝗮𝗻𝗱 𝗰𝗼𝗺𝗺𝗲𝗻𝘁!
Join a growing community of Data Professionals by subscribing to my 𝗡𝗲𝘄𝘀𝗹𝗲𝘁𝘁𝗲𝗿/𝗕𝗹𝗼𝗴.
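The batch pattern above is easy to picture in code: load features from batch storage, score them on a schedule, and write the results somewhere downstream (possibly a real-time store). A minimal stdlib-only sketch of that idea, with the model stubbed out as a simple weighted sum (the feature names and weights are illustrative, not from any real system):

```python
def predict(features: dict) -> float:
    """Stand-in for a trained model: a weighted sum of two features."""
    return 0.7 * features["recency"] + 0.3 * features["frequency"]

def run_batch_inference(batch_rows: list[dict]) -> dict[str, float]:
    """Score every row loaded from batch storage, keyed by entity id.

    In a real pipeline this function would be invoked by the scheduler
    (e.g. as an ETL/ELT step), and the returned scores would be written
    back to batch storage and/or pushed into a real-time key-value store.
    """
    return {row["user_id"]: predict(row["features"]) for row in batch_rows}

if __name__ == "__main__":
    rows = [
        {"user_id": "u1", "features": {"recency": 0.9, "frequency": 0.2}},
        {"user_id": "u2", "features": {"recency": 0.1, "frequency": 0.8}},
    ]
    print(run_batch_inference(rows))
```

Loading those precomputed scores into a low-latency store is exactly why batch deployment can still power real-time applications, as the post notes.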
Just finished reading the 2024 State of Data Engineering Report, and it's a must-read for anyone in the data field! The report is a goldmine of insights, with impressively deep analysis and clearly drawn trends. Highlights include the impact of Generative AI on data engineering, touching storage, computation engines, MLOps, and observability tools. The report also discusses the challenges the economic downturn poses to tech companies' growth, with a few leading the GenAI revolution. The competition between Databricks and Snowflake over open table formats and catalogs is also notable, with drama around Apache Iceberg community decisions. Finally, the report highlights the emergence of new players like XetHub and Oxen, emphasizing the importance of data version control as foundational infrastructure. #DataEngineering #TechTrends
The State of Data Engineering 2024
https://lakefs.io
Chief Technology Officer / Chief Data Officer / Fractional CTO | AI/ML, Data, IoT | "Data, Data, Data! I can't make bricks without clay!"
Great summary of AI/ML deployment types. #machinelearning #mlops
Earlier this week I advocated for monolithic data platforms, but they come with a big challenge: they put many teams and use cases in close contact. Close contact makes it easy for these teams to trample each other. Trampling takes many forms:

- **Starvation**: one data pipeline uses up resources needed by another. E.g. a low-priority experimental pipeline delays a business-critical report by occupying all available slots on the cluster.
- **Monolithic dependencies**: library version choices in one pipeline restrict options elsewhere. E.g. an ancient pipeline depends on an old version of Pandas, so nobody else can use more recent Pandas versions and the new features that come with them.
- **Coupled failures**: an error in one pipeline takes other pipelines down with it. E.g. a data engineer accidentally pushes a new pipeline with an invalid cron schedule, and every other pipeline on the platform fails to load.
- **Clutter**: another team's assets or pipelines make it hard to find what you're looking for. E.g. a massive dbt project is onboarded onto the platform, and whenever you search the catalog the results you want are buried under scores of "int_" tables.

The takeaway for data infra tools is that good fences make good neighbors: they require designing for multi-tenancy and isolation all the way through. In Dagster this is extra challenging, because we're in the business of running user code, which is generally what does the trampling. And we don't just run pipelines; we also run user-defined, event-based logic that decides when those pipelines should run. To credibly support monolithic data platforms while dealing with this, Dagster has needed to develop deep isolation abstractions, like code locations, which isolate both dependencies and the evaluation of scheduling logic.
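The "coupled failures" point is essentially an argument for process-level isolation: if each pipeline's user code runs in its own process, a crash in one is contained rather than taking the platform down. A generic stdlib-only sketch of that idea (not Dagster's actual code-location implementation):

```python
import subprocess
import sys

def run_isolated(pipeline_src: str) -> bool:
    """Run one pipeline's code in a separate interpreter process.

    A crash in the child is contained: the parent sees a non-zero
    exit code instead of going down with it.
    """
    result = subprocess.run(
        [sys.executable, "-c", pipeline_src],
        capture_output=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    # One pipeline is broken (it raises), the other is fine. Both are
    # attempted, and the platform loop itself never crashes.
    pipelines = {
        "reporting": "print('report built')",
        "broken_experiment": "raise ValueError('bad cron schedule')",
    }
    statuses = {name: run_isolated(src) for name, src in pipelines.items()}
    print(statuses)
```

The same separation also addresses the "monolithic dependencies" form of trampling: each child process can, in principle, run against its own virtualenv with its own Pandas version.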
CEO @ Epsilla | Y Combinator | ex-TigerGraph Senior Director | ex-Meta | Cornell Alum | Building the data and knowledge foundation for AI
Curious how Epsilla (YC S23)'s architecture empowers our customers to load hundreds of thousands of files when building their AI assistants? Dive into our latest blog on large-scale ETL for unstructured data in RAG systems! Whether you're dealing with hundreds of thousands of files, or large, text-intensive files exceeding 10,000 pages and hundreds of MBs in size, Epsilla's RAG-as-a-Service platform is designed to tackle these challenges head-on. Read more about how industries like legal, construction, and education are leveraging our technology for transformative results. Don't miss the opportunity to transform your data and AI strategy. Sign up and try these capabilities for FREE today at https://cloud.epsilla.com. #RAG #ETL #EpsillaCloud #BigData
Large Scale Smart ETL for Unstructured Data in RAG Systems with Epsilla
blog.epsilla.com
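The core step in any ETL pipeline for unstructured RAG data is splitting large, text-heavy documents into indexable pieces. This is a generic illustration of that step (not Epsilla's actual pipeline): chunking a long text into fixed-size, overlapping windows so no sentence is stranded at a chunk boundary.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a long document into overlapping chunks for embedding/indexing.

    `overlap` characters are repeated at the start of each subsequent
    chunk so context spanning a boundary appears in both chunks.
    """
    assert 0 <= overlap < chunk_size, "overlap must be smaller than chunk_size"
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

if __name__ == "__main__":
    doc = "some very long extracted document text " * 200
    pieces = chunk_text(doc)
    print(len(pieces), "chunks, max length", max(len(p) for p in pieces))
```

At the scales the post describes (10,000+ pages per file), this step is typically parallelized across workers, with extraction (PDF/OCR) happening upstream of the chunker.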
Interested in some practical advice on what's coming to data and analytics in 2024? Check out Jake Watson's latest post on his Data Platform Blog. It cuts through the hype and points you to core thinking on what's important.
A dumb twist on a popular (and overdone?) blog format of predicting what will happen in the next 360-odd days: I present the Not So Bold Data Predictions of 2024! Plus, we have another great selection of data articles:

📈 How Analytics Can Make a Massive Impact on the Bottom Line by Ergest Xheblati
🗻 Understanding Parquet, Iceberg and Data Lakehouses by David Gomes
🗺 Four pitfalls of spatiotemporal data analysis and how to avoid them
💡 Data Explained: Idempotence by Matt Palmer
❓ Is it Time for Composable Data Systems? by Jordan Volz
📂 DevOps in Snowflake: How Git and Database Change Management enable a file-based object lifecycle by Vincent Raudszus

Have you spotted any other trends in data that I missed? Any new predictions? Feel free to comment below! Want to receive future newsletter updates? Follow, connect or subscribe! #dataengineering #dataanalysis #dataplatforms
Issue #32: Reviewing 2023 and Not-So-Bold 2024 Predictions
thedataplatform.substack.com