We are pleased to announce a new integration that will allow data practitioners to easily include OpenAI API calls as part of their data pipelines. Besides generating AI responses, the integration provides insights that let you optimize your API calls and credit consumption.
Building Generative AI Steps into Your Data Pipelines
There are many potential uses of an OpenAI API call in a data pipeline. Allow me to quote an expert on the matter: ChatGPT:
While we can't yet picture all of the possible use cases of a generative AI step in a data pipeline, here are some scenarios that seem valuable:
- Submit a large document to OpenAI's API and request a summary of the document.
- Submit a customer testimonial and request a standardized classification for sentiment analysis
- Submit foreign language text and request a local translation
Here at Dagster Labs, we've used dagster-openai to summarize the category and generate learning summaries from GitHub issues and discussions. Our pipeline handles complex support requests. It provides a first-stab answer to user questions (speeding up our support team's response times), auto-categorizes the issue, and generates learning summaries on a weekly basis.
Keeping Your Costs Under Control
While generative AI offers a broad spectrum of capabilities, managing costs is essential. Dagster Labs is committed to providing the necessary tools to build your pipeline for optimal cost efficiency and performance. To this end, we introduce the OpenAIResource
alongside the with_usage_metadata
function from our library, ensuring uniform resource utilization across our platform.
Both Dagster Cloud and open-source users can take advantage of these features to monitor and optimize their data pipelines. For Dagster Cloud users, this functionality is seamlessly integrated with Dagster Insights, providing an enhanced experience with additional analytical capabilities. Meanwhile, open-source users can also leverage these tools and log their metadata, which they can then visualize as a metadata plot directly in the UI.
This unified approach ensures all users can effectively control their costs while maximizing the benefits of generative AI.
AI's Long-Term Impact on Data Engineering Roles
- Name
- Fraser Marlow
- Handle
- @frasermarlow
10 Reasons Why No-Code Solutions Almost Always Fail
- Name
- TéJaun RiChard
- Handle
- @tejaun
5 Best Practices AI Engineers Should Learn From Data Engineering
- Name
- TéJaun RiChard
- Handle
- @tejaun