data-pipeline

Here are 658 public repositories matching this topic...

airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Updated Jul 28, 2024
Python

kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

workflow data pipeline etl workflow-engine scheduler orchestration data-engineering data-integration elt data-pipeline data-quality low-code data-orchestration data-orchestrator reverse-etl

Updated Jul 27, 2024
Java

snowplow / snowplow

The leader in Next-Generation Customer Data Infrastructure

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated May 31, 2024
Scala

apache / flink-cdc

Flink CDC is a streaming data integration tool

mysql real-time kafka etl postgresql distributed batch data-integration schema-evolution elt flink cdc data-pipeline change-data-capture paimon

Updated Jul 26, 2024
Java

rudderlabs / rudder-server

Privacy- and security-focused Segment alternative, written in Golang and React

Updated Jul 26, 2024
Go

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated Jun 19, 2024

superstreamlabs / memphis

Memphis.dev is a highly scalable and effortless data streaming platform

kubernetes golang data enrichment microservices schema-registry message-bus message-queue data-engineering data-pipeline message-broker data-streaming data-stream-processing messaging-queue

Updated May 27, 2024
Go

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Jul 25, 2024
Jupyter Notebook

bruin-data / ingestr

ingestr is a CLI tool that seamlessly copies data between databases with a single command.

bigquery postgresql snowflake mssql data-integration data-pipeline data-ingestion copy-database ingestion-pipeline duckdb

Updated Jul 25, 2024
Python

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Jul 25, 2024
HTML

pydoit / doit

Task management & automation tool

python workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated Jul 4, 2024
Python

reugn / go-streams

A lightweight stream processing library for Go

Updated Jul 25, 2024
Go

bytedance / bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline