data-pipeline

Here are 711 public repositories matching this topic...

apache / shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.

mysql sql database bigdata postgresql shard distributed-database encrypt data-pipeline data-encryption database-cluster distributed-transaction read-write-splitting database-middleware distributed-sql-database database-gateway

Updated Nov 1, 2024
Java

airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Updated Nov 1, 2024
Python

snowplow / snowplow

The leader in Next-Generation Customer Data Infrastructure

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated Sep 2, 2024
Scala

apache / flink-cdc

Flink CDC is a streaming data integration tool

mysql real-time kafka etl postgresql distributed batch data-integration schema-evolution elt flink cdc data-pipeline change-data-capture paimon

Updated Oct 29, 2024
Java

rudderlabs / rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

Updated Oct 31, 2024
Go

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated Jun 19, 2024

superstreamlabs / memphis

Memphis.dev is a highly scalable and effortless data streaming platform

kubernetes golang data enrichment microservices schema-registry message-bus message-queue data-engineering data-pipeline message-broker data-streaming data-stream-processing messaging-queue

Updated May 27, 2024
Go

whylabs / whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

python data-science machine-learning analytics logging constraints dataset dataops data-pipeline data-quality calculate-statistics data-constraints mlops model-performance ml-pipelines ai-pipelines approximate-statistics statistical-properties

Updated Nov 1, 2024
Jupyter Notebook

bruin-data / ingestr

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

bigquery postgresql snowflake mssql data-integration data-pipeline data-ingestion copy-database ingestion-pipeline duckdb

Updated Oct 31, 2024
Python

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Oct 31, 2024
HTML

reugn / go-streams

A lightweight stream processing library for Go

Updated Oct 18, 2024
Go

pydoit / doit

CLI task management & automation tool

python cli workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated Jul 4, 2024
Python

bytedance / bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline

Updated Jan 1, 2024
Java

Multiwoven / multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation

Updated Oct 30, 2024
Ruby

GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

data-science machine-learning data-visualization data-engineering cloud-computing data-analysis data-processing data-pipeline

Updated May 1, 2024
Jupyter Notebook

damklis / DataEngineeringProject

Example end to end data engineering project.

python redis elasticsearch airflow kafka big-data mongodb scraping django-rest-framework s3 data-engineering minio kafka-connect hacktoberfest data-pipeline debezium

Updated Dec 8, 2022
Python

spotify / klio

Smarter data pipelines for audio.

signal-processing data-pipeline audio-processing media-processing

Updated Jan 10, 2024
Python

AgnostiqHQ / covalent

Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.

Updated Oct 14, 2024
Python

superlinked / superlinked

A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured and unstructured) data, with ultra-modal vector embeddings.

python nlp natural-language-processing information-retrieval deep-learning etl retrieval ml embeddings vectorization semantic-search data-pipeline mlops vector-search vector-database llm retrieval-augmented-generation

Updated Oct 31, 2024
Jupyter Notebook

infoslack / awesome-kafka

A list about Apache Kafka

infrastructure kafka apache-spark stream-processing apache-kafka kafka-streams data-processing data-pipeline streaming-data

Updated Feb 9, 2024
