data-ingestion

Here are 157 public repositories matching this topic...

apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

streaming real-time offline high-performance apache batch data-integration elt cdc change-data-capture data-ingestion

Updated Dec 25, 2024
Java

bruin-data / ingestr

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

bigquery postgresql snowflake mssql data-integration data-pipeline data-ingestion copy-database ingestion-pipeline duckdb

Updated Dec 24, 2024
Python

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated Dec 25, 2024
Java

dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

elixir broadway concurrent data-processing genstage data-ingestion

Updated Dec 22, 2024
Elixir

pravega / pravega

Pravega - Streaming as a new software-defined storage primitive

streaming distributed-storage real-time-data streaming-data data-ingestion

Updated Sep 3, 2024
Java

bruin-data / bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

python bigquery sql analytics data-transformation snowflake data-platform data-analysis data-pipelines data-modeling data-ingestion

Updated Dec 24, 2024
Go

CrunchyData / pg_parquet

Copy to/from Parquet in S3 from within PostgreSQL

postgresql parquet data-migration data-ingestion columnar

Updated Dec 24, 2024
Rust

orbitalapi / orbital

Orbital automates integration between data sources (APIs, databases, queues, and functions). It supports BFFs, API composition, and ETL pipelines that adapt as your specs change.

kotlin java microservices typescript integration etl api-management api-gateway rest-api data-engineering data-ingestion bff api-integration data-piplines bff-api taxiql semantic-integration

Updated Dec 5, 2024
TypeScript

cuebook / cuelake

Use SQL to build ELT pipelines on a data lakehouse.

sql apache-spark etl pipelines data-engineering data-lake data-transfer delta data-integration upsert elt data-pipeline datalake data-ingestion spark-sql zeppelin-notebook apache-iceberg lakehouse incremental-updates

Updated May 25, 2022
JavaScript

merantix-momentum / squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

Updated Dec 23, 2024
Python

thedataengineeringbook / thedataengineeringbook

The Data Engineering Book - a data engineering book by Thai people, for Thai people

data book data-engineering data-integration hacktoberfest data-pipeline data-engineer data-ingestion data-infrastructure

Updated Oct 28, 2023
JavaScript

apache / paimon-rust

Apache Paimon Rust: the Rust implementation of Apache Paimon.

rust big-data real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated Oct 1, 2024
Rust

jgperrin / net.jgp.labs.spark

Apache Spark examples exclusively in Java

java spark ingestion udf dataframe data-ingestion

Updated Apr 21, 2023
Java

XavientInformationSystems / Data-Ingestion-Platform

spark storm apex flink batch-processing data-ingestion dip samza

Updated Feb 11, 2020
Java

merantix-momentum / squirrel-datasets-core

Squirrel dataset hub

python data-science machine-learning natural-language-processing ai computer-vision deep-learning tensorflow cv ml collaboration pytorch distributed dataops cloud-computing datasets npl data-ingestion data-mesh

Updated Sep 7, 2023
Python

Dynatrace / openkit-java

OpenKit Java Reference Implementation

sdk dynatrace data-ingestion dev-program

Updated Aug 2, 2024
Java

aws-samples / amazon-kinesis-data-processor-aws-fargate

Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate"

containers amazon-kinesis data-ingestion data-processor amazon-kcl kinesis-data-streams scalable-data-stream

Updated Jul 15, 2024
Python

Dynatrace / OneAgent-SDK-for-Java

Enables custom tracing of Java applications in Dynatrace

agent sdk apm dynatrace data-ingestion sdk-java oneagent dev-program

Updated Sep 3, 2024
Java

fremantle-industries / history

Download and warehouse historical trading data

data-science elixir trading data-visualization data-warehouse cryptocurrency trading-algorithms data-ingestion

Updated Mar 17, 2023
Elixir

tuannd89 / elasticsearch-full-text-search

search-engine elasticsearch lucene data-ingestion

Updated Sep 7, 2020
