Kinesis [ARCHIVED]
Prerequisites
- For Airbyte Open Source users using the Postgres source connector, upgrade your Airbyte platform to version v0.40.0-alpha or newer and upgrade your Kinesis connector to version 0.1.4 or newer.
Sync overview
Output schema
The incoming Airbyte data is structured in JSON format and is sent across different stream shards determined by the partition key. This connector maps incoming data from a namespace and stream to a unique Kinesis stream. The Kinesis record sent to the stream consists of the following JSON fields:
- _airbyte_ab_id: Random UUID generated to be used as a partition key for sending data to different shards.
- _airbyte_emitted_at: a timestamp representing when the event was received from the data source.
- _airbyte_data: a json text/object representing the data that was received from the data source.
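As an illustration of the record layout above, here is a minimal Python sketch that wraps one source row in the three fields and uses the generated UUID as the partition key. The function name `build_kinesis_record` is hypothetical, and two details are assumptions not stated in this document: the timestamp is written as epoch milliseconds and `_airbyte_data` is serialized as JSON text. `PartitionKey` and `Data` are the parameter names of the Kinesis PutRecord API.

```python
import json
import time
import uuid


def build_kinesis_record(data: dict) -> dict:
    """Wrap one source row in the three Airbyte fields (illustrative only)."""
    ab_id = str(uuid.uuid4())  # random UUID, reused as the partition key
    payload = {
        "_airbyte_ab_id": ab_id,
        # Assumption: epoch milliseconds; the docs only say "a timestamp".
        "_airbyte_emitted_at": int(time.time() * 1000),
        # Assumption: the row is stored as JSON text rather than a nested object.
        "_airbyte_data": json.dumps(data),
    }
    return {
        "PartitionKey": ab_id,
        "Data": json.dumps(payload).encode("utf-8"),
    }


record = build_kinesis_record({"id": 1, "name": "alice"})
print(record["PartitionKey"])
print(record["Data"].decode("utf-8"))
```

Because the partition key is a fresh random UUID per record, records spread roughly evenly across shards, at the cost of no ordering guarantee between records of the same source row key.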
Features
Feature | Support | Notes |
---|---|---|
Full Refresh Sync | ❌ | |
Incremental - Append Sync | ✅ | Incoming messages are streamed/appended to a Kinesis stream as they are received. |
Incremental - Append Deduped | ❌ | |
Namespaces | ✅ | Namespaces will be used to determine the Kinesis stream name. |
Performance considerations
Although Kinesis is designed to handle large amounts of real-time data by scaling streams with shards, you should be aware of the following Kinesis Quotas and Limits. The connector buffer size should also be tuned according to your data size and frequency.
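A rough way to reason about shard count is to divide your expected write load by the per-shard write limits. The sketch below is a back-of-the-envelope estimate, not part of the connector. The helper `estimate_shards` is hypothetical, and its defaults (1,000 records per second and 1 MiB per second per shard) are assumptions based on commonly documented Kinesis write limits; verify them against the current quotas before relying on the result.

```python
import math


def estimate_shards(
    records_per_sec: float,
    bytes_per_sec: float,
    max_records_per_shard: int = 1000,       # assumed per-shard write limit
    max_bytes_per_shard: int = 1024 * 1024,  # assumed per-shard write limit
) -> int:
    """Return the smallest shard count that covers both write limits."""
    by_records = math.ceil(records_per_sec / max_records_per_shard)
    by_bytes = math.ceil(bytes_per_sec / max_bytes_per_shard)
    # A stream always needs at least one shard.
    return max(1, by_records, by_bytes)


# Example: 2,500 small records per second is bound by the record limit.
print(estimate_shards(records_per_sec=2500, bytes_per_sec=200_000))
```

Whichever limit is hit first determines the estimate, so many small records and a few large ones can both require more shards.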
Getting started
Requirements
- The connector is compatible with the latest Kinesis service version at the time of this writing.
- Configuration
- Endpoint: AWS Kinesis endpoint to connect to. The default endpoint is used if none is provided.
- Region: AWS Kinesis region to connect to. The default region is used if none is provided.
- shardCount: The number of shards with which the stream should be created. The number of shards affects the throughput of your stream.
- accessKey: Access key credential for authenticating with the service.
- privateKey: Private key credential for authenticating with the service.
- bufferSize: Buffer size used to increase throughput by sending data in a single request.
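To make the options above concrete, the following Python sketch shows a configuration with placeholder values and how a buffer size can group records so that each group is sent in a single request. This is an assumption-laden illustration, not the connector's implementation: the key spellings for endpoint and region, all values, and the helper `batches` are hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Placeholder values only; key names follow the list above (casing assumed).
config = {
    "endpoint": "kinesis.us-east-1.amazonaws.com",
    "region": "us-east-1",
    "shardCount": 2,
    "accessKey": "<your-access-key>",
    "privateKey": "<your-private-key>",
    "bufferSize": 100,
}


def batches(records: Iterable[T], buffer_size: int) -> Iterator[List[T]]:
    """Yield lists of at most buffer_size records, preserving order."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == buffer_size:
            yield batch  # a full buffer would be flushed as one request
            batch = []
    if batch:
        yield batch  # flush whatever remains at the end of the sync


# 250 records with a buffer of 100 produce three requests.
sizes = [len(b) for b in batches(range(250), config["bufferSize"])]
print(sizes)
```

A larger buffer means fewer requests but more records held in memory and a longer delay before data reaches the stream, which is why the buffer size is worth tuning to your data size and frequency.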
Setup guide
Changelog
Version | Date | Pull Request | Subject |
---|---|---|---|
0.1.5 | 2022-09-22 | 16952 | Add required config fields |