Run Stream Enrich 0.10
🚧 The latest Stream Enrich documentation can be found on the Snowplow documentation site.
This documentation is for version 0.5.0 - 0.10.0 of Stream Enrich.
Stream Enrich is an executable jarfile which should be runnable from any Unix-like shell environment. Provide the configuration file and the Iglu resolver file as parameters:
$ ./snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json
This will start the Stream Enrich app to read raw events from Kinesis and write enriched events back to Kinesis.
If you are using configurable enrichments, provide the path to your enrichments directory as a parameter:
$ ./snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json --enrichments file:path/to/enrichments
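Since the command takes several paths, it can help to check them before launching. The sketch below is a hypothetical `preflight` function, not part of Stream Enrich; it assumes each enrichment is configured in its own `.json` file inside the enrichments directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a "file:"-based Stream Enrich launch.
# Assumption: every enrichment lives in its own .json file in the directory.
preflight() {
  config="$1" resolver="$2" enrichments_dir="$3"
  # The configuration file and resolver JSON must both be readable.
  for f in "$config" "$resolver"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable file: $f" >&2
      return 1
    fi
  done
  if [ ! -d "$enrichments_dir" ]; then
    echo "enrichments directory not found: $enrichments_dir" >&2
    return 1
  fi
  # Expand the glob into the positional parameters; if nothing matches,
  # the pattern stays literal and the -e test below fails.
  set -- "$enrichments_dir"/*.json
  if [ ! -e "$1" ]; then
    echo "no .json enrichment configs in $enrichments_dir" >&2
    return 1
  fi
  echo "found $# enrichment config(s)"
}

# Example:
# preflight my.conf resolver.json path/to/enrichments &&
#   ./snowplow-stream-enrich-0.x.0 --config my.conf \
#     --resolver file:resolver.json --enrichments file:path/to/enrichments
```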
If you are storing the resolver and/or enrichments in DynamoDB, use the "dynamodb:" prefix in place of the "file:" prefix:
$ ./snowplow-stream-enrich-0.x.0 --config my.conf --resolver dynamodb:eu-west-1/ConfigurationTable/resolver --enrichments dynamodb:eu-west-1/ConfigurationTable/enrichment_
The above command assumes that the enrichments and resolver are stored in a table named ConfigurationTable in the eu-west-1 region, that the hash key for that table is "id", that the resolver JSON is stored in an item whose hash key has the value "resolver", and that the enrichments are stored in items whose hash keys have values beginning with "enrichment_".
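If you switch between local files and DynamoDB (for example, files during development and DynamoDB in production), a small helper can assemble the arguments in the two URI forms shown above. This is a minimal sketch: the function name `enrich_args` is hypothetical, it hard-codes the "resolver" key and "enrichment_" prefix from the example, and because its output is meant to be word-split, it assumes no spaces in paths or table names.

```shell
#!/bin/sh
# Hypothetical helper: print the --resolver/--enrichments arguments for
# either local files or a DynamoDB table, using the URI forms shown above.
enrich_args() {
  backend="$1"
  case "$backend" in
    file)
      # $2 = resolver JSON path, $3 = enrichments directory
      printf '%s\n' "--resolver file:$2 --enrichments file:$3"
      ;;
    dynamodb)
      # $2 = AWS region, $3 = table name; item keys follow the example above
      printf '%s\n' "--resolver dynamodb:$2/$3/resolver --enrichments dynamodb:$2/$3/enrichment_"
      ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1
      ;;
  esac
}

# Example (unquoted so the output splits into separate arguments):
# ./snowplow-stream-enrich-0.x.0 --config my.conf $(enrich_args dynamodb eu-west-1 ConfigurationTable)
```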
When developing the Scala collector and Kinesis enrichment components, we realized that there were strong parallels between the Kinesis stream processing paradigm and conventional Unix stdio I/O streams. As a result, we added the ability for:
- Scala Stream Collector to write Snowplow raw events to stdout instead of a Kinesis stream
- Stream Enrich to read Snowplow raw events from stdin, and write enriched events to stdout
This has a nice side-effect: it is possible to run Snowplow in a "local mode", where you simply pipe the output of Scala Stream Collector directly into Stream Enrich and then see the generated enriched events printed to your console. You can run Snowplow in local mode with a shell script like this:
#!/bin/sh
echo "Piping local collector into local enrichment..."
./snowplow-stream-collector-0.1.0 --config ./collector.conf | ./snowplow-kinesis-enrich-0.1.0 --config ./enrich.conf
Make sure to set the sources and sinks in your configuration files (Scala configuration template, Kinesis enrich template) to the relevant stdin/stdout settings.
Snowplow "local mode" could be helpful for debugging Snowplow tracker implementations before putting tags live.
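Because Stream Enrich can read from stdin, you are not limited to a live pipe. A minimal sketch, assuming the collector is configured with its stdout sink and Stream Enrich with the stdin source and a stdout sink: capture the collector's output to a file once, then replay it into Stream Enrich as often as needed while you adjust your enrichment configuration. The binary names, file names and the `COLLECTOR`/`ENRICH` variables are placeholders. The collector keeps running until you stop it (for example with Ctrl-C), and whether Stream Enrich exits at end of input may depend on the version.

```shell
#!/bin/sh
# Placeholder binary paths; override via the environment if yours differ.
COLLECTOR="${COLLECTOR:-./snowplow-stream-collector-0.1.0}"
ENRICH="${ENRICH:-./snowplow-stream-enrich-0.x.0}"

# Save whatever the collector's stdout sink emits to a file ($1).
capture_raw() {
  "$COLLECTOR" --config ./collector.conf > "$1"
}

# Feed a captured file ($1) to Stream Enrich's stdin source and keep the
# enriched events written to stdout in a second file ($2).
replay_raw() {
  "$ENRICH" --config ./enrich.conf --resolver file:resolver.json < "$1" > "$2"
}

# Example:
# capture_raw raw-events.txt        # send test events, then press Ctrl-C
# replay_raw raw-events.txt enriched-events.tsv
```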
Stream Enrich uses slf4j logging. If you run the executable jarfile using the java -jar command, you can set the log level as a system property:
$ java -Dorg.slf4j.simpleLogger.defaultLogLevel=debug \
-jar snowplow-stream-enrich-0.x.0 --config my.conf --resolver file:resolver.json
This will also affect messages logged by the Kinesis Client Library (which Stream Enrich uses to read from Kinesis).
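If you change the log level often, a small wrapper saves retyping the system property. This is a hypothetical helper, not something shipped with Stream Enrich; `ENRICH_JAR` is a placeholder for your jarfile, and the accepted levels are the standard SLF4J simple-logger names trace, debug, info, warn and error.

```shell
#!/bin/sh
# Hypothetical wrapper: run Stream Enrich via `java -jar` at a chosen
# SLF4J simple-logger level; remaining arguments go to Stream Enrich.
run_enrich_with_log_level() {
  if [ $# -lt 1 ]; then
    echo "usage: run_enrich_with_log_level LEVEL [stream-enrich args...]" >&2
    return 1
  fi
  level="$1"
  shift
  case "$level" in
    trace|debug|info|warn|error) ;;
    *)
      echo "unsupported log level: $level" >&2
      return 1
      ;;
  esac
  # JVM options precede -jar, following `java [options] -jar jarfile [args]`.
  java "-Dorg.slf4j.simpleLogger.defaultLogLevel=$level" \
    -jar "${ENRICH_JAR:-snowplow-stream-enrich-0.x.0}" "$@"
}

# Example:
# run_enrich_with_log_level debug --config my.conf --resolver file:resolver.json
```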
You have set up Stream Enrich! You are now ready to set up alternative data stores.
Return to the setup guide.
Copyright © 2012-2021 Snowplow Analytics Ltd.