Batch pipeline steps r86

This documentation is outdated!

🚧 The latest setup guidance for Snowplow can be found on the Snowplow documentation site.

This page refers to Snowplow R86 and earlier

Click here for the corresponding documentation for other releases

Dataflow diagram

Recovery steps

The table below summarizes the recovery actions to take when a particular step from the dataflow diagram above fails.

Failed step	Recovery actions
1	If no files have been moved yet (`raw:processing` [A] is empty), rerun the EmrEtlRunner as usual. If some files have already been moved, rerun the EmrEtlRunner with the `--skip staging` option to continue processing those log files.
2	Rerun the EmrEtlRunner with the `--skip staging` option.
3	Rerun the EmrEtlRunner with the `--skip staging` option. Note: `enriched:bad` [D] and `enriched:error` [E] may already contain files produced by the failed run of step 3, so rerunning the EmrEtlRunner can result in duplicated `bad`/`error` files. This can be significant if the `elasticsearch` steps (8-9) are used to examine `bad` data [D]: the same data would appear with different timestamps assigned by different EMR runs.
4	Delete the `enriched:good` files [F] and rerun the EmrEtlRunner with the `--skip staging` option.
5	Delete the `enriched:good` files [F] and rerun the EmrEtlRunner with the `--skip staging` option.
6	Delete the `enriched:good` files [F] and rerun the EmrEtlRunner with the `--skip staging` option. Note: `enriched:bad` [D] and `shredded:bad` [H] may already contain files produced by steps 3 and 6 respectively, so rerunning the EmrEtlRunner can result in duplicated `bad` files. This can be significant if the `elasticsearch` steps (8-9) are used to examine `bad` data ([D], [H]): the same data would appear with different timestamps assigned by different EMR runs.
7	Delete the `enriched:good` [F] and `shredded:good` [K] files, then rerun the EmrEtlRunner with the `--skip staging` option.
8	If duplicated `bad` data is not critical, rerun the EmrEtlRunner with the `--skip staging,emr` option. If duplicated `bad` data is critical, instructions are still to come (#2593).
9	If duplicated `bad` data is not critical, rerun the EmrEtlRunner with the `--skip staging,emr` option. If duplicated `bad` data is critical, instructions are still to come (#2593).
10	Rerun the EmrEtlRunner with the `--skip staging,emr,elasticsearch` option.
11	The data load cannot end up partially applied because of the use of `COMMIT`. However, if more than one data target is configured, rerun the StorageLoader with the successfully loaded targets removed from the `config.yml` configuration file, so that only the failed target is retried. Note: If the failure occurred at the `analyze` stage, skip it with the `--skip download,load` option (currently `analyze` is bound to the `load` step and cannot be rerun separately).
12	Rerun the StorageLoader with the `--skip download,load` option.
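
The EmrEtlRunner rows of the table (steps 1-10) can be sketched as a small lookup, which may be handy when wrapping reruns in a script. This is only an illustration: `RecoveryPlan`, `recovery_plan` and their parameters are hypothetical names introduced here, not part of Snowplow. The `--skip` values and the locations to delete are copied from the table; the StorageLoader steps (11-12) are left out.

```python
from typing import NamedTuple, Optional, Tuple


class RecoveryPlan(NamedTuple):
    skip: Optional[str]            # value for EmrEtlRunner's --skip; None means a normal rerun
    delete_first: Tuple[str, ...]  # locations to empty before rerunning


# Location names as used in the dataflow diagram.
ENRICHED_GOOD = "enriched:good"  # [F]
SHREDDED_GOOD = "shredded:good"  # [K]

# Steps 2-10, transcribed from the recovery table (step 1 is handled separately).
_PLANS = {
    2: RecoveryPlan("staging", ()),
    3: RecoveryPlan("staging", ()),
    4: RecoveryPlan("staging", (ENRICHED_GOOD,)),
    5: RecoveryPlan("staging", (ENRICHED_GOOD,)),
    6: RecoveryPlan("staging", (ENRICHED_GOOD,)),
    7: RecoveryPlan("staging", (ENRICHED_GOOD, SHREDDED_GOOD)),
    8: RecoveryPlan("staging,emr", ()),
    9: RecoveryPlan("staging,emr", ()),
    10: RecoveryPlan("staging,emr,elasticsearch", ()),
}


def recovery_plan(failed_step: int,
                  files_already_moved: bool = True,
                  duplicate_bad_critical: bool = False) -> RecoveryPlan:
    """Return the documented recovery action for a failed EmrEtlRunner step."""
    if failed_step == 1:
        # raw:processing [A] empty -> rerun as usual; otherwise skip staging.
        return RecoveryPlan("staging" if files_already_moved else None, ())
    if failed_step in (8, 9) and duplicate_bad_critical:
        # The table has no procedure for this case yet.
        raise NotImplementedError("no documented recovery yet (see #2593)")
    if failed_step not in _PLANS:
        raise ValueError(f"no EmrEtlRunner recovery entry for step {failed_step}")
    return _PLANS[failed_step]
```

For example, `recovery_plan(7)` yields a plan with `skip` set to `staging` and both `enriched:good` and `shredded:good` listed in `delete_first`. Deleting the files and invoking the EmrEtlRunner are deliberately left to the operator.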
