Python Tracker v0.3
🚧 The documentation for the latest version can be found on the Snowplow documentation site.
This page refers to version 0.3.0 of the Snowplow Python Tracker.
- 2.1 Importing the module
- 2.2 Creating a tracker
  - 2.2.1 namespace
  - 2.2.2 app_id
  - 2.2.3 context_vendor
  - 2.2.4 encode_base64
  - 2.2.5 contracts
- 2.3 Creating multiple trackers
- 3.1 set_platform()
- 3.2 set_user_id()
- 3.3 set_screen_resolution()
- 3.4 set_viewport()
- 3.5 set_color_depth()
- 3.6 set_lang()
- 4.1 Common
  - 4.1.1 Custom contexts
  - 4.1.2 Optional timestamp argument
- 4.2 track_screen_view()
- 4.3 track_page_view()
- 4.4 track_ecommerce_transaction()
- 4.5 track_ecommerce_transaction_item()
- 4.6 track_struct_event()
- 4.7 track_unstruct_event()
  - 4.7.1 Supported datatypes
The Snowplow Python Tracker allows you to track Snowplow events from your Python apps and games.
The tracker should be straightforward to use if you are comfortable with Python development; any prior experience with Snowplow's JavaScript Tracker or Lua Tracker, Google Analytics or Mixpanel (which have similar APIs to Snowplow) is helpful but not necessary.
Note that this tracker has access to a more restricted set of Snowplow events than the JavaScript Tracker and covers almost all the events from the Lua Tracker.
Assuming you have completed the Python Tracker Setup for your Python project, you are now ready to initialize the Python Tracker.
Import the Python Tracker's module into your Python code like so:
from snowplow_tracker.tracker import Tracker
That's it - you are now ready to initialize a tracker instance.
The simplest tracker initialization only requires you to provide the URI of the collector to which the tracker will log events:
tracker = Tracker("my-collector.cloudfront.net")
There are other optional keyword arguments:
Argument Name | Description | Required? | Default |
---|---|---|---|
collector_uri | The collector URI | Yes | None |
namespace | The name of the tracker instance | No | None |
app_id | The application ID | No | None |
context_vendor | Context vendor | No | None |
encode_base64 | Whether to enable base 64 encoding | No | True |
contracts | Whether to enable PyContracts | No | True |
A more complete example:
tracker = Tracker("my-collector.cloudfront.net", namespace="cf", context_vendor="com.my_company", encode_base64=False, contracts=False)
If provided, the namespace argument will be attached to every event fired by the new tracker. This allows you to later identify which tracker fired which event if you have multiple trackers running.
The app_id argument lets you set the application ID to any string.
The context_vendor argument identifies the company which defined the custom contexts attached to events the tracker fires. It should be a string containing no characters other than lowercase letters, underscores and dots. It should be the company's reversed internet domain name - for example, "com.example" for custom contexts developed at a company with domain name "example.com". Whenever the new tracker fires an event with a custom context attached, the context vendor will also be attached. This helps to avoid confusion between custom contexts defined by different companies.
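The vendor rules above are easy to check before creating a tracker. The following is a minimal sketch; reversed_domain and is_valid_vendor are hypothetical helpers written for this page, not part of the tracker's API:

```python
import re

# Hypothetical helpers for illustration; the tracker does not ship these.
_VENDOR_PATTERN = re.compile(r"[a-z_.]+")


def reversed_domain(domain):
    """Turn a domain such as "example.com" into "com.example"."""
    return ".".join(reversed(domain.split(".")))


def is_valid_vendor(vendor):
    """Check that the string uses only lowercase letters, underscores and dots."""
    return _VENDOR_PATTERN.fullmatch(vendor) is not None


vendor = reversed_domain("example.com")
print(vendor, is_valid_vendor(vendor))
```

Running a check like this first gives a clearer error than discovering a malformed vendor string later in the pipeline.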
By default, unstructured events and custom contexts are encoded into Base64 to ensure that no data is lost or corrupted. You can turn encoding on or off using the Boolean encode_base64 argument.
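To illustrate why Base64 helps, here is a rough sketch of a dictionary being serialised and encoded so that it survives transport as plain ASCII. We assume JSON serialisation followed by URL-safe Base64; the tracker's exact serialisation may differ:

```python
import base64
import json

# Assumption: the payload is serialised to JSON and then Base64-encoded.
# This mirrors the idea only, not necessarily the tracker's own code.
context = {"movie_poster": {"movie_name": "Solaris", "poster_country": "JP"}}

as_json = json.dumps(context, sort_keys=True)
encoded = base64.urlsafe_b64encode(as_json.encode("utf-8")).decode("ascii")

# Decoding recovers exactly the dictionary we started with.
decoded = json.loads(base64.urlsafe_b64decode(encoded).decode("utf-8"))

print(encoded)
print(decoded == context)
```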
Python is a dynamically typed language, but each of our methods expects its arguments to be of specific types and value ranges, and validates that to be the case. These checks are done using the PyContracts library.
If the validation check fails, then a runtime error is thrown:
t = Tracker("localhost")
t.set_color_depth("walrus")
contracts.interface.ContractNotRespected: Breach for argument 'depth' to Tracker:set_color_depth().
Expected type 'int', got 'str'.
checking: Int for value: Instance of str: 'walrus'
checking: $(Int) for value: Instance of str: 'walrus'
checking: int for value: Instance of str: 'walrus'
Variables bound in inner context:
- self: Instance of Tracker: <snowplow_tracker.tracker.Tracker object... [clip]
If your value is of the wrong type, convert it before passing it into the track...() method, for example:
level_idx = 42
t.track_screen_view("Game Level", str(level_idx))
We specify the types and value ranges required for each argument below.
You can turn off type checking to improve performance by setting the contracts argument to False when initializing a tracker. Note that this will disable type checking for every tracker you have initialized.
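If you do turn contracts off, you may still want to guard the values you pass in yourself. The sketch below shows the kind of check involved using plain Python; check_color_depth is a hypothetical stand-in and does not use or reproduce the PyContracts library:

```python
def check_color_depth(depth):
    """Hypothetical stand-in for the kind of check a contract performs."""
    # bool is a subclass of int in Python, so we reject it explicitly here.
    if isinstance(depth, bool) or not isinstance(depth, int):
        raise TypeError("Expected type 'int', got %r" % type(depth).__name__)
    if depth <= 0:
        raise ValueError("depth must be a positive integer")
    return depth


print(check_color_depth(32))
try:
    check_color_depth("walrus")
except TypeError as error:
    print(error)
```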
Each tracker instance is completely sandboxed, so you can create multiple trackers as you see fit.
Here is an example of instantiating two separate trackers:
t1 = Tracker("my-collector.cloudfront.net", "t1")
t1.set_platform("cnsl")
t1.track_unstruct_event("com.example_company", "save-game", {"save_id": 23}, tstamp=1369330092)
t2 = Tracker("my-company.c.snplow.com", "t2")
t2.set_platform("cnsl")
t2.track_screen_view("Game HUD", "23")
t1.track_screen_view("Test", "23") # Back to first tracker
You may have additional information about your application's environment, current user and so on, which you want to send to Snowplow with each event.
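The tracker instance has a set of set...() methods to attach extra data to all tracked events:
- set_platform()
- set_user_id()
- set_screen_resolution()
- set_viewport()
- set_color_depth()
- set_lang()
We will discuss each of these in turn below: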
You can change the platform the tracker is running on by calling:
t.set_platform(platform_code)
For example:
t.set_platform("tv") # Running on a Connected TV
For a full list of supported platforms, please see the Snowplow Tracker Protocol.
You can set the user ID to any string:
t.set_user_id( "{{USER ID}}" )
Example:
t.set_user_id("alexd")
If your Python code has access to the device's screen resolution, then you can pass this in to Snowplow too:
t.set_screen_resolution( {{WIDTH}}, {{HEIGHT}} )
Both numbers should be positive integers; note the order is width followed by height. Example:
t.set_screen_resolution(1366, 768)
If your Python code has access to the viewport dimensions, then you can pass these in to Snowplow too:
t.set_viewport( {{WIDTH}}, {{HEIGHT}} )
Both numbers should be positive integers; note the order is width followed by height. Example:
t.set_viewport(300, 200)
If your Python code has access to the bit depth of the device's color palette for displaying images, then you can pass this in to Snowplow too:
t.set_color_depth( {{BITS PER PIXEL}} )
The number should be a positive integer, in bits per pixel. Example:
t.set_color_depth(32)
This method lets you pass a user's language in to Snowplow:
t.set_lang( {{LANGUAGE}} )
The language should be a string. Example:
t.set_lang('en')
Snowplow has been built to enable you to track a wide range of events that occur when users interact with your websites and apps. We are constantly growing the range of functions available in order to capture that data more richly.
Tracking methods supported by the Python Tracker at a glance:
Function | Description |
---|---|
track_page_view() | Track and record views of web pages |
track_ecommerce_transaction() | Track an ecommerce transaction at the transaction level |
track_ecommerce_transaction_item() | Track an ecommerce transaction at the item level |
track_screen_view() | Track the user viewing a screen within the application |
track_struct_event() | Track a Snowplow custom structured event |
track_unstruct_event() | Track a Snowplow custom unstructured event |
All events are tracked with specific methods on the tracker instance, of the form track_XXX(), where XXX is the name of the event to track.
In short, custom contexts let you add additional information about the circumstances surrounding an event in the form of a Python dictionary object. Each tracking method accepts an additional optional context parameter after all the parameters specific to that method:
def track_page_view(self, page_url, page_title=None, referrer=None, context=None, tstamp=None):
The context argument is a Python dictionary. Each of its keys is the name of a context, and each of its values is the flat (not nested) dictionary for that context. So if a visitor arrives on a page advertising a movie, the context argument might look like this:
{
    "movie_poster": {  # Context entry
        "movie_name": "Solaris",
        "poster_country": "JP",
        "poster_year$dt": 2922  # 1978-01-01 as days since the Unix epoch
    }
}
This is how to fire a page view event with the above custom context:
t.track_page_view("http://www.films.com", "Homepage", context={
    "movie_poster": {
        "movie_name": "Solaris",
        "poster_country": "JP",
        "poster_year$dt": 2922  # 1978-01-01 as days since the Unix epoch
    }
})
Each track...() method supports an optional timestamp as its final argument; this allows you to manually override the timestamp attached to this event.
If you do not pass this timestamp in as an argument, then the Python Tracker will use the current time to be the timestamp for the event.
Here is an example tracking a structured event and supplying the optional timestamp argument. We can explicitly supply None for each of the intervening arguments which are empty:
t.track_struct_event("some cat", "save action", None, None, None, None, 1368725287)
Alternatively, we can use the argument name:
t.track_struct_event("some cat", "save action", tstamp=1368725287)
Timestamp is counted in seconds since the Unix epoch - the same format as generated by time.time() in Python.
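Because the tstamp argument is validated as a positive integer and time.time() returns a float, it is worth converting explicitly. A small sketch using only the standard library; to_epoch_seconds is a hypothetical helper and assumes the datetime it receives is naive and expressed in UTC:

```python
import calendar
import time
from datetime import datetime


def to_epoch_seconds(moment):
    """Convert a naive datetime, assumed to be in UTC, to whole epoch seconds."""
    return calendar.timegm(moment.utctimetuple())


now = int(time.time())  # time.time() returns a float, so truncate it
saved_at = to_epoch_seconds(datetime(2013, 5, 16, 17, 28, 7))
print(now, saved_at)
```

Either value can then be passed as the tstamp keyword argument of a track...() method.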
Warning: this feature is implemented in the Lua and Python tracker, but it is not currently supported in the Enrichment, Storage or Analytics stages in the Snowplow data pipeline. As a result, if you use this feature, you will log screen views to your collector logs, but these will not be parsed and loaded into e.g. Redshift to analyse. (Adding this capability is on the roadmap.)
Use track_screen_view() to track a user viewing a screen (or equivalent) within your app. Arguments are:
Argument | Description | Required? | Validation |
---|---|---|---|
name | Human-readable name for this screen | Yes | Non-empty string |
id_ | Unique identifier for this screen | No | String |
context | Custom context for the event | No | Dict |
tstamp | When the screen was viewed | No | Positive integer |
Example:
t.track_screen_view("HUD > Save Game", "screen23", tstamp=1368725287)
Use track_page_view() to track a user viewing a page within your app.
Arguments are:
Argument | Description | Required? | Validation |
---|---|---|---|
page_url | The URL of the page | Yes | Non-empty string |
page_title | The title of the page | No | String |
referrer | The address which linked to the page | No | String |
context | Custom context for the event | No | Dict |
tstamp | When the pageview occurred | No | Positive integer |
Example:
t.track_page_view("www.example.com", "example", "www.referrer.com")
Use track_ecommerce_transaction() to track an ecommerce transaction on the transaction level.
Arguments:
Argument | Description | Required? | Validation |
---|---|---|---|
order_id | ID of the eCommerce transaction | Yes | Non-empty string |
tr_total_value | Total transaction value | Yes | Int or Float |
tr_affiliation | Transaction affiliation | No | String |
tr_tax_value | Transaction tax value | No | Int or Float |
tr_shipping | Delivery cost charged | No | Int or Float |
tr_city | Delivery address city | No | String |
tr_state | Delivery address state | No | String |
tr_country | Delivery address country | No | String |
context | Custom context for the event | No | Dict |
tstamp | When the transaction event occurred | No | Positive integer |
Examples:
t.track_ecommerce_transaction("order-456", 142, None, 20, 12.99, "London", None, "United Kingdom")
t.track_ecommerce_transaction("order-456", 142, tr_city="Paris", tr_country="France")
Use track_ecommerce_transaction_item() to track an individual line item within an ecommerce transaction.
Arguments:
Argument | Description | Required? | Validation |
---|---|---|---|
ti_id | Order ID | Yes | Non-empty string |
ti_sku | Item SKU | Yes | Non-empty string |
ti_price | Item price | Yes | Int or Float |
ti_quantity | Item quantity | Yes | Int |
ti_name | Item name | No | String |
ti_category | Item category | No | String |
context | Custom context for the event | No | Dict |
tstamp | When the transaction event occurred | No | Positive integer |
Example:
t.track_ecommerce_transaction_item("order-789", "2001", 49.99, 1, "Green shoes", "clothing")
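A transaction and its line items are linked by sharing the same order ID. The sketch below shows that pattern. So that it runs without a collector, it uses RecordingTracker, a hypothetical stand-in that only records calls (it is not the real Tracker class); the method names and argument order follow the tables above, and the second item is made-up sample data:

```python
class RecordingTracker(object):
    """Hypothetical stand-in that records calls instead of sending events."""

    def __init__(self):
        self.calls = []

    def track_ecommerce_transaction(self, order_id, tr_total_value, **optional):
        self.calls.append(("transaction", order_id, tr_total_value))

    def track_ecommerce_transaction_item(self, ti_id, ti_sku, ti_price,
                                         ti_quantity, ti_name=None,
                                         ti_category=None):
        self.calls.append(("item", ti_id, ti_sku, ti_price, ti_quantity))


t = RecordingTracker()
order_id = "order-789"
# (sku, price, quantity, name) tuples; illustrative data only
items = [("2001", 49.99, 1, "Green shoes"), ("2002", 5.00, 2, "Red laces")]

total = sum(price * quantity for _, price, quantity, _ in items)
t.track_ecommerce_transaction(order_id, total)
for sku, price, quantity, name in items:
    # Each item carries the same order ID as its parent transaction.
    t.track_ecommerce_transaction_item(order_id, sku, price, quantity, name)

print(t.calls)
```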
Use track_struct_event() to track a custom event happening in your app which fits the Google Analytics-style structure of having up to five fields (with only the first two required):
Argument | Description | Required? | Validation |
---|---|---|---|
category | The grouping of structured events which this action belongs to | Yes | Non-empty string |
action | Defines the type of user interaction which this event involves | Yes | Non-empty string |
label | A string to provide additional dimensions to the event data | No | String |
property | A string describing the object or the action performed on it | No | String |
value | A value to provide numerical data about the event | No | Int or Float |
context | Custom context for the event | No | Dict |
tstamp | When the structured event occurred | No | Positive integer |
Example:
t.track_struct_event("shop", "add-to-basket", None, "pcs", 2)
Warning: this feature is implemented in the Python tracker, but it is not currently supported in the Enrichment, Storage or Analytics stages in the Snowplow data pipeline. As a result, if you use this feature, you will log unstructured events to your collector logs, but these will not be parsed and loaded into e.g. Redshift to analyse. (Adding this capability is on the roadmap.)
Use track_unstruct_event() to track a custom event which consists of a name and an unstructured set of properties. This is useful when:
- You want to track event types which are proprietary/specific to your business (i.e. not already part of Snowplow), or
- You want to track events which have unpredictable or frequently changing properties
The arguments are as follows:
Argument | Description | Required? | Validation |
---|---|---|---|
event_vendor | The company which defined the event | Yes | Non-empty string |
name | The name of the event | Yes | Non-empty string |
properties | The properties of the event | Yes | Non-empty dict |
context | Custom context for the event | No | Dict |
tstamp | When the unstructured event occurred | No | Positive integer |
Example:
t.track_unstruct_event("com.example_company", "save-game", {
    "save_id": "4321",
    "level": 23,
    "difficultyLevel": "HARD",
    "dl_content": True
}, tstamp=1369330929)
The properties dictionary consists of a set of individual name: value pairs. The structure must be flat: properties cannot be nested. Be careful here as this is not currently enforced through validation.
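Since flatness is not enforced, you may want to check it yourself before tracking. A minimal sketch; is_flat is a hypothetical helper and only looks for nested dictionaries, leaving lists alone because arrays of values are supported:

```python
def is_flat(properties):
    """Return True if no value in the dictionary is itself a dictionary."""
    return not any(isinstance(value, dict) for value in properties.values())


flat = {"save_id": "4321", "level": 23, "tags": ["boss", "night"]}
nested = {"save_id": "4321", "player": {"name": "alexd"}}

print(is_flat(flat), is_flat(nested))
```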
The event vendor is the reversed domain name of the company which defined the event. It should contain no characters other than lower case letters, underscores, and dots. For example, for a company with domain name "example_company.com", the event vendor would be "com.example_company".
Snowplow unstructured events support a relatively rich set of datatypes. Because these datatypes do not always map directly onto Python datatypes, we have introduced some "type suffixes" for the Python property names, so that Snowplow knows what Snowplow data types the Python data types map onto:
Snowplow datatype | Description | Python datatype | Type suffix(es) | Supports array? |
---|---|---|---|---|
Null | Absence of a value | N/A | - | No |
String | String of characters | string | - | Yes |
Boolean | True or false | boolean | - | Yes |
Integer | Number without decimal | number | $int | Yes |
Floating point | Number with decimal | number | $flt | Yes |
Geo-coordinates | Latitude and longitude | (number, number) | $geo | Yes |
Date | Date and time (ms precision) | number | $dt, $ts, $tms | Yes |
Array | Array of values | [x, y, z] | - | - |
Let's go through each of these in turn, providing some examples as we go:
Tracking a Null value for a given field is currently untested in the Python Tracker. TODO.
Tracking a String is easy:
{
    "product_id": "ASO01043"
}
Tracking a Boolean is also straightforward:
{
    "trial": True
}
To track an Integer, use a Python number but add a type suffix like so:
{
    "in_stock$int": 23
}
Warning: if you do not add the $int type suffix, Snowplow will assume you are tracking a Floating point number.
To track a Floating point number, use a Python number; adding a type suffix is optional:
{
    "price$flt": 4.99,
    "sales_tax": 49.99  # Same as "sales_tax$flt": ...
}
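Forgetting the $int suffix silently changes how a value is interpreted, so it can help to add number suffixes automatically from the Python type. A sketch under that assumption; with_number_suffixes is a hypothetical helper, not something the tracker provides:

```python
def with_number_suffixes(properties):
    """Append $int or $flt to keys whose values are Python ints or floats."""
    suffixed = {}
    for key, value in properties.items():
        if "$" in key or isinstance(value, bool):
            # Already suffixed, or a Boolean (bool is a subclass of int).
            suffixed[key] = value
        elif isinstance(value, int):
            suffixed[key + "$int"] = value
        elif isinstance(value, float):
            suffixed[key + "$flt"] = value
        else:
            suffixed[key] = value
    return suffixed


print(with_number_suffixes({"in_stock": 23, "price": 4.99, "trial": True}))
```

Keys that already carry a suffix are left untouched, so Date and Geo-coordinate properties are not affected.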
Tracking a pair of Geographic coordinates is done like so:
{
    "check_in$geo": (40.11041, -88.21337)  # Lat, long
}
Please note that the datatype takes the format latitude followed by longitude. That is the same order used by services such as Google Maps.
Warning: if you do not add the $geo type suffix, then the value will be incorrectly interpreted by Snowplow as an Array of Floating points.
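One way to avoid both a missing suffix and out-of-range coordinates is to build the property through a small function. The sketch below is illustrative only; geo_property is a hypothetical helper, and the range checks cannot detect every case of latitude and longitude being swapped:

```python
def geo_property(name, latitude, longitude):
    """Build a one-entry dict holding (latitude, longitude) under name$geo."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {name + "$geo": (latitude, longitude)}


print(geo_property("check_in", 40.11041, -88.21337))
```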
Snowplow Dates include the date and the time, with milliseconds precision. There are three type suffixes supported for tracking a Date:
- $dt - the Number of days since the epoch
- $ts - the Number of seconds since the epoch
- $tms - the Number of milliseconds since the epoch. This precision is hard to access from within Python
You can track a date by adding a Python number to your properties object. The following are all valid dates:
{
    "birthday2$dt": 3996,
    "registered2$ts": 1371129610,
    "last_action$tms": 1368454114215,  # Accurate to milliseconds
}
Note that the type suffix only indicates how the Python number sent to Snowplow is interpreted - all Snowplow Dates are stored to milliseconds precision (whether or not they include that level of precision).
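To make the relationship between the three suffixes concrete, the sketch below expresses one moment in each form. date_properties is a hypothetical helper and assumes a naive datetime expressed in UTC; in practice you would send only one of the three keys:

```python
import calendar
from datetime import datetime


def date_properties(name, moment):
    """Express a naive UTC datetime under each of the three Date suffixes."""
    seconds = calendar.timegm(moment.utctimetuple())
    millis = seconds * 1000 + moment.microsecond // 1000
    return {
        name + "$dt": seconds // 86400,  # whole days since the epoch
        name + "$ts": seconds,           # seconds since the epoch
        name + "$tms": millis,           # milliseconds since the epoch
    }


registered = date_properties("registered", datetime(2013, 6, 13, 13, 20, 10))
print(registered)
```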
Two warnings:
- If you specify a Python number but do not add a valid Date suffix ($dt, $ts or $tms), then the value will be incorrectly interpreted by Snowplow as a Number, not a Date
- If you specify a Python number but add the wrong Date suffix, then the Date will be incorrectly interpreted by Snowplow, for example:
{
    "last_ping$dt": 1371129610  # Should have been $ts. Read as days, this is a date roughly 3.75 million years in the future
}
Copyright © 2012-2021 Snowplow Analytics Ltd.