Snowplow - The Open-Source, Web-Scale Analytics Platform Powered By Hadoop, Spark, Hive And Redshift.

Sept. 4, 2018, 12:13 p.m. By: Vishakha Jha


Open source software has transformed the data landscape. Snowplow is an open-source, web-scale analytics platform built on the shoulders of giants such as Hadoop, Spark, Hive and Redshift.

At present, the platform is used by thousands of companies around the world and supports both real-time and batch processing.

So, what does Snowplow do?

Snowplow is an enterprise-strength marketing and product analytics platform that does the following three things:

  • It identifies your users and tracks the way they engage with your website or application.

  • It stores your users' behavioural data in a scalable "event data warehouse" that you control: in Amazon S3 and (optionally) Amazon Redshift or Postgres.

  • It lets you analyze that data with a wide range of tools, including big data tools via EMR as well as more traditional tools.

Basic Functions:

As a complete, loosely coupled web analytics platform, Snowplow lets you capture, store and analyze granular, customer-level and event-level data. With it, you can:

  • Drill down to individual customers as well as events

  • Zoom out in order to compare behaviours between cohorts and over time

  • Join web analytics data with other datasets (e.g. offline data, media catalogue, product catalogue, CRM)

  • Segment your audience by behaviour

  • Develop recommendation and personalization engines
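
To make the list above concrete, here is a minimal sketch of a few of these operations on event-level data, using only the Python standard library. The records, field names (user_id, event, day) and the CRM table are hypothetical and are not Snowplow's actual event schema; in practice this kind of analysis would more likely be done in SQL or a big data tool against the event data warehouse.

```python
from collections import Counter

# Hypothetical event-level records: one dict per user action.
# Field names are illustrative, not Snowplow's actual schema.
EVENTS = [
    {"user_id": "u1", "event": "page_view", "day": 1},
    {"user_id": "u1", "event": "add_to_basket", "day": 1},
    {"user_id": "u2", "event": "page_view", "day": 1},
    {"user_id": "u2", "event": "page_view", "day": 2},
    {"user_id": "u3", "event": "page_view", "day": 2},
    {"user_id": "u3", "event": "purchase", "day": 2},
]

# Hypothetical CRM table to join against, keyed by user ID.
CRM = {"u1": {"plan": "free"}, "u2": {"plan": "free"}, "u3": {"plan": "paid"}}


def drill_down(events, user_id):
    """Drill down: return every event for one user, in stored order."""
    return [e for e in events if e["user_id"] == user_id]


def events_per_cohort(events, crm, key):
    """Join events with CRM data and count events per cohort."""
    counts = Counter()
    for e in events:
        cohort = crm.get(e["user_id"], {}).get(key, "unknown")
        counts[cohort] += 1
    return dict(counts)


def segment_by_behaviour(events, event_name):
    """Segment: the set of users who performed an event at least once."""
    return {e["user_id"] for e in events if e["event"] == event_name}
```

Because every row is a single event, the same data answers both the zoomed-in question (what did user u1 do?) and the zoomed-out one (how do free and paid users compare?).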

Now That We Know What Snowplow Is, Why Was It Designed?

Snowplow was designed with the following two technical goals:

  • To give its users access to, ownership of and control over their own web analytics data (no lock-in).

  • To be loosely coupled and extensible, so that it is easy to make additions. For example, new trackers can be added to capture data from new platforms (e.g. mobile, TV), and the data can be put to new uses.
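
The idea of loose coupling can be illustrated with a toy sketch. Everything here (the Pipeline class, web_tracker, tv_tracker and the event fields) is hypothetical and is not Snowplow's API; it only shows why a new tracker or a new use of the data can be added without touching the rest of the system, as long as all parts agree on the shape of an event.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class Pipeline:
    """Toy pipeline (hypothetical): trackers push events, consumers read them."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self.consumers: List[Callable[[Event], None]] = []

    def add_consumer(self, consumer: Callable[[Event], None]) -> None:
        # A new use of the data is just one more consumer.
        self.consumers.append(consumer)

    def collect(self, event: Event) -> None:
        # Store the raw event, then hand it to every registered consumer.
        self.events.append(event)
        for consumer in self.consumers:
            consumer(event)


def web_tracker(pipeline: Pipeline, user_id: str, page: str) -> None:
    """Hypothetical tracker for a website."""
    pipeline.collect({"platform": "web", "user_id": user_id, "target": page})


def tv_tracker(pipeline: Pipeline, user_id: str, programme: str) -> None:
    """Hypothetical tracker for a new platform, added without changing Pipeline."""
    pipeline.collect({"platform": "tv", "user_id": user_id, "target": programme})
```

A short usage example under the same assumptions:

```python
pipeline = Pipeline()
seen_platforms: List[str] = []
pipeline.add_consumer(lambda e: seen_platforms.append(e["platform"]))

web_tracker(pipeline, "u1", "/home")
tv_tracker(pipeline, "u1", "evening-news")
```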

How Snowplow Does It:

We know what the platform does, but how does Snowplow actually help its users get control of and access to their own data? It does so in the following ways:

  • Ownership: Users of the platform own their data. Snowplow never mediates users' access to their own data.

  • Control: Users decide what data to collect, what questions to ask of it, which analytics techniques and technologies to use to process it, and how to act on the insight it generates.

  • Freedom: The platform gives its users the freedom to do what they want with their own data. There is no vendor lock-in, and no assumptions are made about their business or how they should make use of their data. The only thing that can limit users is their own imagination.

In the end, most of these benefits come down to one thing: the control you have over your data. With solutions like Snowplow, you can access true event-level data, that is, data in its rawest form.

That makes Snowplow a potential game changer, one that will hopefully drive further innovation in data. For further information on its architecture, FAQs, how to contribute and more, you can refer to the links given below:

Website: snowplowanalytics

For More Information: GitHub