Transform → Visualize → Model for Sequence Analytics

Going from data to business value isn’t a single step. It’s a collection of tasks that we iterate through to gradually reach an answer, build confidence that it is correct, and make it easy for stakeholders to digest.

One of my favorite explanations of this process is in Hadley Wickham’s R for Data Science, which diagrams the workflow as import, tidy, then a repeating loop of transform, visualize, and model, and finally communicate.

We think about this process quite similarly at Motif, and we have even developed our own event-focused definition of tidy data. This post explores how we've adapted these steps from traditional analysis to accommodate the complexities of event sequence data.

Transform

We realized quite early that novel visualizations and fancy models would never deliver real value to users without an expressive set of Transform operations. A few well-designed transform steps can make a complex analysis easy, simplifying our sequence visualizations and making them easier to communicate.

Motif’s Sequence Operations Language (SOL) is purpose-built for event sequence transformations. I want to highlight a few transform steps that we keep coming back to and that are easy to implement in SOL. These steps usually make use of our replace operation, which lets you rewrite sequences in place, almost as if the find-and-replace in your text editor were available for your entire data set.

For example, there are often streaks where the same event is triggered multiple times. Conceptually, we might think of these as a single event with a property recording how many times it occurred. This is quite easy to specify in SOL!
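To make the idea concrete outside of Motif, here is a minimal Python sketch of collapsing streaks. It is not SOL and does not reflect how Motif implements the replace operation; the event fields (`name`, `ts`) and the function name `collapse_streaks` are assumptions made up for this illustration.

```python
from itertools import groupby


def collapse_streaks(events):
    """Collapse consecutive runs of the same event name into one event with a count."""
    collapsed = []
    # groupby only groups *adjacent* items, which is exactly what a streak is.
    for name, run in groupby(events, key=lambda e: e["name"]):
        run = list(run)
        collapsed.append({
            "name": name,
            "count": len(run),          # how many times the event repeated
            "first_ts": run[0]["ts"],   # timestamp of the first event in the streak
            "last_ts": run[-1]["ts"],   # timestamp of the last event in the streak
        })
    return collapsed


# Hypothetical sequence for one user: three searches in a row, then a later one.
events = [
    {"name": "search", "ts": 1},
    {"name": "search", "ts": 2},
    {"name": "search", "ts": 5},
    {"name": "view_item", "ts": 7},
    {"name": "search", "ts": 9},
]
result = collapse_streaks(events)
```

The five raw events become three: a `search` with a count of 3, a `view_item`, and a final `search`. Only adjacent repeats are merged, so the later search stays a separate event.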

Many transformation steps in sequence analytics boil down to shortening and simplifying sequences until they can be visualized properly. SOL makes these steps easy because it has the right building blocks for data cleaning.

Visualize

The visualize step is so important that Motif was designed as a visualization-first tool. That means the results of every query you run in Motif are directly sent to one of our sequence visualizations. After transformation and modeling steps, sequence events are mapped into 2-D coordinates, leveraging color, size, opacity, and interactivity to produce an accurate and interpretable visualization.

Motif’s visualizations rely on key principles to help you process as much information as possible:

  1. Provide rapid feedback: Motif displays query results within a second or two of you issuing the query, making it easy to understand the results of your transform and modeling steps.
  2. Show as much variation as possible: User journeys are characterized by their variety and we strive to show you possibilities rather than the average or typical cases. In many datasets, even the most likely sequences are quite rare!
  3. Access raw event sequences: In nearly all of our visualizations you can inspect individual example sequences that help you understand aggregates better.

Motif’s Barcode visualization is our workhorse for sequence analytics. A variant of the icicle plot, it uses the x-axis to characterize the passage of time (or steps), and the y-axis to capture variety. The barcode provides a cohesive surface for viewing every step of the analysis process, showing results of transformations, modeling assumptions, and different visualization choices.
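For readers unfamiliar with icicle plots, the layout idea can be sketched in a few lines of Python. This is a generic prefix-based icicle layout under my own assumptions, not Motif's Barcode implementation: each rectangle covers one step on the x-axis, and its vertical extent is the number of sequences sharing that prefix. The function name `icicle_layout` and the rectangle fields are hypothetical.

```python
from collections import Counter


def icicle_layout(sequences, max_depth=3):
    """Return rectangles (step, event, y0, y1) for a prefix-based icicle layout."""
    rects = []

    def place(group, depth, y0):
        if depth >= max_depth:
            return
        # Count how many sequences in this group have each event at this step.
        counts = Counter(seq[depth] for seq in group if len(seq) > depth)
        y = y0
        # Largest branches first; ties broken alphabetically for a stable layout.
        for event, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            rects.append({"step": depth, "event": event, "y0": y, "y1": y + n})
            children = [s for s in group if len(s) > depth and s[depth] == event]
            place(children, depth + 1, y)  # children sit inside the parent's band
            y += n

    place(sequences, 0, 0)
    return rects


# Hypothetical user journeys.
sequences = [
    ["home", "search", "buy"],
    ["home", "search", "exit"],
    ["home", "cart"],
    ["landing", "home"],
]
rects = icicle_layout(sequences)
```

Each rectangle's height is proportional to how many sequences pass through it, so rare paths stay visible as thin bands instead of being averaged away.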

Motif provides a number of coloring strategies to highlight different aspects of your data. For example, our comparison mode shows the differences between two independent sets of sequences, e.g. before/after a product launch, or variants in an A/B test.

We are continuing to refine this visualization over time, including visualizing predictions from our GLEAM models.

Model

In the modeling step, we apply assumptions and structure to interpret the raw data and draw conclusions. While developing our approach to sequence analysis, we’ve found a key modeling step to be tagging parts of event sequences with names, roughly corresponding to states that users can enter and exit.

Motif’s pattern matching query engine efficiently determines whether your pattern occurs in a sequence and, if so, applies tags to that sequence.

Tags in Motif serve as both programming references and visual references. As programming references, they act as arrays of events and can be used to compute quantities like durations or point-in-time event properties. As visual references, they drive the color coding of our visualizations to help you identify steps in a process. They coarsen very granular event data into something easier to work with and visualize.
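The Python sketch below illustrates the general idea of tagging a matched span and then treating the tag as an array of events. It is a simplified stand-in, not Motif's query engine or SOL syntax: the pattern here is just "a start event followed later by an end event", and the names `tag_span`, `duration`, and the `name`/`ts` fields are assumptions for this example.

```python
def tag_span(events, start_name, end_name, tag):
    """Tag the first span from start_name to the next end_name.

    Returns {tag: [events in the span]} if the pattern is found, else {}.
    """
    start = next((i for i, e in enumerate(events) if e["name"] == start_name), None)
    if start is None:
        return {}
    end = next(
        (i for i in range(start + 1, len(events)) if events[i]["name"] == end_name),
        None,
    )
    if end is None:
        return {}
    return {tag: events[start:end + 1]}


def duration(tagged_events):
    """Elapsed time between the first and last event under a tag."""
    return tagged_events[-1]["ts"] - tagged_events[0]["ts"]


# Hypothetical sequence with timestamps in seconds.
events = [
    {"name": "browse", "ts": 0},
    {"name": "add_to_cart", "ts": 40},
    {"name": "enter_address", "ts": 65},
    {"name": "purchase", "ts": 100},
]
tags = tag_span(events, "add_to_cart", "purchase", "checkout")
checkout_seconds = duration(tags["checkout"])
```

Here the `checkout` tag covers three events and spans 60 seconds. A sequence that never reaches `purchase` simply returns no tags, which is the "if so" branch of the matching step.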

After a match operation, you’ll see the Sequence Model populated with the tags you created, along with how frequently they were matched. Tags can carry semantic information that you can express here. Any tag can be marked as an “Outcome” ⛳️ which is often a goal or target event, similar to a dependent variable in a regression model. Motif’s visualizations can use this additional information, for instance to indicate how the probability of reaching that outcome changes with each event in the sequence.
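One simple way to think about "how the probability of reaching that outcome changes with each event" is an empirical rate per prefix: among all sequences that begin with a given prefix, what share eventually reach the outcome? The Python sketch below computes that. It is an illustrative estimate under my own assumptions, not a description of how Motif or its models calculate this; `outcome_rate_by_prefix` is a hypothetical name.

```python
from collections import defaultdict


def outcome_rate_by_prefix(sequences, outcome):
    """Map each prefix (tuple of events) to the share of sequences
    starting with that prefix that contain `outcome` anywhere."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for seq in sequences:
        reached = int(outcome in seq)
        # Every prefix of the sequence, including the empty one, gets a vote.
        for k in range(len(seq) + 1):
            prefix = tuple(seq[:k])
            totals[prefix] += 1
            hits[prefix] += reached
    return {prefix: hits[prefix] / totals[prefix] for prefix in totals}


# Hypothetical journeys; "purchase" plays the role of the Outcome tag.
sequences = [
    ["home", "search", "purchase"],
    ["home", "search", "exit"],
    ["home", "exit"],
    ["home", "search", "purchase"],
]
rates = outcome_rate_by_prefix(sequences, "purchase")
```

In this toy data the rate is 0.5 after `home`, rises to two thirds after `home, search`, and drops to 0 after `home, exit`, which is the kind of step-by-step shift a visualization could encode with color.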

Better Together

Building Motif required us to think about every step of the analysis process individually and also how they fit together into a fast feedback loop. Thinking about the composition of these tasks is important for:

  • Speed: Quality analysis requires going through this loop many times, and your analysis is only as fast as the slowest step in your loop. A better composition of these tasks means you can answer more questions, faster, and iterate the analysis based on earlier results.
  • Flexibility: Your results are only as expressive as the least flexible step in your process. If you can't transform the data adequately, no visualization can fix that. If you only have simplistic visualizations available, then no amount of transformation can help you understand what's going on.

Give me a ping if you're interested in trying Motif -- I think you'll find it as powerful and fun to work in as we do!