Measuring Cross-Product Adoption Using dbt_set_similarity | by Matthew Senick | Dec, 2024

Enhancing cross-product insights within dbt workflows

For multi-product companies, one critical metric is often “cross-product adoption”, i.e., understanding how users engage with multiple offerings in a given product portfolio.

One measure suggested for calculating cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. By quantifying the overlap in users between products, it lets you identify cross-product synergies and growth opportunities.

The dbt package dbt_set_similarity is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. It provides a method to calculate Jaccard Indices within SQL transformation workloads.

To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for this article's example. Then run dbt deps within your project to install the packages.

packages:
  - package: Matts52/dbt_set_similarity
    version: 0.1.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0

The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.

Mathematically, it can be expressed as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard Index represents the “Intersection” over the “Union” of two sets (image by author)

Where:

A and B are two sets (e.g., the users of Product A and the users of Product B)
The numerator represents the number of elements in both sets
The denominator represents the total number of distinct elements across both sets

The Jaccard Index is particularly useful in the context of cross-product adoption because:

It focuses on the overlap between two sets, making it ideal for understanding shared user bases
It normalizes the overlap by the size of the union, so results stay proportional and comparable even when the two sets differ greatly in size

For example:

If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 - 25) = 25 / 125 = 0.2. In other words, 20% of the users who adopted at least one of the two products adopted both.
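The arithmetic above can be checked with a few lines of Python. This is a minimal sketch and not part of the dbt package: the jaccard_index helper and the synthetic user IDs are assumptions made purely for illustration.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # The index is undefined for two empty sets; returning 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    return len(a & b) / len(union)


# Synthetic user IDs (hypothetical): 100 users of Product A, 50 of Product B.
product_a_users = set(range(0, 100))    # IDs 0..99
product_b_users = set(range(75, 125))   # IDs 75..124, so IDs 75..99 are shared

overlap = len(product_a_users & product_b_users)
index = jaccard_index(product_a_users, product_b_users)
print(overlap, index)
```

Working with sets rather than raw counts means the union is computed directly, so there is no need to subtract the overlap by hand as in the formula 100 + 50 - 25.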

The example dataset comes from a fictional SaaS company that offers storage space to consumers. The company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). Each of these fields is either true, indicating that the user has adopted the product, or false, indicating that they have not.

Additionally, the company serves two demographics (user_category): tech enthusiasts and homeowners.
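To make the shape of such a dataset concrete, here is a small Python sketch. Only the doc_storage, photo_storage, and user_category names come from the description above; the sample rows, the user_id field, the category string values, and the jaccard_by_category helper are hypothetical and invented for illustration. Within each user category, it counts users who adopted both products and divides by users who adopted at least one.

```python
from collections import defaultdict

# Hypothetical sample rows; real data would live in the warehouse.
users = [
    {"user_id": 1, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": True},
    {"user_id": 2, "user_category": "tech_enthusiast", "doc_storage": True, "photo_storage": False},
    {"user_id": 3, "user_category": "tech_enthusiast", "doc_storage": False, "photo_storage": True},
    {"user_id": 4, "user_category": "homeowner", "doc_storage": True, "photo_storage": True},
    {"user_id": 5, "user_category": "homeowner", "doc_storage": False, "photo_storage": False},
    {"user_id": 6, "user_category": "homeowner", "doc_storage": True, "photo_storage": False},
]


def jaccard_by_category(rows):
    """Jaccard Index between the two products, computed per user_category."""
    both = defaultdict(int)    # users who adopted both products
    either = defaultdict(int)  # users who adopted at least one product
    for row in rows:
        category = row["user_category"]
        both[category] += int(row["doc_storage"] and row["photo_storage"])
        either[category] += int(row["doc_storage"] or row["photo_storage"])
    # Categories with no adopters at all get 0.0 (a convention for this sketch).
    return {
        category: (both[category] / either[category] if either[category] else 0.0)
        for category in either
    }


result = jaccard_by_category(users)
print(result)
```

Users who adopted neither product (user 5 in the sample) fall outside the union and therefore do not affect the index.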