Market Basket Analysis

Market basket analysis (MBA), also known as affinity analysis, is an unsupervised machine learning technique that is often used in recommendation engines.  While the terms are unfamiliar to many, the concepts are straightforward.  The math is simple probability, and we won’t dwell on it here.  The goal of this blog post is to provide some intuition on the three key metrics for market basket analysis: support, confidence, and lift.

Here’s the scenario.  Let’s say you manage a convenience store in one of the small towns in the Minnesota lakes area where I live.  Your goal is to make a data-driven decision on what product(s) should be put on the valuable “end cap” near the front of the store.  The end cap is the shelving at the end of an aisle, where products are most visible.  The calendar has recently flipped to June.  That means the traffic to the lakes has increased and your store is in its busiest time of the year.  So you want to maximize the sale of “summer” products.

The first thing you might do is to analyze last summer’s transactions to see which individual products were in the highest percentage of transactions.  This represents the first metric of MBA – support.  You calculate the support for each product, identify which of those products are unique to summer, and order from highest percent to lowest.
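
To make support concrete, here is a minimal Python sketch.  The ten transactions are invented for illustration (they are not real store data), and `support_by_item` is a hypothetical helper name, not a function from any library.

```python
from collections import Counter

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"sunscreen", "ball cap"},
    {"sunscreen", "ball cap", "lip balm"},
    {"sunscreen", "lip balm"},
    {"sunscreen", "ball cap", "soda"},
    {"sunscreen", "chips"},
    {"lip balm", "gum"},
    {"lip balm", "soda"},
    {"soda", "chips"},
    {"chips", "gum"},
    {"soda"},
]


def support_by_item(transactions):
    """Return {item: fraction of transactions that contain the item}."""
    # Each basket is a set, so an item is counted at most once per basket.
    counts = Counter(item for basket in transactions for item in basket)
    total = len(transactions)
    return {item: n / total for item, n in counts.items()}


supports = support_by_item(transactions)

# Order products from highest support to lowest.
ranked = sorted(supports.items(), key=lambda kv: kv[1], reverse=True)
for item, value in ranked:
    print(f"{item}: {value:.0%}")
```

In this made-up data, sunscreen appears in 5 of the 10 baskets, so it has the highest support.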

What you might discover is that sunscreen is your summer product with the highest support (it appears in the highest percentage of transactions).  That makes sense to you, so you decide to start re-arranging shelves to get sunscreen on the end cap.

But now you realize you have an opportunity.  What if you put products that typically sell with sunscreen right next to it on the end cap?  That brings you to the second metric of MBA – confidence.  The notation for confidence is A->B, which reads “given that A is purchased, what is the probability that B is also purchased?”  So you want to calculate the confidence for each of the other products (P), or Sunscreen->P.
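
Here is a minimal sketch of that calculation in Python.  The baskets are invented for illustration, and `confidence` is a hypothetical helper that estimates A->B as the share of baskets containing A that also contain B.

```python
# Hypothetical baskets, invented for illustration only.
transactions = [
    {"sunscreen", "ball cap"},
    {"sunscreen", "ball cap", "lip balm"},
    {"sunscreen", "lip balm"},
    {"sunscreen", "ball cap", "soda"},
    {"sunscreen", "chips"},
    {"lip balm", "gum"},
    {"lip balm", "soda"},
    {"soda", "chips"},
    {"chips", "gum"},
    {"soda"},
]


def confidence(transactions, a, b):
    """Estimate A->B: of the baskets containing a, the fraction that also contain b."""
    with_a = [basket for basket in transactions if a in basket]
    if not with_a:
        return 0.0  # a never sold, so there is nothing to estimate from
    return sum(b in basket for basket in with_a) / len(with_a)


# Confidence Sunscreen->P for every other product P, sorted high to low.
products = {item for basket in transactions for item in basket} - {"sunscreen"}
by_confidence = sorted(
    ((p, confidence(transactions, "sunscreen", p)) for p in products),
    key=lambda kv: kv[1],
    reverse=True,
)
for product, value in by_confidence:
    print(f"sunscreen -> {product}: {value:.0%}")
```

With these made-up baskets, 3 of the 5 sunscreen baskets include a ball cap and 2 of the 5 include lip balm, so those two products sort to the top.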

Sorting the confidence, Sunscreen->P, shows that ball caps and lip balm are the top sellers with sunscreen.  Now you know what to pair with the sunscreen, but what if you only have room for one of the two products?  The third metric, lift, can help you with that.  Lift measures how much purchasing sunscreen changes the likelihood of purchasing the second product: it compares the confidence Sunscreen->P with how often P sells overall, so a lift above 1 means P sells more often with sunscreen than its overall popularity would predict.  In your scenario, ball caps have a higher lift because the majority of ball cap sales include sunscreen, whereas lip balm is frequently sold without sunscreen.  So ball caps would be favored over lip balm based on lift.
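
A minimal Python sketch of lift, using the standard definition (confidence of A->B divided by the support of B).  The baskets are invented for illustration, and `lift` is a hypothetical helper name.

```python
# Hypothetical baskets, invented for illustration only.
transactions = [
    {"sunscreen", "ball cap"},
    {"sunscreen", "ball cap", "lip balm"},
    {"sunscreen", "lip balm"},
    {"sunscreen", "ball cap", "soda"},
    {"sunscreen", "chips"},
    {"lip balm", "gum"},
    {"lip balm", "soda"},
    {"soda", "chips"},
    {"chips", "gum"},
    {"soda"},
]


def lift(transactions, a, b):
    """Lift of A->B: confidence(A->B) divided by support(B)."""
    n = len(transactions)
    n_a = sum(a in basket for basket in transactions)
    n_b = sum(b in basket for basket in transactions)
    n_ab = sum(a in basket and b in basket for basket in transactions)
    if n_a == 0 or n_b == 0:
        return 0.0  # undefined when either product never sells; report 0
    confidence_ab = n_ab / n_a  # P(b | a)
    support_b = n_b / n         # P(b)
    return confidence_ab / support_b


for product in ("ball cap", "lip balm"):
    print(f"lift(sunscreen -> {product}) = {lift(transactions, 'sunscreen', product):.2f}")
```

In this made-up data, every ball cap sale includes sunscreen, so its lift works out to 2, while half of the lip balm sales happen without sunscreen, giving a lift of 1 (no better than chance).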

In summary…

  1. You used support to learn that the most common summer product sold is sunscreen.
  2. You used confidence to learn that the two most common products sold with sunscreen are ball caps and lip balm.
  3. You used lift to determine that sunscreen provides more lift to ball caps than to lip balm, so ball caps should be the product displayed with sunscreen.

The math behind this is basic, and algorithms such as Apriori can do it efficiently.  I worked through an example doing the calculations in code for a sample dataset, but there are also Python and R implementations.
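
For a feel of how an Apriori-style search works, here is a simplified Python sketch.  It is not the worked example mentioned above and not any library's implementation; the baskets, the `frequent_itemsets` name, and the 0.2 minimum support are all assumptions made for illustration.  The idea is to build larger itemsets only from smaller ones that are already frequent.

```python
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"sunscreen", "ball cap"},
    {"sunscreen", "ball cap", "lip balm"},
    {"sunscreen", "lip balm"},
    {"sunscreen", "ball cap", "soda"},
    {"sunscreen", "chips"},
    {"lip balm", "gum"},
    {"lip balm", "soda"},
    {"soda", "chips"},
    {"chips", "gum"},
    {"soda"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for basket in transactions if itemset <= basket) / n

    # Level 1: single items that meet the minimum support.
    items = {item for basket in transactions for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


freq = frequent_itemsets(transactions, min_support=0.2)
for itemset, value in sorted(freq.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), f"{value:.0%}")
```

The pruning step is what keeps the search manageable: a pair such as ball cap and lip balm that falls below the threshold is never extended into larger combinations.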

Bottom line… if you need a basic recommendation capability, start with market basket analysis.