lionfish

Interactive Visualization for Market Segmentation


Ursula Laa
BOKU University, Vienna

work with Matthias Medl and Di Cook

Introduction

  • In market segmentation, clustering can be used to find groups of similar customers, and visualization is vital for interpreting the results.
  • With interactive visualization we can improve our understanding of how the identified clusters differ, but we can also update the solution based on our own knowledge and interests.
  • This is related to how we manually define clusters from scratch in a spin-and-brush approach.

lionfish conception

We want a tool that allows us to:

  1. Visualize the partitions in combinations of features using tour methods: grand, guided and manual tours.
  2. Interactively select points (brushing) to refine the cluster solution, spin-and-brush.
  3. Link multiple displays to focus on particular clusters, and to better understand the clustering solution.
  4. Update displays based on user choices such as feature selection, cluster selection or re-scaling.

Tour visualization

With a (grand) tour we use linear projections to view data in \(p \geq 3\) dimensions from different directions. Projections are interpolated along a geodesic path to provide a smooth animation.
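A single frame of such a tour can be sketched in base R. This is a minimal illustration on simulated data, not the tourr implementation: it draws one random orthonormal basis and projects the data onto it, and it does not show the geodesic interpolation between frames. The names `X`, `A` and `Y` are chosen here for illustration only.

```r
set.seed(1)
n <- 100  # number of observations
p <- 4    # dimension of the data
d <- 2    # dimension of the projection

# Simulated stand-in data (assumption: any numeric n x p matrix would do)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Random p x d matrix, orthonormalized with a QR decomposition
A <- qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))

# The columns of A are orthonormal, so t(A) %*% A is the d x d identity
ortho_error <- max(abs(crossprod(A) - diag(d)))

# One tour frame: the n x d projection of the data
Y <- X %*% A
dim(Y)
```

A tour animation shows a sequence of such projections, with the basis `A` changing smoothly from frame to frame.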

Example: for the palmerpenguins data we have 4 size measurements, and we can look at the distribution for 3 different species in the 4D space using a grand tour.

Check if you can see that the three species are three separate clusters, and if you can spot any outliers during the animation!

library(tourr)
library(tidyverse)
library(palmerpenguins)

# prepare the data
p_tidy <- penguins |>
  select(species, bill_length_mm:body_mass_g) |>
  rename(bl=bill_length_mm,
         bd=bill_depth_mm,
         fl=flipper_length_mm,
         bm=body_mass_g) |>
  na.omit()
p_tidy_std <- p_tidy |>
  mutate_if(is.numeric, function(x) (x-mean(x))/sd(x))

# run the animation with tourr
animate_xy(p_tidy_std[,-1], col = p_tidy_std$species)

Spin-and-brush

When watching the animation we can notice the clusters based on how the points move, and in some views we might see that one group appears really separate from the others.

In this case we pause the animation, brush the separate cluster in a new color and then re-start the animation (spinning the view).

This is iterated until no further clusters can be identified visually.

We can use the detourr package to generate an interactive tour animation where we can brush points.

The visualization now has a play/stop button and rectangular selection of points, and we can assign any color to the selected points. Install the most recent version from GitHub to be able to download the assigned group information.

We could do this for the penguins data and compare to the species grouping, but here I picked a simpler example.

library(detourr)
library(mulgar)

set.seed(645)
detour(mulgar::c2, 
       tour_aes(projection = x1:x6)) |>
  tour_path(grand_tour(2), fps = 60, 
            max_bases=40) |>
  show_scatter(alpha = 0.7, 
               axes = FALSE)

Implementation

A graphical user interface has been implemented in the R package lionfish:

  • R package to work with implementations of clustering algorithms, and with the tourr package to generate tour paths
  • Python interface to use Tkinter and matplotlib for the GUI and the interactive graphics
  • matplotlib enables fast rendering and interactivity for linked brushing and manual tours

Running lionfish - minimal example

library(lionfish)
library(tourr)
# setting up python environment
init_env()
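# `data` is a placeholder for your own numeric data
# (here assumed to contain columns named x1 and x2)
# saving a tour path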
guided_tour_history <- save_history(data,
                                    tour_path =
                                    guided_tour(holes()))
# generating the displays
obj1 <- list(type="2d_tour", obj=guided_tour_history)
obj2 <- list(type="scatter", obj=c("x1", "x2"))
# launching the GUI
interactive_tour(data=data,
                 plot_objects=list(obj1, obj2),
                 feature_names=colnames(data))

For a full overview of displays and interactivity, check out the vignettes at https://mmedl94.github.io/lionfish/
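The guided tour in the minimal example is steered by the holes index, which rewards projections with few points near the center. The sketch below writes out the formula as it is commonly stated for a projection of standardized data; the function name `holes_index` is made up for this illustration, and the result should be checked against the tourr documentation before relying on it.

```r
# Sketch of the holes index for an n x d matrix of projected data Y
# (assumes the data were centered and scaled before projection)
holes_index <- function(Y) {
  d <- ncol(Y)
  (1 - mean(exp(-0.5 * rowSums(Y^2)))) / (1 - exp(-d / 2))
}

set.seed(4)
# Simulated projection with most points near the center
centered <- matrix(rnorm(4000), ncol = 2)

# Simulated projection with two separated groups and a gap at the center
gap <- rbind(cbind(rnorm(1000, mean = -3), rnorm(1000)),
             cbind(rnorm(1000, mean = 3), rnorm(1000)))
gap_std <- scale(gap)

holes_index(centered)
holes_index(gap_std)

# Degenerate check: all points exactly at the origin give an index of 0
zero_value <- holes_index(matrix(0, nrow = 10, ncol = 2))
```

The guided tour searches for projections where this value is large, which tends to reveal gaps between clusters.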

Example: Austrian Vacation Activities

  • Survey by the Europäisches Tourismus Institut GmbH at the University of Trier of 2,961 adult tourists spending their winter holiday in Austria in the 1997/98 season
  • 27 vacation activities were ranked between “totally important” and “not important” (4 levels)
  • We work with binary values (“totally important” or not), see Dolnicar S, Leisch F (2003) for details
  • We start from a k-means cluster solution with k=6 and use visual feature selection to pick 12 vacation activities that differ between the clusters
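The preprocessing steps listed above can be sketched on simulated data. This is an illustration under stated assumptions, not the analysis from the talk: the ratings are random, the column names `activity1` to `activity27` are made up, and ranking activities by the range of cluster proportions is only a numeric stand-in for the visual feature selection done in the GUI.

```r
set.seed(2)
n <- 300  # simulated respondents (the survey itself has 2,961)
p <- 27   # number of vacation activities

# Simulated ratings on 4 levels; 1 is taken to mean "totally important"
ratings <- matrix(sample(1:4, n * p, replace = TRUE), nrow = n,
                  dimnames = list(NULL, paste0("activity", 1:p)))

# Binarize: 1 if "totally important", 0 otherwise
binary <- (ratings == 1) * 1

# k-means cluster solution with k = 6
km <- kmeans(binary, centers = 6, nstart = 10)

# Proportion of "totally important" replies per cluster and activity (6 x 27)
props <- apply(binary, 2, function(x) tapply(x, km$cluster, mean))

# Stand-in for visual selection: the 12 activities whose proportions
# vary most across the clusters
spread <- apply(props, 2, function(x) diff(range(x)))
selected <- names(sort(spread, decreasing = TRUE))[1:12]
selected
```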

Refining cluster labels

Workflow:

  1. We start by exploring the current cluster assignments in the GUI, to identify observations of interest
  2. Use the brush to assign such a group of observations to a new cluster
  3. Examine the new group across different displays; here, for example, we use a mosaic plot for the binary replies
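Outside the GUI, the effect of steps 2 and 3 can be mimicked in base R. The sketch below uses simulated labels and a simulated binary reply, and the vector `brushed` is a hypothetical stand-in for the indices that the brush would return.

```r
set.seed(3)
n <- 200

# Stand-ins for the current cluster labels and one binary activity reply
clusters <- sample(1:6, n, replace = TRUE)
reply <- rbinom(n, size = 1, prob = 0.4)

# Hypothetical brush selection: observations of interest within cluster 2
brushed <- which(clusters == 2 & reply == 1)

# Assign the brushed observations to a new cluster label
refined <- clusters
refined[brushed] <- max(clusters) + 1

# Cross-tabulate the binary reply by refined cluster and draw a mosaic plot
tab <- table(cluster = refined, reply = reply)
mosaicplot(tab, main = "Binary reply by refined cluster")
```

In the mosaic plot the width of each column reflects the cluster size, so a newly brushed group that is both small and homogeneous is easy to spot.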

Spin-and-brush

[Figure panels: original clustering; brushing a new cluster]

Example: Risk Taking

  • Survey of 563 Australian tourists, see Dolnicar S, Grün B, Leisch F (2018)
  • Six different types of risks: recreational, health, career, financial, social and safety
  • Rated on a scale from 1 (never) to 5 (very often)
  • Observations were grouped using k-means clustering, k=5
  • Understand cluster differences with a combination of guided and manual tour, where we can select clusters of interest interactively
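To make the idea of looking for a view that separates selected clusters concrete, here is a sketch on simulated ratings. It is not the guided or manual tour itself: it only computes the direction between two cluster centers as a simple one-dimensional projection. The data are random, and the choice of clusters 1 and 2 is arbitrary.

```r
set.seed(5)
risks <- c("recreational", "health", "career", "financial", "social", "safety")
n <- 563

# Simulated ratings from 1 (never) to 5 (very often)
X <- matrix(sample(1:5, n * length(risks), replace = TRUE), nrow = n,
            dimnames = list(NULL, risks))

# k-means cluster solution with k = 5
km <- kmeans(X, centers = 5, nstart = 10)

# Mean rating per cluster and risk type (5 x 6)
round(km$centers, 2)

# Unit vector pointing from the center of cluster 2 to that of cluster 1
direction <- km$centers[1, ] - km$centers[2, ]
direction <- direction / sqrt(sum(direction^2))

# One-dimensional projection of all observations onto that direction
proj <- drop(X %*% direction)
tapply(proj, km$cluster, mean)
```

The entries of `direction` indicate which risk types contribute most to the difference between the two selected clusters, which is the kind of information a manual tour lets us explore interactively.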

Manual tour exploration

Looking for better separation

Summary

  • The lionfish package was tailored to market segmentation analysis, but can also be used for other tasks
  • The implementation is modular, so new types of visualizations can easily be added to the package
  • The integration of R and Python is seamless thanks to reticulate
  • The package is on CRAN, and an article describing the implementation and market segmentation analyses is published in the Austrian Journal of Statistics

The lionfish package implementation was supported by Google Summer of Code