In market segmentation, clustering can be used to find groups of similar customers, and visualization is vital for interpreting the results.
With interactive visualization we can improve our understanding of how the identified clusters differ, and we can also update the solution based on our own knowledge and interests.
This is related to how we manually define clusters from scratch in a spin-and-brush approach.
lionfish conception
We want a tool that allows us to:
Visualize the partitions in combinations of features using tour methods: grand, guided and manual tours.
Interactively select points (brushing) to refine the cluster solution, spin-and-brush.
Link multiple displays to focus on particular clusters, and to better understand the clustering solution.
Update displays based on user selections such as feature selection or cluster selection, or re-scaling.
With a (grand) tour we use linear projections to view data in \(p \geq 3\) dimensions from different directions. Projections are interpolated along a geodesic path to provide a smooth animation.
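As a minimal sketch of what a single frame of such a tour shows, the base R code below projects simulated 4D data onto a random orthonormal 2D basis. The names X, A and Y and the simulated data are illustrative assumptions; the geodesic interpolation between successive bases, which tourr handles internally, is not reproduced here.

```r
set.seed(1)

# simulated data standing in for n observations of p numeric features
n <- 100
p <- 4
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# a random p x 2 orthonormal basis, obtained from the QR decomposition
# of a random matrix (the columns of Q are orthonormal)
A <- qr.Q(qr(matrix(rnorm(p * 2), nrow = p, ncol = 2)))

# the 2D view of the data for this one projection
Y <- X %*% A

# orthonormality check: t(A) %*% A should be the 2 x 2 identity
round(crossprod(A), 10)
```

A tour animation is a sequence of such projections, with the basis A changing smoothly from frame to frame.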
Example: for the palmerpenguins data we have 4 size measurements, and we can look at the distribution for 3 different species in the 4D space using a grand tour.
Check if you can see that the three species are three separate clusters, and if you can spot any outliers during the animation!
library(tourr)
library(tidyverse)
library(palmerpenguins)
# prepare the data
p_tidy <- penguins |>
  select(species, bill_length_mm:body_mass_g) |>
  rename(bl = bill_length_mm, bd = bill_depth_mm, fl = flipper_length_mm, bm = body_mass_g) |>
  na.omit()
p_tidy_std <- p_tidy |>
  mutate_if(is.numeric, function(x) (x - mean(x)) / sd(x))
# run the animation with tourr
animate_xy(p_tidy_std[,-1], col = p_tidy_std$species)
When watching the animation we can notice the clusters based on how the points move, and in some views we might see that one group appears really separate from the others.
In this case we pause the animation, brush the separate cluster in a new color and then re-start the animation (spinning the view).
This is iterated until no further clusters can be identified visually.
We can use the detourr package to generate an interactive tour animation where we can brush points.
The visualization now has a play/stop button and rectangular selection of points, and we can assign any color to the selected points. Install the most recent version from GitHub to be able to download the assigned group information.
We could do this for the penguins data and compare to the species grouping, but here I picked a simpler example.
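A comparison between brushed groups and a known grouping can be sketched with a cross-tabulation. The two label vectors below are made up for illustration only; they stand in for group labels exported after brushing and for a reference grouping such as species, and are not results from any real session.

```r
# hypothetical labels: stand-ins for groups assigned by brushing
brushed <- c("A", "A", "B", "B", "B", "C")
# hypothetical reference grouping for the same six points
known <- c("s1", "s1", "s2", "s2", "s3", "s3")

# cross-tabulate brushed groups (rows) against the reference (columns)
tab <- table(brushed = brushed, known = known)
print(tab)

# share of points that fall in the dominant reference group
# of their brushed cluster (a simple purity measure)
agreement <- sum(apply(tab, 1, max)) / sum(tab)
```

Rows dominated by a single column indicate brushed groups that match one reference group closely.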
A graphical user interface has been implemented in the R package lionfish:
R package to work with implementations of clustering algorithms, and with the tourr package to generate tour paths
Python interface to use Tkinter and matplotlib for the GUI and the interactive graphics
matplotlib enables fast rendering and interactivity for linked brushing and manual tours
Running lionfish - minimal example
library(lionfish)
library(tourr)
# setting up python environment
init_env()
# saving a tour path
guided_tour_history <- save_history(data, tour_path = guided_tour(holes()))
# generating the displays
obj1 <- list(type = "2d_tour", obj = guided_tour_history)
obj2 <- list(type = "scatter", obj = c("x1", "x2"))
# launching the GUI
interactive_tour(data = data, plot_objects = list(obj1, obj2), feature_names = colnames(data))
Survey by the Europäisches Tourismus Institut GmbH at the University of Trier of 2,961 adult tourists spending their winter holiday in Austria in the 1997/98 season
27 vacation activities were ranked between “totally important” and “not important” (4 levels)