layout: false
class: split-75
background-image: url("plots/title_slide_bkg.png")
background-position: center
background-size: contain

<style type="text/css">
.remark-slide-content{
  font-size: 30px;
}
code.r{
  font-size: 24px;
}
</style>

<style type="text/css">
/* custom.css */
:root{
  --main-color1: #509e2f;
  --main-color2: #bcbddc;
  --main-color3: #efedf5;
  --main-color4: #9DDAE5;
  --text-color3: black;
  --text-color4: #505050;
  --code-inline-color: #4e5054;
  --link-color: #006CAB;
}
.large { font-size: 150% }
.largeish { font-size: 120% }
.summarystyle { font-size: 150%; line-height:150%;}
.my-gray {color: #606060!important; }
.tiny{ font-size: 25%}
</style>

.column[.content[
<br>
## **New displays for the visualization of multivariate data in the tourr package**

.my-gray[
.large[**Ursula Laa**]

.largeish[
Institute of Statistics <br>
University of Natural Resources and Life Sciences
]
]

useR! 2021
]]

.column[.top_abs.content[
<img src="plots/Boku-wien.png" width="60%">
]]

---

# Grand tour

The **grand tour** displays a sequence of smoothly **interpolated projections**, and can reveal the shape of the distribution, clusters and outliers.

<iframe src="4cube.html" width="310" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>
<iframe src="bbh.html" width="310" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>
<iframe src="pdfsense.html" width="310" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>

---

# Grand tour and large data

With the tour we can learn about **multivariate** features in the data. Each view is showing a **linear projection** of the data, making interpretation straightforward.

The standard displays are however limited in the case of **large data**:

- Large number of **observations**: overplotting can hide features, especially in the case of concave distributions.
- Large number of **variables**: projected data points tend to fall close to the center (crowding problem)

--

.center[Can we adapt the **display** to work better in those situations?]

---

# New displays

The **slice tour** highlights only a subset of data points based on a sectioning condition to reveal local information and concave structures. This works well for data with a *large number of observations*.

The **sage tour** adjusts the resolution based on the distance from the center to address the crowding problem. Even with about ten dimensions this can already be important!

Both approaches have been implemented as new *display* functions in the `tourr` package:

- `display_slice()`
- `display_sage()`

---

# Slice tour

The slice tour uses the **orthogonal distance** of each data point from a centered projection plane to define subsets of the data.

Points close to the plane are **highlighted** in each projected view and can be compared to the overall (projected) distribution of points.

.center[<img src="plots/slice.png" width="60%">]

---

# Slice tour of geometric shapes

We can combine the slice display with a grand tour to gain intuition about a surface.

.center[
<div >
<div style="width: 33%; float: left">
<b> 3D sphere </b>
<a href=""> <img src="plots/sphere-3-anchored.gif" width = "90%"/> </a>
</div>
<div style="width: 33%; float: left">
<b>4D torus</b>
<a href=""> <img src="plots/torus-4-centered.gif" width = "90%" /> </a>
</div>
<div style="width: 33%; float: left">
<b>Roman surface</b>
<a href=""> <img src="plots/roman-surface.gif" width = "90%"/> </a>
</div>
</div>
]

---

# Sage tour

The sage display transforms the **radius** (i.e. the distance from the center) of all projected data points such that equal volume in the high-dimensional space gets projected onto equal area in the two-dimensional plane.

Without any transformation the radial distribution of the volume and projected volume of a hypersphere are very different.
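
A sketch of why they differ, assuming points spread uniformly inside a `\(p\)`-dimensional hypersphere of radius `\(R\)` (the symbols `\(p\)`, `\(R\)`, `\(r\)` and `\(r'\)` are notation introduced here for illustration only): the fraction of the volume that falls within radius `\(r\)` of the center in a 2D projection is

$$F_p(r) = 1 - \left(1 - \frac{r^2}{R^2}\right)^{p/2}$$

whereas equal area in the plane would correspond to the fraction `\((r/R)^2\)`. Equating the two suggests a rescaled radius

$$r' = R\,\sqrt{1 - \left(1 - \frac{r^2}{R^2}\right)^{p/2}}$$

which leaves the radius unchanged for `\(p = 2\)` and pushes points outward for larger `\(p\)`. The transformation used in `display_sage()` may differ in its details and parametrization.
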
.center[
<img src="plots/cdf-1.png" width = "75%" />
]

---

# Sage tour

The sage display transforms the **radius** (i.e. the distance from the center) of all projected data points such that equal volume in the high-dimensional space gets projected onto equal area in the two-dimensional plane.

We correct for this difference via the radial transformation.

.center[
<img src="plots/circles-1.png" width = "95%" />
]

---

# Pollen data

We can use the sage display to better resolve small features near the center of a distribution.

<img src="plots/pollen_sage.gif" width="290"/> <img src="plots/pollen_sage_R1.gif" width="290"/> <img src="plots/pollen_sage_gam20.gif" width="290"/>

---

# Summary

- The **slice** and **sage** displays help to better visualize large data with tour methods
- We can see concave shapes, small features near the center of a distribution, and better resolve grouping structures in large datasets
- Both displays are accessible through dedicated `display_` and `animate_` functions in the `tourr` package
- The methods would benefit from an interactive interface for tuning their parameters
- Using the slicing definition we can also define *section pursuit* in analogy to *projection pursuit*; this was implemented as a *guided section tour*

---
layout: false
background-image: url("plots/title_slide_bkg.png")
background-position: center
background-size: contain

# Thanks!

<br>

This is joint work with **Dianne Cook** and **Stuart Lee**.

My slides are made using `RMarkdown`, `xaringan` and the `ninjutsu` theme.

The main `R` packages used are `tourr`, `tidyverse`, `plotly`, `geozoo`.

---

---

# Projection vs Slice

.pull-left[<img src="plots/4D_proj.png" width="49%">]

--

.pull-right[<img src="plots/4D_slice.png" width="49%">]

---
class: split-50

.column[.content[

# Hand-drawn sketches

**Data example**: hand-drawn sketches of six different items, using a sample of 1000 observations from each class.

**Data format**: `\(28\times 28\)` pixels, gray scale `\(\rightarrow\)` `\(28\times28=784\)` variables!

Here we reduce the dimensionality to the **first five principal components** (about `\(20\%\)` of the total variance).

We use the tour to check for separation between the classes in that 5D space and compare the standard scatter plot display, the slice display and the sage display.
]]

.column[.content.vmiddle[
<img src="figure/sketches-1.png" width="95%" style="display: block; margin: auto;" />

.smaller[.center[ from [Google quickdraw](https://quickdraw.withgoogle.com)]]
]]

---

# Hand-drawn sketches

<br>

.center[
<div >
<div style="width: 33%; float: left">
<b> Projection </b>
<a href=""> <img src="gif/projections.gif" width = "90%"/> </a>
</div>
<div style="width: 33%; float: left">
<b>Slice</b>
<a href=""> <img src="gif/slices.gif" width = "90%" /> </a>
</div>
<div style="width: 33%; float: left">
<b>Sage</b>
<a href=""> <img src="gif/sage.gif" width = "90%"/> </a>
</div>
</div>
]

---

# Hand-drawn sketches

<br>

.center[
<div >
<div style="width: 33%; float: left">
<b> Projection </b>
<a href=""> <img src="gif/projections_banana.gif" width = "90%"/> </a>
</div>
<div style="width: 33%; float: left">
<b>Slice</b>
<a href=""> <img src="gif/slices_banana.gif" width = "90%" /> </a>
</div>
<div style="width: 33%; float: left">
<b>Sage</b>
<a href=""> <img src="gif/sage_banana.gif" width = "90%"/> </a>
</div>
</div>
]