layout: false class: split-75 background-image: url("plots/title_slide_bkg.png") background-position: center background-size: contain <style type="text/css"> .remark-slide-content{ font-size: 30px; } code.r{ font-size: 24px; } ul li { margin-bottom: 15px; } </style> <style type="text/css"> /* custom.css */ :root{ --main-color1: #509e2f; --main-color2: #bcbddc; --main-color3: #efedf5; --main-color4: #9DDAE5; --text-color3: black; --text-color4: #505050; --code-inline-color: #4e5054; --link-color: #006CAB; } .large { font-size: 150% } .largeish { font-size: 120% } .summarystyle { font-size: 150%; line-height:150%;} .my-gray {color: #606060!important; } .tiny{ font-size: 25%} </style> .column[.content[ <br> ## ** Tours for the dynamic visualization of high-dimensional data** .my-gray[ .large[**Ursula Laa**] .largeish[ Institute of Statistics <br> University of Natural Resources and Life Sciences <br> Vienna, Austria ] ] JSM 2022, Washington DC ]] .column[.top_abs.content[ <img src="plots/Boku-wien.png" width="60%"> ]] --- # The grand tour and hypercubes With the grand tour we are using linear projections to visualize data in more than 3D - a nice way of visualizing the tour is to show it for hypercubes (generated with `geozoo`) <br> .center[ <div > <div style="width: 33%; float: left"> <b> 3D </b> <a href=""> <img src="plots/c3.gif" width = "90%"/> </a> </div> <div style="width: 33%; float: left"> <b>4D</b> <a href=""> <img src="plots/c4.gif" width = "90%" /> </a> </div> <div style="width: 33%; float: left"> <b>6D</b> <a href=""> <img src="plots/c6.gif" width = "90%"/> </a> </div> </div> ] --- class: split-40 .column.content[ <br> # Implementation in R <br> ```r library(tourr) library(palmerpenguins) penguins <- na.omit(penguins) animate_xy(penguins[,3:6], col=penguins$species, axes="bottomleft", fps=15) ``` ] .column.content[ <img src="plots/penguins2d.gif" width = "90%"/> ] --- # Short overview Short history: - First proposed in [Asimov (1985)](https://doi.org/10.1137/0906011) - Lot of new developments in the 1990's, e.g. the guided tour [Cook et al (1995)](https://www.jstor.org/stable/1390844)) - R implementation in 2011 in the `tourr` package [Wickham et al (2011)](https://www.jstatsoft.org/article/view/v040i02) Common usage: - explore the shape of a multivariate distribution - understand class differences and which variables are important - identify (multivariate) outliers - visualize statistical models, as described in [Wickham et al (2015)](https://doi.org/10.1002/sam.11271) These and more recent developments were summarized in a recent review: [Lee et al (2021)](https://doi.org/10.1002/wics.1573) --- # What is new? Recent work has focused on two questions: ### How can we use tours in the case of **large data**? - Large number of **observations**: overplotting can hide features, especially in the case of concave distributions. - Large number of **variables**: projected data points tend to fall close to the center (crowding problem) ### Can we make tour displays interactive to learn more? - Having **manual controls** of the projection allows the viewer to understand which variables are important, e.g. for separating groups. - Leverage javascript interfaces, shiny apps, etc. to build an **interactive display**, e.g. for replaying or exporting projections, or for linked brushing --- # Sage tour When we project linearly from a high-dimensional space points start to **pile up** near the center of the projection, obscuring features or separation between groups. We can understand this as a large part of the high-D volume being projected onto a small area. The sage display transforms the **radius** (i.e. the distance from the center) of all projected data points such that equal volume in the high-dimensional space gets projected onto equal area in the two-dimensional plane. <center> <img src="plots/sage-density-1.png" width="75%"> </center> <center> .footnote[[Laa, Cook, Lee (2022) Burning sage: Reversing the curse of dimensionality](https://doi.org/10.1080/10618600.2021.1963264)] </center> --- # Clustering and sage display <img src="plots/mouse_grand_2c.gif" width="450"/> <img src="plots/mouse_sage_2c_gam3.gif" width="450"/> --- # Liminal With high-dimensional data a common approach is to use non-linear dimension reduction like t-SNE, also here we can leverage the tour in a linked display: <iframe title="Animation linking the tSNE plot with a tour plot, of data with 6 clusters, 2 really large, 3 very small and close together, and 1 close to the three but slightly bigger. tSNE spreads them into 6 clusters, three big and three small roughly equidistant." src="https://player.vimeo.com/video/439635905" width="80%" height="400" frameborder="0" allowfullscreen></iframe> .footnote[[Lee, Cook, Laa (2022) Casting Multiple Shadows JDSSV](https://doi.org/10.52933/jdssv.v2i3)] --- # The slice tour display Even when projecting from relatively low number of dimensions we can still have overplotting issues if there are a lot of observations. When we have a lot of data there might be features inside the distribution that are hidden in projections but visible when "slicing" through the distribution. .center[<img src="plots/slice.png" width="60%">] .footnote[[Laa, Cook, Valencia (2020) A slice tour for finding hollowness in high-dimensional data](https://doi.org/10.1080/10618600.2020.1777140)] --- # Slice tour of geometric shapes We can combine the slice display with a grand tour to gain intuition about a surface. .center[ <div > <div style="width: 33%; float: left"> <b> 3D sphere </b> <a href=""> <img src="plots/sphere-3-anchored.gif" width = "90%"/> </a> </div> <div style="width: 33%; float: left"> <b>4D torus</b> <a href=""> <img src="plots/torus-4-centered.gif" width = "90%" /> </a> </div> <div style="width: 33%; float: left"> <b>Roman surface</b> <a href=""> <img src="plots/roman-surface.gif" width = "90%"/> </a> </div> </div> ] The slice tour is especially useful when looking at classification models using tools from the `classifly` package, and it also inspired work on "section pursuit". --- # Manual tour in mathematica Ongoing work (with Alex Auman, Di Cook, German Valencia): interactive manual slice tour implemented in mathematica, with manual controls for the projection and the slicing. .column.content[ <img src="plots/exploring_penguin_data.gif" width = "90%"/> ] --- # Interactivity Hart and Wang (2022) [detourr:](https://cran.biotools.fr/web/packages/detourr/index.html) Animations for {tourr} using htmlwidgets for performance and portability. <br> Harrison (2022) [Langevin dynamics based tours](https://logarithmic.net/langevitour/) of data, in Javascript with R wrapper. <br> <center> <iframe src="detourr.html" width="500" height="350" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> </center> --- # Summary .largeish[ - Tour methods can visualize high-dimensional data and reveal multivariate structure and outliers, and help us understand grouping - The `tourr` package includes different methods for basis selection, an algorithm for geodesic interpolation and many display functions - Visualization of modern (big) data is a challenge and recent work is already providing interesting solutions - Tours can be used as a complementary method together with non-linear dimension reduction - Interactive interfaces help with exploration and especially with interpretation, this was challenging within R but interfaces (in particular with Javascript) help - more to come soon! ]