background-image: url()
background-size: contain
class: hide-slide-number split-70 title-slide
count: false

.column[.content[.monash-gray80[

<br>

# .monash-blue[ Burning sage]

<h2 style="font-weight:900!important;">Reversing the curse of dimensionality in the visualization of high-dimensional data</h2>

.center[.monash-gray50[work in collaboration with Di Cook and Stuart Lee]]

.center[.monash-gray50[arXiv:2009.10979]]

.bottom_abs.width100[

## .monash-blue[Ursula Laa]

Monash University Department of Econometrics and Business Statistics & School of Physics and Astronomy
<i class="fas fa-envelope faa-float animated "></i>
ursula.laa@monash.edu
6th November 2020

<br>
]

]]

]

<div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);">
<div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;">
<svg class="clip-svg absolute">
<defs>
<clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox">
<polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" />
</clipPath>
</defs>
</svg>
</div>
</div>

---

## Open problem described in 2018 (Di Cook)

<br><br>

.center[
<div>
<div style="width: 33%; float: left"> <b> pD object </b> <a href=""> <img src="images/uncut_cake.png" width = "90%" /> </a> </div>
<div style="width: 33%; float: left"> <b>3D slice</b> <br><br> <br> <a href=""> <img src="images/inside_cake_original.png" width = "90%" /> </a> </div>
<div style="width: 33%; float: left"> <b>pD slice</b> <br><br> <br> <a href=""> <img src="images/inside_cake_pinched.png" width = "90%" /> </a> </div>
</div>
]

<br>

<small> Picture sources: [Samantha Cooper](https://www.pinterest.com.au/pin/551831760586431358) and [CakesDekor](https://www.pinterest.com.au/pin/383368987005449767/) </small>

---

## Curse of dimensionality paradox

- **Origin**: Bellman (1961) described the difficulty of optimization in high dimensions given the exponential growth of the space
- **Consequence**: most points are far from the sample mean, near the edge of the sample space
- **Paradox**: with dimension reduction we instead see an excessive number of observations near the center of the distribution, and most projections are approximately Gaussian

.center[
<img src="images/density-1.png" width = "75%" />
]

---

## Concepts of projected volume

- To understand the piling near the center of projections, we can think about the high-dimensional volume projected onto a 2D area
- To impose rotation invariance and avoid edge effects,
we start from a hypersphere in p dimensions

<br>

--

.center[
<img src="images/diagram.png" width = "55%" />
]

---

## Concepts of projected volume

- To understand the piling near the center of projections, we can think about the high-dimensional volume projected onto a 2D area
- To impose rotation invariance and avoid edge effects, we start from a hypersphere in p dimensions

<br>

.center[
<img src="images/cdf-1.png" width = "75%" />
]

---

## Burning sage transformation

We can define a radial transformation that will redistribute the projected volume such that equal pD volume is projected onto equal 2D area

`$$r'_y = R \sqrt{1-\left(1-\left(\frac{r_y}{R}\right)^2\right)^{p/2}}$$`

--

.center[
<img src="images/radii-1.png" width = "55%" />
]

---

## Burning sage transformation

We can define a radial transformation that will redistribute the projected volume such that equal pD volume is projected onto equal 2D area

`$$r'_y = R \sqrt{1-\left(1-\left(\frac{r_y}{R}\right)^2\right)^{p/2}}$$`

.center[
<img src="images/circles-1.png" width = "85%" />
]

---

## Sage tour

The new transformation is especially useful combined with a tour display, showing sequences of low-dimensional projections: for each new view we project the data to 2D and then show the sage display of the projected data

<br>

<iframe src="4cube.html" width="310" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>
<iframe src="4sphere.html" width="310" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>
<iframe src="10sphere.html" width="310" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe>

---

## Needle in a haystack: Pollen data

<br><br>

<img src="gifs/pollen_sage.gif" width="300"/>
<img src="gifs/pollen_sage_R1.gif" width="300"/>
<img src="gifs/pollen_sage_gam20.gif" width="300"/>

---

## Clustering in high dimensions: Single Cell Mouse Retina Data

<br>

.center[
<img src="gifs/mouse_grand_2c.gif" width="400"/>
<img
src="gifs/mouse_sage_2c_gam3.gif" width="400"/>
]

---

## Clustering in high dimensions: Single Cell Mouse Retina Data

<img src="slides_files/figure-html/unnamed-chunk-1-1.png" width="95%" style="display: block; margin: auto;" />

---

## Discussion & Outlook

<br>

- New display that reverses piling effects when visualizing high-dimensional data in low-dimensional projections
- This is especially useful in combination with a tour, implemented as the **sage tour** display
- Related approach: **slice tour** [Laa, Cook, Valencia (2020)](https://doi.org/10.1080/10618600.2020.1777140)
- These new displays are complementary to non-linear dimension reduction methods for visualization, e.g. t-SNE
- Displays should be implemented in an interactive interface, for efficient tuning
- Thinking about new transformations and slicing methods

---

# Acknowledgements

<br>

My slides are made using `RMarkdown`, `xaringan` and the `ninjutsu` theme, and are based on a Monash-themed template from **Emi Tanaka**.

<br>

The main `R` packages used are `tourr`, `tidyverse`, `plotly`, `geozoo`.

<br>

This is joint work done in collaboration with **Dianne Cook** and **Stuart Lee**.

---

background-image: url()
background-size: contain
class: hide-slide-number split-70
count: false

.column[.content[

<br><br>

# Thanks!

<br>

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

.bottom_abs.width100[

<h2 style="font-weight:900!important;">Ursula Laa</h2>

Monash University Department of Econometrics and Business Statistics & School of Physics and Astronomy
<i class="fas fa-envelope faa-float animated "></i>
ursula.laa@monash.edu

]

]]

<div class="column transition monash-m-new delay-2s" style="clip-path:url(#swipe__clip-path);">
<div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;">
<svg class="clip-svg absolute">
<defs>
<clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox">
<polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" />
</clipPath>
</defs>
</svg>
</div>
</div>
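
---

## Backup: checking the burning sage transformation numerically

The radial transformation from the "Burning sage transformation" slides can be checked with a short simulation. The following is a minimal Python sketch under stated assumptions, not the `tourr` implementation: the function names `sage_radius`, `sample_ball` and `mean_area_fraction` are hypothetical and introduced only for illustration, and the projection is simply onto the first two coordinates. It draws points uniformly from a p-dimensional ball of radius R, projects them to 2D, and compares the mean of `(r/R)^2` with and without the transformation.

```python
import math
import random


def sage_radius(r_y, p, R=1.0):
    """Radial map r'_y = R * sqrt(1 - (1 - (r_y / R)^2)^(p / 2))."""
    # Clamp the ratio to [0, 1] to guard against floating-point rounding.
    ratio = min(max(r_y / R, 0.0), 1.0)
    return R * math.sqrt(1.0 - (1.0 - ratio ** 2) ** (p / 2.0))


def sample_ball(p, R=1.0, rng=random):
    """Draw one point uniformly from the p-dimensional ball of radius R."""
    # A Gaussian vector gives a uniformly distributed direction;
    # the radius R * U^(1/p) makes the point uniform in volume.
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    scale = R * rng.random() ** (1.0 / p) / norm
    return [x * scale for x in g]


def mean_area_fraction(p, n=20000, seed=1, transform=True):
    """Mean of (r / R)^2 for 2D projections of uniform points in the unit p-ball.

    If the projected points fill the unit disk uniformly in area,
    (r / R)^2 is uniform on [0, 1] and its mean is 0.5.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        point = sample_ball(p, rng=rng)
        # Project onto the first two coordinates (an assumed, fixed projection).
        r = math.hypot(point[0], point[1])
        if transform:
            r = sage_radius(r, p)
        total += r * r
    return total / n


if __name__ == "__main__":
    p = 10
    print("raw projection:  ", mean_area_fraction(p, transform=False))
    print("sage transformed:", mean_area_fraction(p, transform=True))
```

Without the transformation, the mean of `(r/R)^2` should be close to `2/(p+2)`, reflecting the piling near the center; with the transformation it should be close to 0.5, the value for a uniform distribution over the disk. For `p = 2` the map reduces to the identity, which is a quick sanity check on the formula.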