Section pursuit

layout: false
class: split-75
background-image: url("plots/title_slide_bkg.png")
background-position: center
background-size: contain

}
code.r{
  font-size: 24px;
}

</style>

.column[.content[

# **Section Pursuit**

**Based on arXiv:2004.13327**

.my-gray[
.large[**Ursula Laa**]

.largeish[ Institute of Statistics <br>
Department of Landscape, Spatial and Infrastructure Sciences <br>
University of Natural Resources and Life Sciences
]
]

CMStatistics 2021

]]

.column[.top_abs.content[
<img src="plots/Boku-wien.png" width="60%">
]]

---
# What is hidden in projections?

.pull-left[<img src="plots/4d_proj.png" width="49%">]
--
.pull-right[ <img src="plots/4d_slice.png" width="49%">]

---
# Combining sectioning and projections

**Sectioning** in visualization means that we select a subset of points that are kept or **highlighted** in a view of the data. The subset is generally defined by a set of **inequality conditions** defining the **section**.

Combining projections and sections of the data can provide the viewer with new insights, as discussed in *Furnas and Buja (1994)*. The combination is referred to as **prosections** and it was shown how they can be used to determine the intrinsic dimensionality of a distribution.

More generally sectioning can be useful to reveal features that are hidden in all projections of the data, but typically relies on user interaction (**linked brushing**). However, if we can define a systematic approach to sectioning we could also optimize an index function to identify interesting sections: **section pursuit**.

---
# Slicing

We can define sections based on projections of the data: a **slice** is using the **orthogonal distance** of each data point from a centered projection plane to define a section, see *UL, Cook, Valencia (2019)*. Points close to the plane are **highlighted** in the projected view and can be compared to the overall (projected) distribution of points.

.center[<img src="plots/slice.png" width="60%">]

---
# What makes a slice interesting?

A first approach to **section pursuit** is therefore to optimize a section pursuit index function over all possible **slices** of the data. If we keep the projections centered through the mean, and the slice thickness fixed, this means we are optimizing over all possible projections and can use methods from **projection pursuit** in combination with information about the orthogonal distances.

One approach would be to directly plug in index functions from the projection pursuit literature, but compute them only for the points inside the slice.

However, in this case we have a direct comparison: how is the distribution **inside** the slice different from the **projected** distribution? An interesting slice would be one that reveals a difference between the two!

---
# Notation

- The projected data points are computed as `\(Y=X\cdot A\)`, where `\(X\)` is an `\(n\times p\)` data matrix, `\(A\)` is a `\(p\times d\)` (orthonormal) basis for the `\(d\)`-dimensional space onto which the data is being projected
- To generate a 2-dimensional slice we compute the orthogonal distance between every point and the plane (defined by `\(A = (\mathbf{a}_1, \mathbf{a}_2)\)`) as the Euclidean norm
`\begin{equation}
h_i = ||\mathbf{x}_i - (\mathbf{x}_i\cdot \mathbf{a}_1) \mathbf{a}_1 - (\mathbf{x}_i\cdot \mathbf{a}_2) \mathbf{a}_2||.
\end{equation}`
- Observations are considered inside the slice if `\(h_i < h\)`.
- We denote the set of points inside the slice `\(S\)`, and the set of points outside the slice `\(C\)`.

---
# Notation

- The two sets are separately binned into `\(K\)` bins: `\(S_{k}=\sum_{i} I(Y_i \in b_{k})I(h_i < h)\)` and `\(C_{k}=\sum_{i} I(Y_i \in b_{k})I(h_i \geq h)\)`
- The relative counts are denoted `\(s_k = S_k / \sum_i S_i\)` and `\(c_k = C_k / \sum_i C_i\)`

Using these relative counts `\(s_k\)` and `\(c_k\)` we will build an index that compares the two distributions, this is based on the decomposition proposed in *Gous and Buja (2012)*.

---
# Index definition

We define two index functions that aim to detect either regions of low density (holes) or high density (up, grains) inside the slice distribution:

`\begin{equation}
I_A^{low} = \sum_{k}\left[\left(c_{k}-s_{k}\right)\right]_{>\varepsilon},
\label{eq:index}
\end{equation}`

`\begin{equation}
I_A^{up} = \sum_{k}\left[\left(s_{k}-c_{k}\right)\right]_{>\varepsilon}
\label{eq:indexup}
\end{equation}`

where we only sum up positive differences above the threshold value `\(\varepsilon\)` (based on the estimated size of sampling fluctuations).

---
# Index definition (generalization)

We also consider generalizing the definition to

`\begin{equation}
I_A^{low} = \sum_{k}w_{k}\left(\left[c_{k}^{1/q}-s_{k}^{1/q}\right]_{>\varepsilon}\right)^{q},
\label{eq:index2}
\end{equation}`
`\begin{equation}
I_A^{up} = \sum_{k}w_{k}\left(\left[s_{k}^{1/q}-c_{k}^{1/q}\right]_{>\varepsilon}\right)^{q}.
\label{eq:index2up}
\end{equation}`

allowing for bin-wise weights `\(w_k\)` and where `\(q\)` controls the sensitivity to small differences.

---
# Practial considerations

When computing this index in practice, we will see systematic differences between `\(S\)` and `\(C\)` that depend on the underlying distributions. To proceed we will make a few assumptions:

- The data shall be distributed inside a hypersphere (no systematic differences based on from which angle we are looking at it)
- In the absence of structure we will assume points to be distributed uniformly within the sphere (capturing volume effects rather than assuming specific structure, one alternative could be a multivariate normal distribution instead)
- We will use polar binning in the 2D projection plane (in `\(r\)` and `\(\theta\)`)

With these assumptions we can study the expected differences between the two distributions, and define a **bin-wise** reweighting scheme for `\(S\)` and `\(C\)` separately.

---
# Simulated data example

To illustrate the method we simulated a large sample of points inside a hypersphere, but with *holes* that are defined through hyperspherical harmonics, such that some slices are interesting and others are not.

.center[<img src="plots/allSlices-1.png" width="70%">]

---
# Reweighting in action

We can compare the binned distributions with and without reweighting to see why this is important, for example looking at S1 from set B:

.center[<img src="plots/data_and_densities-1.png" width="60%">]

---
# Tracing the index

To understand the index performance we can show how the index value changes as we slowly rotate the angle of the projection away from the most or least informative slice:

- Start by randomly selecting a large number directions along which we move away from the starting plane.
- Use geodesic interpolation (as available in the `tourr` package) to interpolate projections along those directions, up to a maximum angle `\(\alpha\)`.
- Evaluate the index value for each slice along the interpolated path.
- The **topotrace plot** shows how the index value changes with `\(\alpha\)`, we can see characteristics of the index function (smoothness, squint angle) and if there are local maxima or ridges for the considered distribution.

---
# Tracing the index

.center[<img src="plots/traces_q1-1.png" width="75%">]

---
# Example application

We can use section pursuit to better understand non-linear boundaries in high-dimensional spaces:

- Classification boundaries, for example from a SVM with polynomial or radial kernel
- Boundaries from inequality conditions, for example physically meaningful regions in a parameter space

We start by **sampling** points parameter space: points are drawn uniformly within a p-dimensional hypersphere. We then **evaluate** the model predictions or inequality conditions for all samples, and create **hollow** regions by dropping points from one class, or where the inequality condition is not met.

---
# Example application

**Example**: Classification problem from physics (PDFSense data), with three groups in six dimensions. When fitting a SVM with radial kernel we find that across most of the parameter space we predict one of the three groups. Where are sections with predictions for the other two?

- Use the `classifly` package to sample in a grid across the 6D parameter space and evaluate predictions for all points.
- Shave points to go from a hypercube to a hypersphere.
- Input to section pursuit is the set of points for which the prediction is the most commonly predicted class (filling most of the hypersphere).
- Use section pursuit to identify a slice where this set of points shows a large *hole* - we get a larger fraction of alternative predictions in this section of the parameter space.
- Show all samples with their predicted class in this slice.

---
# Example application

.center[<img src="plots/pdffit-1.png" width="80%">]

---
# Summary and Outlook

- We can define **section pursuit** using slices, and optimizing an index that compares the projected distribution to that inside the slice.
- In doing so we make **assumptions** about the underlying distribution to adjust for expected differences (and estimate the expected fluctuations `\(\varepsilon\)`).
- One application is to find interesting slices to better **understand non-linear decision boundaries** or inequality conditions.
- The current implementation is **limited** to datasets with large sample size and intermediate number of dimensions (up to seven dimensions were considered so far).
- There is a lot of room to develop **additional index functions** for section pursuit and further develop the proposed generalization, and for **better optimization** methods.
- Alternative **definitions of sectioning** should also be considered to generalize the method.

---
layout: false
background-image: url("plots/title_slide_bkg.png")
background-position: center
background-size: contain

# Thanks!

<br>

This is joint work done in collaboration with **Dianne Cook**, **German Valencia** and **Andreas Buja**.

My slides are made using `RMarkdown`, `xaringan` and the `ninjutsu` theme.
The main `R` packages used are `tourr`, `classifly` and the `tidyverse`.