Edge detecting new physics the Voronoi way

Dipsikha Debnath; James S. Gainer; Doojin Kim; Konstantin T. Matchev

doi:10.1209/0295-5075/114/41001

Introduction

Experimental searches for new physics (NP) are ultimately searches for "features" in the data. In high-energy physics, the data is represented by a collection of "events", which are distributed in phase space, ${\cal P}$ , according to the fully differential cross-section

$\begin{equation} \frac{\text{d}\sigma}{\text{d}\vec{\bm x}}\equiv f(\vec{\bm x},\left\{\alpha\right\}). \end{equation} \tag{ 1 }$

Here $\vec{\bm x}\in {\cal P}$ is a particular phase space point, typically labelled by momentum components of final-state particles, while $\left\{\alpha\right\}$ is a set of model parameters, e.g., particle masses, widths, couplings, etc. Thus, the distribution of events in phase space is nothing but a Monte Carlo sampling of the function (1), which generally consists of two contributions:

$\begin{equation} f(\vec{\bm x}, \left\{\alpha\right\}) \equiv f_{SM} (\vec{\bm x}, \left\{\alpha_{SM} \right\}) +f_{NP}(\vec{\bm x}, \left\{\alpha_{NP}\right\}). \end{equation} \tag{ 2 }$

In eq. (2), f_SM is the distribution expected from standard model (SM) processes, also known as "the background", while f_NP describes possible new physics, i.e., "the signal".

The traditional method to search for new physics is via counting experiments, where one measures the total number of events in a suitably chosen region of phase space, ${\cal P}_0$ . New physics then manifests itself as an excess over the SM expectation $\int_{{\cal P}_0} f_{SM} (\vec{\bm x}, \left\{\alpha_{SM}\right\})\, \text{d} {\vec{\bm x}}$ . However, a much more powerful approach is to look at the differential properties of the observed events in phase space and attempt to identify structural features in their distributions, which might be present in f_NP, but not in f_SM. An example of this method is the bump-hunting technique in resonance searches, where the Breit-Wigner peak in f_NP "stands out" over the smooth background described by f_SM.

The situation gets much more complicated if some of the decay products (e.g., neutrinos or dark-matter particles) are invisible in the detector. While more challenging, this scenario is important: in many well-motivated models of new physics, such as supersymmetry (SUSY) [1] or universal extra dimensions (UED) [2,3], events generically involve the production of at least two particles which cannot be resolved in the detector. It is therefore important to look for special features in the signal distribution f_NP¹. Examples of such features include kinematic endpoints [5–8], kinematic boundaries [9–13], kinks [14–18] and cusps [19–22], which are absent in the background distribution f_SM. Of particular interest to us in this letter will be kinematic endpoints and kinematic boundaries, as these will lead to "edges", i.e., discontinuities in the observed distribution, $f(\vec{\bm x}, \left\{\alpha\right\})$ .

In this letter, we focus on two-dimensional high-energy particle physics data, leaving the straightforward generalization to higher dimensions to a future study [23]. While edge detection is a well-studied problem in the experimental and observational sciences (see, e.g., [24]), there are several challenges for edge detection in particle physics that may frustrate standard approaches², namely:

1) The data may be relatively sparse. Most work in edge detection has focused on images, where there is a data point at each pixel. By contrast, in particle physics we may want to discover new physics with a relatively small number of signal events, when large regions of phase space may remain unpopulated.

2) We may not know analytically the class of distributions, $f_{SM}+f_{NP}$ , that describe the data. If we know the parametric form of the distribution (2), likelihood methods can be used to determine edges. However, it is generally difficult to obtain an exact analytical form for f_SM, particularly in the case of reducible backgrounds, where detector effects play a major role. We may also wish to be sensitive to "surprises" in the data with regard to new physics; after all, we cannot be sure, a priori, that we have correctly guessed the specific new physics model [29]. Even if we have some idea of where the new physics edges may be found, a general procedure may still be of greater practical use.

3) The data may be in more than two dimensions. As we mentioned above, edge detection is generally applied to two-dimensional images. However, multivariate analyses [30] are ubiquitous in particle physics; in general, we will face the problem of finding an (n − 1)-dimensional kinematic boundary in an n-dimensional parameter space.

The class of methods for edge detection which we propose and explore here can handle all three of these significant challenges, making them an important addition to the experimentalist's toolkit for Run 2 at the CERN Large Hadron Collider (LHC).

The starting point of our analyses is the Voronoi tessellation³ of our two-dimensional data, where each "event", i, is treated as the corresponding generator point for the i-th Voronoi polygon (see, e.g., [33]). Voronoi tessellations have been successfully used in various areas of science, including condensed-matter physics [34], astronomy [35] and astrophysics [36,37], as well as for jet clustering [38] and the model-independent definition of search regions [39–42] in particle physics. However, as far as we know, Voronoi methods have not been applied directly either to the sort of edge-detection procedures we develop and describe here or to the direct identification of new features in high-energy physics data. In any case, such methods have been significantly under-utilized in particle physics given their substantial promise, even though their application⁴ to situations where points (or events) are sampled from a non-trivial distribution⁵ is natural.

The Voronoi approach is ideally suited for finding interesting (e.g., singular) features in f_NP, since it preserves the maximum spatial resolution in the data [44]. To see this, we consider the toy example in fig. 1. The upper left panel shows N = 2000 data "points" $(x,y)$ generated within a square of total area $A=2\times2=4$ according to the periodic function

$\begin{equation} f(\vec{\bm x}) = 1 + \sin \left(6\pi \sqrt{x^2+y^2} \right). \end{equation} \tag{ 3 }$

**Fig. 1:** (Color online) Top left: a scatter plot of N = 2000 data points generated according to (3). The same data is then represented as a Voronoi tessellation (top right) or binned histograms of $10\times 10$ bins (bottom left) and $30\times 30$ bins (bottom right).

The standard approach is to bin the data, e.g., as shown in the lower panels of fig. 1. It is clear that interesting features of the underlying distribution are being lost as a result of averaging within each bin. This loss of information is particularly noticeable for the coarse $10\times 10$ grid shown in the lower left panel. Of course, choosing a finer binning, as in the lower right panel, begins to reveal the radial symmetry in the data. Thus, in order to understand the existing structure in the data, an intelligent choice of binning typically has to be made, and the associated additional effort could be substantial for a less trivial example.

In contrast, the Voronoi tessellation of the same data, shown in the upper right panel of fig. 1, clearly displays the radial periodicity and rotational symmetry of the data. The cells are color-coded according to their normalized area $\bar a_i = a_i / \langle a \rangle$ , where a_i is the area of the i-th Voronoi polygon, and $\langle a \rangle = A / N$ . Furthermore, by construction, the areas, a_i, serve as local estimators of the values of the generating function $f(\vec{\bm x})$ at the location $\vec{\bm x}_i$ of each generator point p_i (see footnote ⁶):

$\begin{equation} f(\vec{\bm x}_i) \simeq \frac{1}{N a_i}. \end{equation} \tag{ 4 }$

Thus, sufficient spatial information can be obtained from the Voronoi tessellation without the additional effort of determining an optimal binning strategy or an intelligent transformation of variables.
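The estimator (4) can be illustrated with a minimal Python sketch. It is not the implementation behind fig. 1: the function names are hypothetical, the cells are built by brute-force half-plane clipping rather than with a dedicated tessellation library, and cells at the border are simply clipped to the bounding square, which is an assumption about the boundary treatment. For brevity the points are drawn uniformly rather than from (3).

```python
import random

def clip_to_bisector(poly, p, q):
    """Keep the part of the convex polygon `poly` that is closer to p than to q."""
    nx, ny = q[0] - p[0], q[1] - p[1]              # normal of the bisector
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2  # midpoint, lies on the bisector
    out = []
    for k in range(len(poly)):
        a, b = poly[k], poly[(k + 1) % len(poly)]
        da = (a[0] - mx) * nx + (a[1] - my) * ny   # <= 0 on the p side
        db = (b[0] - mx) * nx + (b[1] - my) * ny
        if da <= 0:
            out.append(a)
        if da * db < 0:                            # this edge crosses the bisector
            t = da / (da - db)
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def voronoi_cells(points, box):
    """Voronoi polygons of `points`, clipped to box = (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = box
    square = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    cells = []
    for i, p in enumerate(points):
        poly = square
        for j, q in enumerate(points):
            if j != i:
                poly = clip_to_bisector(poly, p, q)
        cells.append(poly)
    return cells

def polygon_area(poly):
    """Area of a simple polygon from the shoelace formula."""
    twice = 0.0
    for k in range(len(poly)):
        (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % len(poly)]
        twice += x0 * y1 - x1 * y0
    return abs(twice) / 2

rng = random.Random(0)
box = (-1.0, 1.0, -1.0, 1.0)                 # square of total area A = 4
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
areas = [polygon_area(c) for c in voronoi_cells(points, box)]
N = len(points)
mean_area = 4.0 / N                          # <a> = A / N
normalized = [a / mean_area for a in areas]  # normalized areas used for color-coding
density = [1.0 / (N * a) for a in areas]     # local estimate of f, eq. (4)
```

The clipping construction costs O(N²) operations, which is acceptable for a few hundred points; for larger samples a dedicated tessellation routine would be the natural replacement.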

Voronoi methods for edge detection

Since we do not assume the exact knowledge of f_NP, we do not attempt here to reconstruct the function itself, but focus instead on finding edge features such as discontinuities (for a study in one dimension, see [45]). Edge detection algorithms for binned data exist [46]; our methods will apply to the corresponding Voronoi tessellation and include the following steps:

1) construct the Voronoi tessellation for the data set;

2) compute relevant attributes of the Voronoi cells;

3) (optionally) use the information from the previous step to further process the data in some way;

4) use some criterion to flag "candidate" edge cells.

We can gain useful intuition from a toy example illustrating this procedure. Consider the probability distribution within the unit square,

$\begin{equation} f(x,y) =\frac{2}{1+\rho}\left[ \rho H(0.5-x) + H(x-0.5) \right], \end{equation} \tag{ 5 }$

where H(x) is the Heaviside step function and ρ is a constant density ratio. We generate N = 350 points and show the resulting Voronoi tessellation in fig. 2. Our goal will be to investigate the vertical edge at $x=0.5$ (yellow solid line), which divides the square into two regions of constant, but unequal densities. The Voronoi cells crossed by the edge at $x=0.5$ will from now on be referred to as "edge" cells (for the convenience of the reader, in fig. 2 they are outlined in black). The remaining Voronoi cells away from the edge will be referred to as "bulk" cells, and their boundaries are kept white.
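Points distributed according to (5) can be generated without accept-reject sampling, since the density is piecewise constant: integrating (5) over the left half of the square gives a probability $\rho/(1+\rho)$ for $x<0.5$. The short Python sketch below uses this observation; the function name and the random seed are hypothetical choices made for illustration.

```python
import random

def sample_step_density(n, rho, rng):
    """Draw n points in the unit square from the step density of eq. (5)."""
    points = []
    for _ in range(n):
        # The left half carries probability rho / (1 + rho), the right half 1 / (1 + rho).
        if rng.random() < rho / (1.0 + rho):
            x = 0.5 * rng.random()
        else:
            x = 0.5 + 0.5 * rng.random()
        points.append((x, rng.random()))     # y is uniform in both regions
    return points

rng = random.Random(1)
points = sample_step_density(350, 6.0, rng)  # N = 350 and rho = 6, as in fig. 2
n_left = sum(1 for x, _ in points if x < 0.5)
```

For $\rho=6$ about six out of every seven points are expected to fall in the dense region on the left.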

**Fig. 2:** (Color online) The Voronoi tessellation of 350 data points distributed according to the probability density (5) with $\rho=6$ . The Voronoi polygons have been color-coded by the amplitude A_i (top left) and phase $|\varphi_i|$ in degrees (top right) of the local gradient (6), the average dot product $\bar{s}_i$ (10) (bottom left), or the scaled variance $\bar{\sigma}_i$ (11) (bottom right).

Since an edge discontinuity will be signaled by a large gradient, we first attempt to compute the gradient vector

$\begin{equation} (\nabla f)_i \equiv (A_i \cos\varphi_i, A_i \sin\varphi_i) \equiv \vec{\bm A}_i \end{equation} \tag{ 6 }$

at each data point p_i, i.e., we devise a method for measuring the amplitude A_i and phase $\varphi_i$ from the tessellation. The presence of a set N_i of neighboring cells surrounding the i-th polygon allows the calculation of $|N_i|$ directional derivatives. The directions can be specified by either the locations of the neighboring data points $p_j\in N_i$ , or better yet, the locations $\vec{r}_j$ of their centroids (choosing to work with the centroids instead of the data points themselves has the benefit of filtering out random noise and producing more stable results).

Then, a unit vector $\hat{n}_{ij}$ pointing from the centroid of the i-th cell towards the centroid of its j-th neighbor is given by

$\begin{equation} \hat{n}_{ij}=\frac{\vec{r}_j-\vec{r}_i}{| \vec{r}_j-\vec{r}_i |} \equiv (\cos\varphi_{ij}, \sin\varphi_{ij}), \end{equation} \tag{ 7 }$

and from (4) the directional derivative $(\nabla_{\hat{n}_{ij}} f)_i$ is

$\begin{equation} (\nabla_{\hat{n}_{ij}} f)_i = \left(a_ia_j\right)^{\frac{3}{4}}\frac{f(\vec{\bm x}_j)- f(\vec{\bm x}_i)}{| \vec{r}_j-\vec{r}_i |}, \end{equation} \tag{ 8 }$

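where the pre-factor $\left(a_ia_j\right)^{3/4}$ , which has dimensions of $(\text{area})^{3/2}$ , is included in order to render the derivative dimensionless: by (4), f has dimensions of inverse area, so the finite difference on the right-hand side scales as $(\text{area})^{-3/2}$ .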

From the measured $|N_i|$ directional derivatives (8) we can extract the amplitude A_i and phase $\varphi_i$ by fitting to

$\begin{equation} (\nabla_{\hat{n}_{ij}} f)_i \equiv (\nabla f)_i \cdot \hat{n}_{ij} = A_i \cos(\varphi_i- \varphi_{ij}). \end{equation} \tag{ 9 }$
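One possible way to carry out the fit to (9) is linear least squares, since $A_i \cos(\varphi_i-\varphi_{ij}) = g_x \cos\varphi_{ij} + g_y \sin\varphi_{ij}$ with $(g_x, g_y) = (A_i\cos\varphi_i, A_i\sin\varphi_i)$. The Python sketch below solves the corresponding $2\times 2$ normal equations. The choice of an unweighted least-squares fit and the function name are assumptions made for illustration; the text does not specify the fitting procedure.

```python
import math

def fit_gradient(phis, derivs):
    """Least-squares fit of derivs[j] = gx*cos(phis[j]) + gy*sin(phis[j]).

    Returns the amplitude A and phase phi of the gradient (gx, gy).
    """
    scc = sum(math.cos(p) ** 2 for p in phis)
    sss = sum(math.sin(p) ** 2 for p in phis)
    scs = sum(math.cos(p) * math.sin(p) for p in phis)
    sdc = sum(d * math.cos(p) for p, d in zip(phis, derivs))
    sds = sum(d * math.sin(p) for p, d in zip(phis, derivs))
    det = scc * sss - scs ** 2
    if abs(det) < 1e-12:
        raise ValueError("neighbor directions are degenerate")
    gx = (sdc * sss - sds * scs) / det
    gy = (sds * scc - sdc * scs) / det
    return math.hypot(gx, gy), math.atan2(gy, gx)

# Synthetic check: five neighbor directions and exact directional derivatives.
true_A, true_phi = 2.0, 0.7
phis = [0.3, 1.4, 2.9, 4.0, 5.5]
derivs = [true_A * math.cos(true_phi - p) for p in phis]
A, phi = fit_gradient(phis, derivs)
```

With noise-free input the fit returns the amplitude and phase used to generate the derivatives; at least two non-collinear neighbor directions are needed for the system to be solvable.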

The result is shown in the upper two panels of fig. 2. As anticipated, edge cells are characterized with relatively large gradient magnitudes, and the directions of their gradients appear correlated. This motivates us to consider the dot products of the gradient vectors for neighboring cells, and define the average such dot product $\bar{s}_i$ for the i-th cell as

$\begin{equation} \bar{s}_i\equiv\frac{1}{|N_i|}\sum_{j\in N_i} \vec{\bm A}_i\cdot \vec{\bm A}_j=\frac{1} {|N_i|}\sum_{j\in N_i} A_i A_j \cos(\varphi_i-\varphi_{j}). \end{equation} \tag{ 10 }$

The corresponding result is shown in the lower left panel of fig. 2.

Finally, we observe that the neighbors of edge cells typically differ markedly in size: some are large, while others are small. This motivates us to introduce the scaled variance of the areas of the neighboring cells,

$\begin{equation} \bar{\sigma}_i \equiv\frac{1}{\bar{a}}\, \sqrt{\sum_{j\in N_i} \frac{\left(a_j- \bar{a} \right)^2}{|N_i|-1}}, \end{equation} \tag{ 11 }$

where $\bar{a}(N_i)\equiv \sum_{j\in N_i} a_j/|N_i|$ is the mean area of the neighbors of the i-th cell. The result for (11) is shown in the lower right panel in fig. 2.
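Given the cell areas and the neighbor lists, (11) amounts to the sample standard deviation of the neighboring areas divided by their mean. A minimal Python sketch follows, where the input format (a list of areas and a list of neighbor index lists) and the function name are assumptions made for illustration.

```python
import statistics

def scaled_variance(areas, neighbors):
    """Evaluate eq. (11) for every cell.

    areas[i]     : area a_i of the i-th Voronoi cell
    neighbors[i] : indices j of the cells in N_i (at least two per cell)
    """
    result = []
    for nbrs in neighbors:
        nbr_areas = [areas[j] for j in nbrs]
        mean = statistics.mean(nbr_areas)
        # statistics.stdev uses the (|N_i| - 1) denominator that appears in eq. (11)
        result.append(statistics.stdev(nbr_areas) / mean)
    return result

# Tiny example: the neighbors of cell 0 have equal areas, those of cell 1 do not.
areas = [1.0, 1.0, 3.0, 1.0]
neighbors = [[1, 3], [0, 2]]
sigma_bar = scaled_variance(areas, neighbors)
```

Note that the area of the i-th cell itself does not enter (11); only the areas of its neighbors do.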

Figure 2 shows that all three quantities, A_i, $\bar{s}_i$ , and $\bar\sigma_i$ , are quite successful in identifying edge cells. It is therefore prudent to compare their performance quantitatively, e.g., in terms of ROC curves [47]. For this purpose, we generate high statistics samples for (5), where we treat the edge cells as "signal" and the bulk cells as "background". We then plot the surviving signal fraction, $\varepsilon_S$ , vs. the surviving background fraction, $\varepsilon_B$ , for different values of the minimum cut on the selection variable (left panel in fig. 3). We observe that the scaled variance $\bar\sigma_i$ does best in the relevant range of very low $\varepsilon_B$ (due to the lower dimensionality of the edge features, the bulk cells typically greatly outnumber the edge cells). Thus, from now on we shall choose $\bar\sigma_i$ as our main selection variable. In the right panel of fig. 3 we show the variation of its ROC curve as a result of smearing of the initial data by 1% (blue) and 5% (green), due to the finite detector resolution.

**Fig. 3:** (Color online) Left: ROC curves based on the cell attributes illustrated in fig. 2. Right: the variation of the $\bar\sigma_i$ ROC curve as a result of smearing of the initial data by 1% (blue) and 5% (green).

Voronoi relaxation via Lloyd's algorithm

Since we are dealing with a stochastic process, statistical fluctuations are inevitably present in the data. In particular, the lower right panel in fig. 2 reveals isolated pockets of bulk cells with relatively high values of $\bar\sigma_i$ . Here we propose to filter out such extraneous cells by first applying a few iterations of Lloyd's algorithm [48], where at each iteration, the generator point is moved to the centroid of the corresponding Voronoi cell and the tessellation is redone⁷. The left panel in fig. 4 shows the result of this operation after one Lloyd iteration. As expected, the Lloyd relaxation causes the Voronoi polygons to become more regularly shaped⁸. More importantly, the fluctuations within the bulk regions are washed out, thus increasing the contrast between edge cells and bulk cells.
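A single Lloyd iteration can be sketched in Python as follows. As before, this is an illustration under stated assumptions rather than the implementation used for fig. 4: the cells are obtained by brute-force clipping to the bounding square, all function names are hypothetical, and the scaled displacement is normalized by the area of the original cell, which is one possible reading of $d_i/\sqrt{a_i}$.

```python
import math

def clip_to_bisector(poly, p, q):
    """Keep the part of the convex polygon `poly` that is closer to p than to q."""
    nx, ny = q[0] - p[0], q[1] - p[1]
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    out = []
    for k in range(len(poly)):
        a, b = poly[k], poly[(k + 1) % len(poly)]
        da = (a[0] - mx) * nx + (a[1] - my) * ny
        db = (b[0] - mx) * nx + (b[1] - my) * ny
        if da <= 0:
            out.append(a)
        if da * db < 0:
            t = da / (da - db)
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def cell(i, points, square):
    """Voronoi polygon of points[i], clipped to the bounding square."""
    poly = square
    for j, q in enumerate(points):
        if j != i:
            poly = clip_to_bisector(poly, points[i], q)
    return poly

def area_and_centroid(poly):
    """Area and centroid of a simple polygon (shoelace formulas)."""
    twice = cx = cy = 0.0
    for k in range(len(poly)):
        (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % len(poly)]
        cross = x0 * y1 - x1 * y0
        twice += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return abs(twice) / 2, (cx / (3 * twice), cy / (3 * twice))

def lloyd_step(points, square):
    """Move every generator point to the centroid of its Voronoi cell."""
    results = [area_and_centroid(cell(i, points, square)) for i in range(len(points))]
    centroids = [c for _, c in results]
    areas = [a for a, _ in results]
    return centroids, areas

# Two generators in the unit square: their cells are the left and right halves.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
start = [(0.1, 0.5), (0.9, 0.5)]
relaxed, areas0 = lloyd_step(start, square)
# Scaled displacement d_i / sqrt(a_i), using the area of the original cell.
scaled = [math.hypot(c[0] - p[0], c[1] - p[1]) / math.sqrt(a)
          for p, c, a in zip(start, relaxed, areas0)]
```

Repeated calls to `lloyd_step`, each fed with the output points of the previous one, give the successive iterations; the tessellation is rebuilt from scratch at every step.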

**Fig. 4:** (Color online) Left: evolution of the Voronoi tessellation from fig. 2 after one Lloyd relaxation step. The cells are color-coded by the scaled variance (11). Right: a zoomed-in region near the vertical edge, showing the original generator points, and their subsequent locations in the course of several Lloyd iterations. The points are color-coded according to the dimensionless (scaled) displacement, $d_i/\sqrt{a_i}$ .

Figure 4 reveals that Voronoi relaxation causes a net flow of the data points from the dense region (left) towards the sparse region (right). The right panel in fig. 4 takes a closer look at one representative area near the edge and shows the result of several successive Lloyd iterations. We see that each generator point, i, is displaced a certain distance d_i from its original location. It is interesting to note that the edge points appear to be displaced the farthest, which can be understood by analogy with diffusion (or by considering a membrane between regions of unequal pressure). This observation suggests another criterion for selecting edge cells, based on their (properly normalized) displacement during the Lloyd relaxation.

In the left panel of fig. 5 we show several ROC curves $\varepsilon_S (\varepsilon_B)$ , for different values of the density ratio ρ, either with (solid) or without (dashed) Lloyd relaxation. We see that the algorithm works better for higher density contrasts between the two regions. Note also the significant improvement as a result of adding the Voronoi relaxation.

In order to quantify the accuracy of our selection of edge cells, we use the standard area under the curve [49] (AUROC) as represented by the Gini coefficient

$\begin{equation} G_1\equiv 2\, \text{AUROC} - 1 = 2\int_0^1 \text{d}\varepsilon_B \times \varepsilon_S (\varepsilon_B) -1, \end{equation} \tag{ 12 }$

where a value of 1 is obtained from the ROC curve of a perfectly discriminating variable, while a value of 0 corresponds to a totally random selection of events. The right panel of fig. 5 shows the dependence of G₁ on the number of Lloyd steps. We see that the accuracy improves very quickly within the first few iterations and reaches an optimum plateau, after which the power of the test is degraded as the Voronoi grid begins to asymptote to a regular hexagonal lattice.
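For a finite sample, the ROC curve and the Gini coefficient (12) can be evaluated as in the Python sketch below. The trapezoidal integration, the treatment of tied scores, and the function names are choices made for this illustration and are not taken from the analysis above.

```python
def roc_points(scores, is_edge):
    """(eps_B, eps_S) pairs obtained by lowering a minimum cut on the score."""
    n_sig = sum(1 for flag in is_edge if flag)
    n_bkg = len(is_edge) - n_sig
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    points = [(0.0, 0.0)]
    sig = bkg = 0
    for pos, k in enumerate(order):
        if is_edge[k]:
            sig += 1
        else:
            bkg += 1
        # Emit a point only once all cells with the same score are counted.
        last = pos + 1 == len(order)
        if last or scores[order[pos + 1]] != scores[k]:
            points.append((bkg / n_bkg, sig / n_sig))
    return points

def gini(points):
    """G_1 = 2 AUROC - 1, with the area from the trapezoidal rule."""
    auroc = sum((x1 - x0) * (y0 + y1) / 2
                for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 2 * auroc - 1

# Edge cells ("signal") with the highest scores: a perfectly discriminating variable.
scores = [0.9, 0.8, 0.2, 0.1]
is_edge = [True, True, False, False]
g_perfect = gini(roc_points(scores, is_edge))
# Identical scores for all cells carry no information.
g_random = gini(roc_points([0.5, 0.5, 0.5, 0.5], is_edge))
```

In the first example every edge cell survives before any bulk cell does, so the Gini coefficient equals 1; in the second, the single cut keeps or rejects everything and the coefficient is 0.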

An example from SUSY

As an application of the proposed edge detection method, we consider a standard benchmark example from SUSY, namely squark pair production at the 13 TeV LHC. For simplicity, we focus on asymmetric events in which one squark undergoes a long cascade decay through a heavy neutralino, $\tilde \chi^0_2$ ; a slepton, $\tilde \ell$ ; and a light neutralino, $\tilde\chi^0_1$ ; while the other decays directly to the LSP, $\tilde\chi^0_1$ . The mass spectrum we utilize has $m_{\tilde q}=400$ GeV, $m_{\tilde\chi^0_2}=300\ \text{GeV}$ , $m_{\tilde \ell}=280\ \text{GeV}$ , and $m_{\tilde\chi^0_1}=200\ \text{GeV}$ . The observed final-state particles are two jets and two leptons, whose invariant-mass distributions are well studied and are known to exhibit kinematic edges. Here we focus on the dilepton invariant mass, $m_{\ell\ell}$ , and the three-body jet-lepton-lepton invariant mass, $m_{j\ell\ell}$ . With the correct jet assignment, signal events are constrained to the region outlined by the solid black line in fig. 6 [11,50], where for plotting convenience we use the $(m^2_{\ell\ell}, (m^2_{j\ell\ell}-m^2_{\ell\ell})/6)$ -plane. Since we cannot measure the charge of the jet, there is a twofold combinatorial ambiguity, thus the plot contains two entries per event. We also include the main SM background from $t\bar{t}$ dilepton events.

**Fig. 6:** (Color online) Various Voronoi tessellations of the data from the SUSY example considered in the text, in the $(m^2_{\ell\ell}, (m^2_{j\ell\ell}-m^2_{\ell\ell})/6)$ -plane.

In the upper two panels of fig. 6, the Voronoi cells are color-coded by their scaled variance (11). The left panel is the original data, while the right panel includes 5 Lloyd steps. In the lower left panel we reconsider the original data, but extend the calculation of (11) to include up to five tiers of nearest neighbors. We see that both Voronoi relaxation and the inclusion of more tiers of neighbors have the benefit of reducing the fluctuations and sharpening the edge. The two procedures can also be applied together: the lower right panel of fig. 6 shows the result after three Lloyd steps and including three tiers of neighbors.

The results from fig. 6 can be contrasted with the output from a more conventional edge detecting algorithm, e.g. the Canny edge detector [46] implemented in Mathematica [51], see fig. 7. The comparison of the two panels in fig. 7 reveals that the resolution of the Canny method is limited by the finite binning. At the same time, if the binning is too fine, the algorithm tends to produce spurious edge features, as seen in the right panel of fig. 7.

**Fig. 7:** (Color online) The result from applying the Canny edge detector [46] to the data in the upper left panel of fig. 6, binned in $10\times 10$ (left panel) and $30\times 30$ (right panel) bins. The plots were made using the `EdgeDetect()` function in `Mathematica` [51] with appropriately adjusted thresholds.

Summary and conclusions

In this letter, we have argued that the identification of new kinematic features in the data is an essential step in the discovery of physics beyond the standard model. A kinematic edge is a particularly important feature. Developing edge detection techniques which work a) for relatively sparse data, b) when the underlying distribution is unknown, and c) when the data is in more than two dimensions, is therefore of paramount importance for a discovery. We have demonstrated (and advocated) the use of Voronoi methods for this purpose. At the same time, any edge-finding algorithm should be accompanied by a procedure for estimating the statistical significance of an edge feature, in order to guard against the possibility that the feature is simply due to statistical fluctuations in the data. Such procedures exist in the literature, see, e.g., [52], and will be investigated in a companion paper [23].

Looking forward, we believe that many elements of the analyses we describe here can be generalized to searches for other features, as well as for parameter measurements involving new particles, see, e.g., [9–13]. The great flexibility of Voronoi methods will be a blessing for the experimentalist; many useful properties of the Voronoi cells can be used to construct powerful variables tailored to specific new physics scenarios.

Acknowledgments

We thank S. Das, C. Kilic, Z. Liu, R. Lu, P. Ramond, X. Tata, J. Thaler, B. Tweedie, and D. Yaylali for useful discussions and acknowledge the use of SLAC computing resources. Work supported in part by U.S. Department of Energy Grants DE-SC0010296 and DE-SC0010504. DK acknowledges support by LHC-TI postdoctoral fellowship under grant NSF-PHY-0969510.
