Finding cosmic voids and filament loops using topological data analysis
Introduction
The large-scale distribution of matter in the Universe forms a connected network known as the cosmic web (Klypin and Shandarin, 1993, Bond et al., 1996). Anisotropic gravitational collapse of matter has resulted in a picture where galaxy clusters form the nodes of this web and are interconnected by filaments, which form at the intersections of walls. The remaining majority of space is filled by cosmic voids: vast underdense regions that have experienced minimal non-linear growth of structure.
In the current era of multiband, high-resolution large-scale structure (LSS) surveys, there has been prolific investigation of the large dark matter halos at the nodes of the cosmic web, including cosmological analysis via the abundance of galaxy clusters (e.g., Vikhlinin et al., 2009, Mantz et al., 2015, Planck Collaboration et al., 2016a), and analysis of the large-scale matter distribution in dark matter halos via cosmic shear studies (e.g., Joudaki et al., 2018, Troxel et al., 2018) and the clustering of galaxies (e.g., Tinker et al., 2012, Cacciato et al., 2013). However, there has been a growing tension between cosmological constraints derived using cosmic microwave background (CMB) and LSS measurements (Planck Collaboration et al., 2016a, Joudaki et al., 2018, Troxel et al., 2018).
Recently, voids have begun to attract interest as a cosmological probe complementary to halos. Unlike halos, which are regions that have experienced high levels of growth and virialization that can partially destroy signatures of the primordial density field, voids have only evolved minimally. Information about the geometry of the initial density field present in the voids has the potential to help break degeneracies in the cosmological parameters and tighten their current constraints (Hamaus et al., 2016). Additionally, motivated by the growing tension between CMB and LSS measurements, void statistics may be able to provide a complementary probe of the growth of structure that is less sensitive to non-linear structure formation physics (Lavaux and Wandelt, 2010, Lavaux and Wandelt, 2012). Furthermore, methods have been proposed for utilizing voids to constrain dark energy (Pisani et al., 2015) and the sum of the neutrino masses (Kreisch et al., 2018, Massara et al., 2015), both of which have large effects on large-scale, low-density regions.
Various void observables can be used to constrain cosmology, including weak gravitational lensing (Sánchez et al., 2017, Cai et al., 2017, Kaiser and Squires, 1993), the integrated Sachs–Wolfe effect in the CMB (Nadathur, 2016), redshift-space distortions (Cai et al., 2016, Hamaus et al., 2015), and void ellipticities (Lee and Park, 2009). Currently available datasets that have been used to characterize some of these void observables include galaxy redshift surveys such as the Sloan Digital Sky Survey (SDSS; Abolfathi et al., 2018) and the Dark Energy Survey (DES; Abbott et al., 2016), as well as CMB anisotropy maps from Planck (Planck Collaboration et al., 2016c). In the near future, next-generation galaxy surveys will come online, including the Large Synoptic Survey Telescope (LSST; Ivezić et al., 2008) and the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al., 2016), and secondary CMB anisotropies will be observed with unprecedented precision by the Simons Observatory (SO; The Simons Observatory Collaboration et al., 2018) and the CMB-S4 experiment (Abazajian et al., 2016). In anticipation of these large upcoming datasets, many theoretical and computational approaches for identifying voids in simulations or galaxy surveys have been, and are still being, developed to characterize the potential constraining power of void clustering and abundance statistics (Libeskind et al., 2018, Sutter et al., 2012, Neyrinck, 2008, Platen et al., 2007, Pranav et al., 2016 and references therein). Most of these methods identify the physical locations of voids in the matter field, enabling the study of clustering statistics such as the void two-point correlation function and abundance statistics such as the void mass and volume functions.
In this paper, we propose a method called Significant Cosmic Holes in Universe (SCHU) for relating the cosmic matter distribution to topology using persistent homology. Persistent homology quantifies and summarizes the shape of a dataset by its hole structure; SCHU uses this information to assign a measure of statistical significance to the individual holes and to record the locations of representations of these structures back in the data volume, which enables analysis of void clustering and abundance. Homology groups of different dimensions are associated with different cosmic environment types. For example, connected components (0th-dimensional homology group, H_0), loops (1st-dimensional homology group, H_1), and low-density 3D volumes (2nd-dimensional homology group, H_2) are analogous to galaxy clusters, closed loops of filaments, and cosmic voids, respectively. Thus, cosmic voids can be identified as representations of H_2 generators, and the newly proposed filament loops can be identified as representations of H_1 generators.
Topological methods have previously been employed in cosmology. For example, van de Weygaert et al. (2011) studied the topological evolution of the matter distribution of the Universe by analyzing how the Betti numbers, which are the ranks of the homology groups of each dimension (i.e., the numbers of clusters, filament loops, and voids), change across a filtration (an indexed sequence of nested sets) constructed using alpha shapes; in particular, they demonstrated that the Betti numbers across the filtration can be used to distinguish the matter distributions resulting from different dark energy models. Additionally, Pranav et al. (2016) introduced a multiscale topological measurement of the cosmic matter distribution and explored the analysis of Betti numbers and topological persistence of different cosmological models. A scale-free and parameter-free method for identifying the cosmic environments (voids, walls, filaments, nodes) called Discrete Persistent Structures Extractor (DisPerSE) was proposed in Sousbie (2011). DisPerSE computes the discrete Morse–Smale complex of a spatial dataset using the Delaunay tessellation field estimator (DTFE) technique (Schaap and van de Weygaert, 2000, van de Weygaert and Schaap, 2009). The mathematical background and algorithm implementation are described in Sousbie (2011), and applications to 3D simulation datasets and observed galaxy surveys are found in Sousbie et al. (2011).
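To make the Betti numbers concrete, the sketch below treats a filament network as a simple graph (a 1-dimensional complex), where b_0 counts connected components (toy "clusters") and b_1 counts independent loops (toy "filament loops"). This is an illustrative simplification, not SCHU or the alpha-shape filtration of van de Weygaert et al. (2011): a graph has no 2-cells, so b_2 (voids) cannot be computed this way. The function name `graph_betti_numbers` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def graph_betti_numbers(nodes: Iterable[Hashable],
                        edges: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[int, int]:
    """Return (b0, b1) of a simple graph viewed as a 1-dimensional complex.

    b0 counts connected components and b1 counts independent cycles.
    Assumes no self-loops or repeated edges (a repeated edge would count as a loop).
    """
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for n in nodes:
        parent.setdefault(n, n)

    merges = 0      # edges that joined two previously separate components
    loop_edges = 0  # edges whose endpoints were already connected
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(u), find(v)
        if ru == rv:
            loop_edges += 1
        else:
            parent[rv] = ru
            merges += 1

    b0 = len(parent) - merges
    b1 = loop_edges  # equals E - V + b0, the graph form of the Euler relation
    return b0, b1
```

For a square loop A-B-C-D with a tail D-F and an isolated node E, `graph_betti_numbers("ABCDEF", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("D", "F")])` returns `(2, 1)`: two components and one loop.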
As noted previously, persistent homology is a tool within topological data analysis (TDA) that finds holes of different dimensions in data (e.g., connected components, loops, and voids) and summarizes the generators by their lifetimes in a particular filtration. The resulting persistence diagrams and their associated Betti numbers can then be used for various types of statistical inference or as inputs into machine learning algorithms (Reininghaus et al., 2015). Persistent homology has proven useful in a variety of applications, such as natural language processing (Zhu, 2013), computational biology (Xia and Wei, 2014), Lyman-alpha forest studies (Cisewski et al., 2014), angiography (Bendich et al., 2016), and dynamical systems (Emrani et al., 2014).
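As a minimal sketch of what "lifetime in a filtration" means, the code below computes 0-dimensional persistence (births and deaths of connected components) for a point cloud under a Vietoris–Rips filtration, using edge length as the scale parameter. This is a generic textbook construction chosen for illustration; it is not the filtration or software used by SCHU (which relies on the TDA package), and `h0_persistence` and `betti_at` are hypothetical names. It uses all pairwise distances, so it is only suitable for small toy inputs.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def h0_persistence(points: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """0-dimensional (birth, death) pairs of a Vietoris-Rips filtration.

    Every point is born at scale 0. Edges enter in order of increasing length;
    an edge that joins two components kills one of them at that length. The last
    surviving component never dies and is given death = math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs: List[Tuple[float, float]] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            pairs.append((0.0, length))
    if n:
        pairs.append((0.0, math.inf))
    return pairs


def betti_at(pairs: List[Tuple[float, float]], r: float) -> int:
    """Betti number at scale r: features with birth <= r < death."""
    return sum(1 for birth, death in pairs if birth <= r < death)
```

For two tight pairs of points, `h0_persistence([(0, 0), (1, 0), (5, 0), (6, 0)])` gives `[(0.0, 1.0), (0.0, 1.0), (0.0, 4.0), (0.0, inf)]`; the long-lived pair `(0.0, 4.0)` reflects the two well-separated clusters, and the Betti curve `betti_at(pairs, r)` drops from 4 to 2 to 1 as r grows.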
Though useful for summarizing information in complex data, persistent homology has one shortcoming: the homology group generators it identifies are not uniquely mapped back into the data volume. This is because each homology group generator displayed on the summary diagrams represents an equivalence class of representations of that particular hole. SCHU uses the output of the persistent homology algorithm (Edelsbrunner et al., 2002, Zomorodian and Carlsson, 2005) to find a representation of the equivalence class back in the original data volume. SCHU detects and captures the locations of cosmic voids (H_2 generators) along with another cosmic structure that we call filament loops (H_1 generators). Filament loops are formed when filaments are connected in such a way that a loop forms, surrounding an empty or low-density region, as shown in Fig. 1. Thus, SCHU, and the persistent homology underlying SCHU, enable analysis of the cosmological density field: the void and filament loop locations and sizes enable the standard clustering and abundance statistics, and the persistence diagrams and Betti numbers provide additional topological summary statistics of the density field that can be used to further discriminate between cosmological models.
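The non-uniqueness of representations can be seen in a small graph example. In the sketch below, edges enter in filtration order; each edge whose endpoints are already connected creates a new loop, and one representative cycle is the spanning-forest path between its endpoints closed by that edge. This is a generic cycle-basis construction assumed for illustration, not SCHU's procedure for locating H_1 generators; it also ignores 2-cells, so loops never die here. The names `loop_representatives` and `_forest_path` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def _forest_path(adj: Dict[Hashable, List[Hashable]],
                 src: Hashable, dst: Hashable) -> Optional[List[Hashable]]:
    """Breadth-first search for the unique path from src to dst in a forest."""
    prev: Dict[Hashable, Optional[Hashable]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None


def loop_representatives(edges: List[Edge]) -> List[List[Hashable]]:
    """Return one representative cycle (as a node list) per loop-creating edge.

    `edges` must already be sorted by filtration value. The closing edge from the
    last node back to the first node is implied.
    """
    adj: Dict[Hashable, List[Hashable]] = {}  # spanning-forest adjacency only
    cycles = []
    for u, v in edges:
        path = _forest_path(adj, u, v) if u in adj and v in adj else None
        if path is not None:
            cycles.append(path)  # u and v already connected: a new loop is born
        else:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    return cycles
```

For the edges `[("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]`, the result is `[["D", "C", "B", "A"], ["A", "B", "C"]]`. The triangle A-C-D is an equally valid loop: it is the sum of the two returned cycles, which illustrates why a generator on a persistence diagram does not pin down a unique location in the data.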
This article is organized as follows. In Section 2, we provide an overview of the formalism of persistent homology, describing filtrations, persistence diagrams, and bootstrap confidence bands. In Section 3, we present SCHU for identifying statistically significant voids and filament loops in astronomical datasets. In Section 4, we test the void identification capabilities of SCHU on Voronoi foam simulation data, which is generated such that the ground truth void locations are known. In Section 5, we apply SCHU to identify voids and filament loops in a subset of the SDSS galaxy survey dataset. Additionally, we identify voids and filament loops in the cosmological N-body simulation from Libeskind et al. (2018) and compare the void locations to those found by other methods. We then study the Betti numbers of two simulations from the MassiveNuS simulation suite (Liu et al., 2018). Finally, in Section 6, we summarize our results and provide concluding remarks.
Background
Homology describes the holes of a manifold in different dimensions. To be specific, the generators of H_0 describe connected components, the generators of H_1 describe closed loops, and the generators of H_2 describe voids (i.e., low-density or empty regions). In the context of cosmic web environments, the H_0 generators represent clusters of galaxies, the H_1 generators represent filaments that form loops, and the H_2 generators represent cosmic voids. Fig. 2 illustrates an example of … and …
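For a 3D illustration of H_0 and H_2, consider a density field thresholded onto a voxel grid. Treating occupied voxels as closed cubes, b_0 is the number of occupied components under 26-connectivity (cubes touching at a corner are connected), and, by Alexander duality, b_2 is the number of enclosed cavities: face-connected (6-connected) empty regions that do not reach the grid boundary. This toy cubical construction is assumed for illustration only and is not how SCHU computes its generators; `voxel_b0_b2` is a hypothetical name, and everything outside the grid is treated as empty space.

```python
from collections import deque
from itertools import product
from typing import List, Sequence, Tuple

Grid = List[List[List[int]]]  # grid[x][y][z] == 1 for occupied (dense) voxels

FACE = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
FULL = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def _components(grid: Grid, value: int,
                offsets: Sequence[Tuple[int, int, int]]) -> List[List[Tuple[int, int, int]]]:
    """Flood-fill the voxels equal to `value` into connected components."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    seen = set()
    comps = []
    for start in product(range(nx), range(ny), range(nz)):
        x, y, z = start
        if grid[x][y][z] != value or start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            cx, cy, cz = queue.popleft()
            comp.append((cx, cy, cz))
            for dx, dy, dz in offsets:
                a, b, c = cx + dx, cy + dy, cz + dz
                if (0 <= a < nx and 0 <= b < ny and 0 <= c < nz
                        and (a, b, c) not in seen and grid[a][b][c] == value):
                    seen.add((a, b, c))
                    queue.append((a, b, c))
        comps.append(comp)
    return comps


def voxel_b0_b2(grid: Grid) -> Tuple[int, int]:
    """Return (b0, b2) for the union of closed occupied voxels."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    b0 = len(_components(grid, 1, FULL))
    # An empty region touching the grid boundary connects to the outside,
    # so only regions fully enclosed by occupied voxels count as cavities.
    b2 = sum(
        1 for comp in _components(grid, 0, FACE)
        if not any(x in (0, nx - 1) or y in (0, ny - 1) or z in (0, nz - 1)
                   for x, y, z in comp)
    )
    return b0, b2
```

A hollow 3×3×3 shell placed inside a 5×5×5 grid gives `(1, 1)`: one connected structure enclosing one void. Filling the central voxel gives `(1, 0)`.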
Method
The SCHU code consists of four main steps, described in Algorithm 1, and the persistent homology computation is performed using the TDA package (Fasy et al., 2014a). Below, we describe two key steps of SCHU in further detail: (i) computing p-values of the homology group generators of a dataset by adapting the framework from Section 2.3, and (ii) finding a representation (i.e., physical locations and boundaries) of the H_1 and H_2 homology group generators from the persistence diagram …
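The exact p-value definition used by SCHU is given in Section 2.3, which is not part of this excerpt. As a hedged sketch, assume a bootstrap confidence band: resampling the data B times yields B distances between the estimated diagram (or density) and its bootstrap versions, and a feature at (birth, death) lies outside the band when its L-infinity distance to the diagonal, (death − birth)/2, exceeds a bootstrap quantile. Under that assumption, a natural per-feature p-value is the fraction of bootstrap distances at least as large as that distance. The functions `feature_p_values` and `significant` are hypothetical, and the bootstrap distances are taken as given inputs.

```python
from typing import List, Sequence, Tuple


def feature_p_values(diagram: Sequence[Tuple[float, float]],
                     boot_dists: Sequence[float]) -> List[float]:
    """Hypothetical p-value per (birth, death) pair, given bootstrap distances.

    Each feature's p-value is the fraction of bootstrap distances that are at
    least its L-infinity distance to the diagonal, (death - birth) / 2.
    Features with death = inf get p-value 0.
    """
    if not boot_dists:
        raise ValueError("need at least one bootstrap distance")
    n_boot = len(boot_dists)
    p_values = []
    for birth, death in diagram:
        dist_to_diag = (death - birth) / 2
        exceed = sum(1 for c in boot_dists if c >= dist_to_diag)
        p_values.append(exceed / n_boot)
    return p_values


def significant(diagram: Sequence[Tuple[float, float]],
                boot_dists: Sequence[float],
                alpha: float = 0.05) -> List[Tuple[float, float]]:
    """Keep the features whose p-value is below alpha."""
    p_values = feature_p_values(diagram, boot_dists)
    return [pt for pt, p in zip(diagram, p_values) if p < alpha]
```

With bootstrap distances `[0.05, 0.1, 0.15, 0.2, 0.3]`, the short-lived feature `(0.0, 0.2)` gets p-value 0.8 and is treated as noise, while `(0.0, 2.0)` gets 0.0 and is kept. In practice B would be in the hundreds or more, so that small p-values are resolvable.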
Application to Voronoi foam data
To demonstrate the performance of SCHU in finding statistically significant generators on a persistence diagram and then locating those generators in the original data, we consider a simulation study using data that mimic the large-scale structure of the Universe, and we focus on locating cosmic voids using H_2 generator representations. Because the ground-truth void locations are known in this simulation study, we can test how well SCHU recovers the true voids.
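The section describing how the Voronoi foam data are generated is not included in this excerpt, so the sketch below is only a toy stand-in that captures the idea of known ground truth. Random nuclei play the role of void centers, and uniform points are kept only if they lie close to a Voronoi wall, measured by the gap between the distances to their nearest and second-nearest nuclei. The functions `wall_gap` and `voronoi_wall_sample`, and the parameter `wall_width`, are hypothetical; the box is not periodic.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def wall_gap(p: Sequence[float], nuclei: Sequence[Sequence[float]]) -> float:
    """Distance to the second-nearest nucleus minus distance to the nearest one."""
    d1, d2 = sorted(math.dist(p, c) for c in nuclei)[:2]
    return d2 - d1


def voronoi_wall_sample(n_nuclei: int, n_points: int, wall_width: float,
                        box: float = 1.0, seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Toy Voronoi-wall catalog in a cube of side `box`.

    Returns (nuclei, points). The nuclei are the ground-truth void centers; the
    points are accepted by rejection sampling only near cell walls, leaving the
    cell interiors (the voids) empty.
    """
    if n_nuclei < 2:
        raise ValueError("need at least two nuclei to define Voronoi walls")
    rng = random.Random(seed)
    nuclei = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n_nuclei)]
    points: List[Point] = []
    while len(points) < n_points:
        p = tuple(rng.uniform(0.0, box) for _ in range(3))
        if wall_gap(p, nuclei) < wall_width:
            points.append(p)
    return nuclei, points
```

A call such as `voronoi_wall_sample(8, 200, 0.05)` returns 8 known void centers and 200 wall points that could be passed to a void finder; comparing the recovered voids with the nuclei then gives a direct accuracy check.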
The generated data
Comparison studies
In this section, SCHU is applied to galaxy survey and N-body simulation datasets in order to compare it with several other void-finding techniques. We first study the results of SCHU as applied to a subset of the Sloan Digital Sky Survey (SDSS) galaxy catalog (Strauss et al., 2002) used in Sutter et al. (2012). Next, we apply SCHU to the dark matter halo catalog from a cosmological simulation that is used in the cosmic web identification comparison study from Libeskind et al. (2018). Finally, we …
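The comparison criteria used in this section are not part of the excerpt. One simple way to compare void locations from two catalogs, assumed here purely for illustration, is the fraction of voids in one catalog that have a counterpart center in the other within a matching radius. Applied with the roles swapped, the same function gives a purity-style score. The function name `match_fraction` and the `radius` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def match_fraction(found: List[Point], reference: List[Point], radius: float) -> float:
    """Fraction of reference void centers with at least one found center within `radius`.

    match_fraction(found, reference, r) is a completeness-style score;
    match_fraction(reference, found, r) is the corresponding purity-style score.
    """
    if not reference:
        return 0.0
    hits = sum(1 for r in reference if any(math.dist(r, f) <= radius for f in found))
    return hits / len(reference)
```

For example, if one method finds voids centered at `(0, 0, 0)` and `(5, 5, 5)` and another at `(0.1, 0, 0)` and `(10, 10, 10)`, a matching radius of 0.5 gives a score of 0.5 in both directions. The result depends strongly on the chosen radius, which in practice would be tied to typical void sizes.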
Conclusions and discussions
In this work, we present a novel method, SCHU, that applies modern statistical methods from topological data analysis, specifically persistent homology, to identify filament loops and cosmic voids in astronomical datasets. While previous works used topological ideas to explore the underlying matter density field in order to study its Betti numbers and topological persistence (Pranav et al., 2016), SCHU strengthens and extends this by assigning p-values to individually identified homology group …
Acknowledgments
The authors thank Jisu Kim, Alessandra Rindalo, and Larry Wasserman for helpful discussions in the early stages of this work. The authors thank the Yale Center for Research Computing for guidance and use of the research computing infrastructure.
References (84)
- Abazajian, K.N., Adshead, P., Ahmed, Z., Allen, S.W., Alonso, D., Arnold, K.S., Baccigalupi, C., Bartlett, J.G., ...
- et al. Cosmology from cosmic shear with Dark Energy Survey science verification data. Phys. Rev. D (2016).
- et al. The fourteenth data release of the Sloan Digital Sky Survey: first spectroscopic data from the Extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment. Astrophys. J. Suppl. (2018).
- et al. The multiscale morphology filter: identifying and extracting spatial patterns in the galaxy distribution. Astron. Astrophys. (2007).
- et al. The spine of the cosmic web. Astrophys. J. (2010).
- et al. The hierarchical nature of the spin alignment of dark matter haloes in filaments. Mon. Not. R. Astron. Soc. Lett. (2014).
- et al. The ROCKSTAR phase-space temporal halo finder and the velocity offsets of cluster cores. Astrophys. J. (2013).
- et al. Persistent homology analysis of brain artery trees. Ann. Appl. Stat. (2016).
- Berry, E., Chen, Y.-C., Cisewski-Kehe, J., Fasy, B.T., 2018. Functional Summaries of Persistence...
- et al. How filaments of galaxies are woven into the cosmic web. Nature (1996).