Elsevier

Astronomy and Computing

Volume 27, April 2019, Pages 34-52
Full length article
Finding cosmic voids and filament loops using topological data analysis

https://doi.org/10.1016/j.ascom.2019.02.003

Abstract

We present a method called Significant Cosmic Holes in Universe (SCHU) for identifying cosmic voids and loops of filaments in cosmological datasets and for assigning them a statistical significance using techniques from topological data analysis. In particular, persistent homology is used to find holes of different dimensions. For dark matter halo catalogs and galaxy surveys, the 0-, 1-, and 2-dimensional holes can be identified with connected components (i.e., clusters), loops of filaments, and voids, respectively. The procedure overlays the dark matter halos or galaxies on a three-dimensional grid, and a distance-to-measure (DTM) function is calculated at each grid point. A nested sequence of simplicial complexes (a filtration) is generated over the lower-level sets of the DTM function at increasing threshold values. The filtered simplicial complex is then used to record the birth and death times of the homology group generators (i.e., the holes) in each dimension. Summary diagrams, called persistence diagrams, are produced from the dimension, birth time, and death time of each homology group generator. Using the persistence diagrams and bootstrap sampling, we explain how p-values can be assigned to each homology group generator. A homology group generator on a persistence diagram does not, in general, correspond to a unique location in the original data volume, so we propose a method for finding a representation of each generator. This method provides a novel, statistically rigorous approach for locating informative generators in cosmological datasets, which may be useful for providing complementary cosmological constraints on the effects of, for example, the sum of the neutrino masses. The method is tested on a Voronoi foam simulation and then applied to a subset of the SDSS galaxy survey and a cosmological simulation. Lastly, we calculate Betti functions for two of the MassiveNuS simulations and discuss implications for using the persistent homology of the density field to help break degeneracies in the cosmological parameters.
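The grid-based DTM step summarized above can be illustrated with a short sketch. It is not the SCHU implementation (the Method section states that SCHU relies on the TDA package); it assumes the standard empirical DTM, in which the value at a grid location is the root mean of the squared distances to its k nearest data points, with k set by a mass parameter m0. The function name `dtm_on_grid`, the parameter name `m0`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def dtm_on_grid(points, grid, m0=0.1):
    """Empirical distance-to-measure at each grid location (brute-force sketch)."""
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Number of nearest neighbours that together carry a fraction m0 of the mass.
    k = max(1, int(np.ceil(m0 * len(points))))
    # Squared distances between every grid location and every data point.
    diff = grid[:, None, :] - points[None, :, :]
    sq_dist = np.einsum("gpd,gpd->gp", diff, diff)
    # Average the k smallest squared distances, then take the square root.
    k_smallest = np.partition(sq_dist, k - 1, axis=1)[:, :k]
    return np.sqrt(k_smallest.mean(axis=1))

# Toy usage: 200 random "galaxies" in the unit cube, evaluated on a 10^3 grid.
rng = np.random.default_rng(0)
galaxies = rng.random((200, 3))
axis = np.linspace(0.0, 1.0, 10)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
dtm_values = dtm_on_grid(galaxies, grid, m0=0.05)

# Lower-level sets at two thresholds: the smaller one is contained in the larger,
# which is the nesting property a filtration requires.
low = dtm_values <= np.quantile(dtm_values, 0.25)
high = dtm_values <= np.quantile(dtm_values, 0.75)
```

Small DTM values mark grid locations close to many data points, so low thresholds pick out dense regions first, while voids enter the lower-level sets only at large thresholds. The brute-force distance matrix is adequate for a toy example; a realistic catalog would need a spatial index.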

Introduction

The large-scale distribution of matter in the Universe forms a connected network known as the cosmic web (Klypin and Shandarin, 1993, Bond et al., 1996). Anisotropic gravitational collapse of matter has resulted in a picture where galaxy clusters form the nodes of this web and are interconnected by filaments, which form at the intersections of walls. The remaining majority of space is filled by cosmic voids: vast underdense regions that have experienced minimal non-linear growth of structure.

In the current era of multiband, high-resolution large-scale structure (LSS) surveys, there has been prolific investigation of the large dark matter halos at the nodes of the cosmic web, including cosmological analysis via the abundance of galaxy clusters (e.g., Vikhlinin et al., 2009, Mantz et al., 2015, Planck Collaboration et al., 2016a), and analysis of the large-scale matter distribution in dark matter halos via cosmic shear studies (e.g., Joudaki et al., 2018, Troxel et al., 2018) and the clustering of galaxies (e.g., Tinker et al., 2012, Cacciato et al., 2013). However, there has been a growing tension between cosmological constraints derived using cosmic microwave background (CMB) and LSS measurements (Planck Collaboration et al., 2016a, Joudaki et al., 2018, Troxel et al., 2018).

Recently, voids have begun to attract interest as a cosmological probe complementary to halos. Unlike halos, which are regions that have experienced high levels of growth and virialization that can partially destroy signatures of the primordial density field, voids have evolved only minimally. Information about the geometry of the initial density field present in the voids has the potential to help break degeneracies in the cosmological parameters and tighten their current constraints (Hamaus et al., 2016). Additionally, motivated by the growing tension between CMB and LSS measurements, void statistics may be able to provide a complementary probe of the growth of structure that is less sensitive to non-linear structure formation physics (Lavaux and Wandelt, 2010, Lavaux and Wandelt, 2012). Furthermore, methods have been proposed for utilizing voids to constrain dark energy (Pisani et al., 2015) and the sum of the neutrino masses Σmν (Kreisch et al., 2018, Massara et al., 2015), both of which have large effects on large-scale, low-density regions.

Various observables can be used to constrain cosmology through voids, including weak gravitational lensing (Sánchez et al., 2017, Cai et al., 2017, Kaiser and Squires, 1993), the integrated Sachs–Wolfe effect in the CMB (Nadathur, 2016), redshift-space distortions (Cai et al., 2016, Hamaus et al., 2015), and void ellipticities (Lee and Park, 2009). Datasets that are currently available and have been used to characterize some of these void observables include galaxy redshift surveys such as the Sloan Digital Sky Survey (SDSS; Abolfathi et al., 2018) and the Dark Energy Survey (DES; Abbott et al., 2016), as well as CMB anisotropy maps from Planck (Planck Collaboration et al., 2016c). In the near future, next-generation galaxy surveys will come online, including the Large Synoptic Survey Telescope (LSST; Ivezić et al., 2008) and the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al., 2016), and the secondary CMB anisotropies will be observed to greater precision than ever by the Simons Observatory (SO; The Simons Observatory Collaboration et al., 2018) and the CMB-S4 experiment (Abazajian et al., 2016). In anticipation of these large upcoming datasets, many theoretical and computational approaches for identifying voids from an input simulation or galaxy survey have been, and are currently being, developed in order to characterize the potential future constraining power of void clustering and abundance statistics (Libeskind et al., 2018, Sutter et al., 2012, Neyrinck, 2008, Platen et al., 2007, Pranav et al., 2016 and references therein). The majority of these methods identify the physical locations of the voids in the matter field, enabling one to study clustering statistics such as the void two-point correlation function and abundance statistics such as the void mass and volume functions.

In this paper, we propose a method called Significant Cosmic Holes in Universe (SCHU) for relating the cosmic matter distribution to topology using persistent homology. Persistent homology quantifies and summarizes the shape of a dataset by its hole structure. SCHU uses this information to assign a measure of statistical significance to the individual holes and records the locations of representations of these structures back in the data volume, which enables analysis of void clustering and abundance. The homology groups of different dimensions are associated with different cosmic environment types. For example, connected components (0th-dimensional homology groups, H0), loops (1st-dimensional homology groups, H1), and low-density 3D volumes (2nd-dimensional homology groups, H2) are analogous to galaxy clusters, closed loops of filaments, and cosmic voids, respectively. Thus, cosmic voids can be identified as representations of H2 homology group generators, and the newly proposed filament loops can be identified as representations of H1 homology group generators.
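To make the correspondence between H0, H1, H2 and components, loops, and enclosed volumes concrete, the following sketch counts the independent holes (Betti numbers) of two tiny simplicial complexes from the ranks of their boundary matrices. It is a textbook linear-algebra computation with real coefficients, not part of SCHU; the helper names `closure`, `boundary_rank`, and `betti_numbers` are hypothetical.

```python
from itertools import combinations
import numpy as np

def closure(top_simplices):
    """All faces of the given simplices, grouped by dimension."""
    by_dim = {}
    for simplex in top_simplices:
        verts = tuple(sorted(simplex))
        for size in range(1, len(verts) + 1):
            for face in combinations(verts, size):
                by_dim.setdefault(size - 1, set()).add(face)
    return {d: sorted(faces) for d, faces in by_dim.items()}

def boundary_rank(faces_k, faces_km1):
    """Rank of the boundary map from k-simplices to (k-1)-simplices."""
    if not faces_k or not faces_km1:
        return 0
    index = {face: i for i, face in enumerate(faces_km1)}
    mat = np.zeros((len(faces_km1), len(faces_k)))
    for j, simplex in enumerate(faces_k):
        for omit in range(len(simplex)):
            # Dropping one vertex gives a face; the sign alternates with position.
            face = simplex[:omit] + simplex[omit + 1:]
            mat[index[face], j] = (-1) ** omit
    return int(np.linalg.matrix_rank(mat))

def betti_numbers(top_simplices, max_dim=2):
    """Betti numbers b_0..b_max_dim: b_k = n_k - rank(d_k) - rank(d_{k+1})."""
    cx = closure(top_simplices)
    ranks = {0: 0}
    for k in range(1, max_dim + 2):
        ranks[k] = boundary_rank(cx.get(k, []), cx.get(k - 1, []))
    return [len(cx.get(k, [])) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]

# A hollow triangle: one component and one loop.
loop = betti_numbers([(0, 1), (1, 2), (0, 2)])
# A hollow tetrahedron (four triangular faces, no interior): one component, one enclosed void.
shell = betti_numbers([(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])
```

In the cosmological reading given above, the hollow triangle plays the role of a filament loop and the hollow tetrahedron plays the role of a void bounded by walls.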

Topological methods have previously been employed in cosmology. For example, the topological evolution of the matter distribution of the Universe was studied in van de Weygaert et al. (2011) by analyzing the changing Betti numbers, which are the ranks of the homology groups of different orders (i.e., the number of clusters, filament loops, and voids), across a filtration, which is an indexed sequence of nested sets, constructed using alpha shapes; in particular, they demonstrated that the Betti numbers across the filtration can be used to distinguish the matter distributions resulting from different dark energy models. Additionally, Pranav et al. (2016) introduced a multiscale topological measurement of the cosmic matter distribution and explored the analysis of Betti numbers and topological persistence of different cosmological models. A scale-free and parameter-free method for identifying the cosmic environments (voids, walls, filaments, nodes) called Discrete Persistent Structures Extractor (DisPerSE) was proposed in Sousbie (2011). DisPerSE computes the discrete Morse–Smale complex of a spatial dataset using the Delaunay tessellation field estimator (DTFE) technique (Schaap and van de Weygaert, 2000, van de Weygaert and Schaap, 2009). The mathematical background and algorithm implementation are described in Sousbie (2011), and applications to 3D simulation datasets and observed galaxy surveys are found in Sousbie et al. (2011).

As noted previously, persistent homology is a tool within topological data analysis (TDA) that finds holes of different dimensions in data (e.g., connected components, loops, and voids) and summarizes the generators by their lifetimes in a particular filtration. The resulting summaries, called persistence diagrams, and their associated Betti numbers can then be used for various types of statistical inference or as inputs into machine learning algorithms (Reininghaus et al., 2015). Persistent homology has proven to be useful in a variety of applications, such as natural language processing (Zhu, 2013), computational biology (Xia and Wei, 2014), Lyman-alpha forest studies (Cisewski et al., 2014), angiography (Bendich et al., 2016), and dynamical systems (Emrani et al., 2014).
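The link between a persistence diagram and Betti numbers can be sketched in a few lines: at a given threshold, the Betti number in dimension k is the number of dimension-k generators that have been born but have not yet died. The diagram below is made-up toy data, the tuple layout (dimension, birth, death) and the function name `betti_function` are assumptions, and the half-open convention (alive on birth ≤ t < death) is one common choice rather than something specified in the text.

```python
import numpy as np

def betti_function(diagram, dim, thresholds):
    """Number of dimension-`dim` generators alive at each threshold value."""
    births = np.array([b for d, b, _ in diagram if d == dim], dtype=float)
    deaths = np.array([e for d, _, e in diagram if d == dim], dtype=float)
    t = np.asarray(thresholds, dtype=float)[:, None]
    # A generator is counted while birth <= t < death.
    alive = (births[None, :] <= t) & (t < deaths[None, :])
    return alive.sum(axis=1)

# Toy persistence diagram: (dimension, birth, death). Not real survey output.
toy_diagram = [
    (0, 0.0, np.inf),  # a component that never merges
    (0, 0.1, 0.4),     # a component that merges into the first one
    (1, 0.5, 0.9),     # a loop
    (2, 0.7, 1.2),     # a void
]
thresholds = [0.0, 0.2, 0.6, 1.0]
b0 = betti_function(toy_diagram, 0, thresholds)
b1 = betti_function(toy_diagram, 1, thresholds)
b2 = betti_function(toy_diagram, 2, thresholds)
```

Evaluating such a function on a fine grid of thresholds gives a curve per dimension, which is the kind of fixed-length summary that can be compared across simulations or passed to a learning algorithm.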

Though useful for summarizing information for complex data, one shortcoming of persistent homology is that the homology group generators identified are not uniquely mapped back into the data volume. This is because the homology group generators displayed on the summary diagrams each represent an equivalence class of representations of that particular hole. SCHU uses the output of the persistent homology algorithm (Edelsbrunner et al., 2002, Zomorodian and Carlsson, 2005) in order to find a representation of the equivalence class back in the original data volume. SCHU detects and captures the locations of cosmic voids (H2 generators) along with another cosmic structure that we call filament loops (H1 generators). Filament loops are formed when filaments are connected together in such a way that a loop forms, surrounding an empty or low density region, as shown in Fig. 1. Thus, SCHU, and the persistent homology underlying SCHU, enable analysis of the cosmological density field: the void and filament loop locations and sizes enable the standard clustering and abundance statistics, and the persistence diagrams and Betti numbers provide additional topological summary statistics of the density field that can be used to further discriminate between cosmological models.

This article is organized as follows. In Section 2, we provide an overview of the formalism of persistent homology, describing filtrations, persistence diagrams, and bootstrap confidence bands. In Section 3, we present SCHU for identifying statistically significant voids and filament loops in astronomical datasets. In Section 4, we test the void identification capabilities of SCHU on Voronoi foam simulation data, which is generated such that the ground truth void locations are known. In Section 5, we apply SCHU to identify voids and filament loops in a subset of the SDSS galaxy survey dataset. Additionally, we identify voids and filament loops in the cosmological N-body simulation from Libeskind et al. (2018) and compare the void locations to those found by other methods. We then study the Betti numbers of two simulations from the MassiveNuS simulation suite (Liu et al., 2018). Finally, in Section 6, we summarize our results and provide concluding remarks.

Section snippets

Background

Homology describes different dimensional holes of a manifold. To be specific, the generators of H0 describe connected components, the generators of H1 describe closed loops, and the generators of H2 describe voids (i.e. low-density or empty regions). Put into the context of cosmic web environments, the H0 generators represent clusters of galaxies, the H1 generators represent filaments that form loops, and the H2 generators represent cosmic voids. Fig. 2 illustrates an example of H0 and H1: Fig.

Method

The SCHU code consists of four main steps described in Algorithm 1, and the persistent homology computation is performed using the TDA package (Fasy et al., 2014a). Below, we describe two key steps of SCHU in further detail: (i) computing p-values of the homology group generators of a dataset by adapting the framework from Section 2.3 and (ii) addressing how to find a representation (i.e. physical locations and boundaries) of the H1 and H2 homology group generators from the persistence diagram

Application to Voronoi foam data

In order to demonstrate the performance of SCHU for finding statistically significant generators on a persistence diagram and then locating those generators in the original data, we consider a simulation study using data that mimic the large-scale structure of the Universe and focus on locating cosmic voids using H2 generator representations. In the simulation study, the ground-truth void locations are known, so we can test SCHU's ability to find the true voids.

The generated data

Comparison studies

In this section, SCHU is applied to galaxy survey and N-body simulation datasets in order to compare to several other void-finding techniques. We first study the results of SCHU as applied to a subset of the Sloan Digital Sky Survey (SDSS) galaxy catalog (Strauss et al., 2002) used in Sutter et al. (2012). Next, we apply SCHU to the dark matter halo catalog from a cosmological simulation that is used in the cosmic web identification comparison study from Libeskind et al. (2018). Finally, we

Conclusions and discussions

In this work, we present a novel method, SCHU, for applying modern statistical methods from topological data analysis, specifically persistent homology, to identify filament loops and cosmic voids in astronomical datasets. While previous works used topological ideas to explore the underlying matter density field in order to study its Betti numbers and topological persistence (Pranav et al., 2016), SCHU strengthens and extends this by assigning p-values to individually-identified homology group

Acknowledgments

The authors thank Jisu Kim, Alessandra Rindalo, and Larry Wasserman for helpful discussions in the early stages of this work. The authors thank the Yale Center for Research Computing for guidance and use of the research computing infrastructure.

References (84)

  • Abazajian, K.N., Adshead, P., Ahmed, Z., Allen, S.W., Alonso, D., Arnold, K.S., Baccigalupi, C., Bartlett, J.G.,...
  • Abbott, T., et al., 2016. Cosmology from cosmic shear with Dark Energy Survey Science Verification data. Phys. Rev. D.
  • Abolfathi, B., et al., 2018. The fourteenth data release of the Sloan Digital Sky Survey: First spectroscopic data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment. Astrophys. J. Suppl.
  • Aragón-Calvo, M.A., et al., 2007. The multiscale morphology filter: identifying and extracting spatial patterns in the galaxy distribution. Astron. Astrophys.
  • Aragón-Calvo, M.A., et al., 2010. The spine of the cosmic web. Astrophys. J.
  • Aragon-Calvo, M.A., et al., 2014. The hierarchical nature of the spin alignment of dark matter haloes in filaments. Mon. Not. R. Astron. Soc. Lett.
  • Behroozi, P.S., et al., 2013. The ROCKSTAR phase-space temporal halo finder and the velocity offsets of cluster cores. Astrophys. J.
  • Bendich, P., et al., 2016. Persistent homology analysis of brain artery trees. Ann. Appl. Stat.
  • Berry, E., Chen, Y.-C., Cisewski-Kehe, J., Fasy, B.T., 2018. Functional Summaries of Persistence...
  • Bond, J.R., et al., 1996. How filaments of galaxies are woven into the cosmic web. Nature.
  • Bos, E.P., et al., 2012. The darkness that shaped the void: dark energy and cosmic voids. Mon. Not. R. Astron. Soc.
  • Cacciato, M., et al., 2013. Cosmological constraints from a combination of galaxy clustering and lensing - III. Application to SDSS data. Mon. Not. R. Astron. Soc.
  • Cai, Y.-C., et al., 2017. The lensing and temperature imprints of voids on the cosmic microwave background. Mon. Not. R. Astron. Soc.
  • Cai, Y.-C., et al., 2016. Redshift-space distortions around voids. Mon. Not. R. Astron. Soc.
  • Cautun, M., et al., 2014. Evolution of the cosmic web. Mon. Not. R. Astron. Soc.
  • Cautun, M., et al., 2012. NEXUS: tracing the cosmic web connection. Mon. Not. R. Astron. Soc.
  • Chazal, F., et al., 2011. Geometric inference for probability measures. Found. Comput. Math.
  • Chazal, F., et al., 2017. Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res.
  • Cisewski, J., et al., 2014. Non-parametric 3D map of the intergalactic medium using the Lyman-alpha forest. Mon. Not. R. Astron. Soc.
  • Cleveland, W.S., 1979. Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc.
  • Cohen-Steiner, D., et al. Vines and vineyards by updating persistence in linear time.
  • Colberg, J.M., et al., 2008. The Aspen–Amsterdam void finder comparison project. Mon. Not. R. Astron. Soc.
  • De Silva, V., et al., 2011. Dualities in persistent (co)homology. Inverse Problems.
  • DESI Collaboration, Aghamousa, A., Aguilar, J., Ahlen, S., Alam, S., Allen, L.E., Allende Prieto, C., Annis, J.,...
  • Edelsbrunner, H., et al., 2010. Computational Topology: An Introduction.
  • Edelsbrunner, H., et al., 1983. On the shape of a set of points in the plane. IEEE Trans. Inf. Theory.
  • Edelsbrunner, H., et al., 2002. Topological persistence and simplification. Discrete Comput. Geom.
  • Emrani, S., et al., 2014. Persistent homology of delay embeddings and its application to wheeze detection. IEEE Signal Proc. Let.
  • Falck, B., et al., 2015. The persistent percolation of single-stream voids. Mon. Not. R. Astron. Soc.
  • Falck, B.L., et al., 2012. ORIGAMI: delineating halos using phase-space folds. Astrophys. J.
  • Fasy, B.T., Kim, J., Lecci, F., Maria, C., 2014a. Introduction to the R package...
  • Fasy, B.T., et al., 2014. Confidence sets for persistence diagrams. Ann. Statist.
  • Forero-Romero, J., et al., 2009. A dynamical classification of the cosmic web. Mon. Not. R. Astron. Soc.
  • Hahn, O., et al., 2007. Properties of dark matter haloes in clusters, filaments, sheets and voids. Mon. Not. R. Astron. Soc.
  • Hamaus, N., et al., 2016. Constraints on cosmology and gravity from the dynamics of voids. Phys. Rev. Lett.
  • Hamaus, N., et al., 2015. Probing cosmology and gravity with redshift-space distortions around voids. J. Cosmol. Astropart. Phys.
  • Hoffman, Y., et al., 2012. A kinematic classification of the cosmic web. Mon. Not. R. Astron. Soc.
  • Huchra, J., et al., 1982. Groups of galaxies. I. Nearby groups. Astrophys. J.
  • Icke, V., et al., 1987. Fragmenting the universe. Astron. Astrophys.
  • Icke, V., et al., 1991. The galaxy distribution as a Voronoi foam. Q. J. R. Astron. Soc.
  • Ivezić, Ž., Kahn, S.M., Tyson, J.A., Abel, B., Acosta, E., Allsman, R., Alonso, D., AlSayyad, Y., Anderson, S.F.,...
  • Joudaki, S., et al., 2018. KiDS-450 + 2dFLenS: Cosmological parameter constraints from weak gravitational lensing tomography and overlapping redshift-space galaxy clustering. Mon. Not. R. Astron. Soc.