The Large Scale Structure
of the Universe:
from simulations to observations
by Santiago Javier Ávila Pérez
PhD Thesis in Theoretical Physics
May 6, 2016
supervised by
Alexander Knebe &
Juan García-Bellido Capdevila
Facultad de Ciencias
Departamento de Física Teórica
&
Instituto de Fı́sica Teórica (UAM-CSIC)
To my family,
who fought to see me get here.
The Large Scale Structure of the Universe: Simulating the Observations
Authorship

[3] nIFTy Cosmology: galaxy/halo mock catalogue comparison project on clustering statistics
Chuang C.-H., Zhao C., Prada F., Munari E., Avila S., Izard A., Kitaura F.-S., Monaco P., Murray S., Knebe A., Scoccola C.G., Yepes G., García-Bellido J., Marín F., Müller V., et al.
2015, MNRAS, 452, 686-700
Contents

Authorship
Prólogo
Preface
Closure
Epílogo
Bibliography
Prólogo
H(a) = \dot{a}/a = H_0 \sqrt{(\Omega_c + \Omega_b)\,a^{-3} + \Omega_{rad}\,a^{-4} + \Omega_k\,a^{-2} + \Omega_{DE}\,a^{-3(1+w)}}   (1)

where Ω_i represents the density parameter of species i, that is, the ratio of the density ρ_i to the critical density ρ_crit = 3H_0^2/(8πG) at the current epoch.
Experiments find a component of ordinary (baryonic) matter of Ωb = 0.049, and a small component of radiation (photons and neutrinos) of Ωrad = 9·10^{-5}. The novelty with respect to the old Big Bang theory is the dominant effect of two new species: Cold Dark Matter (Ωc = 0.266) and Dark Energy (ΩDE = 0.685). In the standard ΛCDM model, the curvature component is negligible (Ωk = 0), and the equation of state of Dark Energy is w = −1, corresponding to a Cosmological Constant Λ. The current expansion rate of the Universe is H0 = 67.3 (km/s)/Mpc. All quoted values are from [4].

Cold Dark Matter (CDM; hereafter all acronyms derive from the English nomenclature) is indistinguishable from ordinary matter in its gravitational behaviour and, therefore, in the way it affects the expansion of the Universe (Equation 1). However, a collisionless (non-baryonic) matter component is needed to explain the formation of the structures that we find in the Universe.
w = w_0 + (1 - a)\,w_a   (2)
[...] baryonic matter at those scales is limited and may lie behind the cause (e.g. [12]). In fact, new observations are providing solutions to some of the problems that had long been awaiting an answer (e.g. the discovery of new satellite galaxies [13]). Regarding the other two cases, estimating the probability of rare events (the existence of extreme clusters and the peculiarities of the CMB), once we know that they occur, can be a complicated and subjective task. New estimates by other groups find that the cited anomalies are compatible with ΛCDM [14–16].
The Cosmological Revolution
[...] Energy Survey (DES) [19], and on the right with data from the future Euclid survey [20]. The main objective of these two experiments is to measure with unprecedented precision the equation of state of Dark Energy and its time variation (Equation 2). This will help us understand the nature of the mysterious force that dominates the energy density of the Universe.

As seen in the upper part of Figure 1, three main types of experiments contributed to the early stages of Precision Cosmology. I briefly review them below.
Type Ia Supernovae (SNIa) are violent explosions that can be used as standard candles (objects whose luminosity is known) and observed at cosmological distances. By measuring their redshift z, we can study its relation to the luminosity distance:

d_L(z) = (1 + z) \int_0^z \frac{c\, dz'}{H(z')}   (3)

which depends strongly on the cosmological parameters that determine the evolution of the Universe ([21, 22]).
Towards the end of the last century (1998-1999), the High-Z Supernova Search Team and the Supernova Cosmology Project measured d_L(z), finding that, contrary to expectations, the expansion of the Universe was accelerating [23, 24]. This was the first of a series of pieces of evidence for the existence of Dark Energy.
\chi_{BAO} = \frac{c}{\sqrt{3}} \int_0^{a_{dec}} \frac{da}{a^2 H(a) \sqrt{1 + 3\Omega_b a/(4\Omega_\gamma)}}   (4)
d_A(z) = \frac{1}{1+z} \int_0^z \frac{c\, dz'}{H(z')}   (5)

which is related to Equation 3 through d_M(z) = d_A(z)·(1 + z) = d_L(z)/(1 + z). Figure 2 shows the BAO and SNIa signals together.
Although gaining a deep understanding of the highly non-linear physics associated with structure formation and its relation to the observables is no easy task, the physics that determines χ_BAO rests on much simpler principles. The BAO signal measured in the Large Scale Structure (LSS) thus soon became a very fruitful source for determining the cosmological parameters.

The BAO was first detected in the galaxy distribution by the 2dFGRS^7 [27] and SDSS^8 [28] collaborations. Later on, more precise measurements were established by 6dFGS^9 [29], WiggleZ^10 [30] and BOSS^11 [31]. In addition to these measurements, BOSS measured the BAO through the so-called Lyman-α forests: reconstructing the three-dimensional distribution of the clouds of intergalactic neutral hydrogen that leave absorption lines in the spectra of distant quasars [32, 33].

^7 http://www.2dfgrs.net/
^8 http://www.sdss.org/
^9 http://www.6dfgs.net/
^10 http://wigglez.swin.edu.au/site/
^11 http://Cosmology.lbl.gov/BOSS/
Preface

A concordance model has been reached in the field of Cosmology, able to reconcile
all the cosmological experiments: ΛCDM. This model is based on the principles of
the Hot Big Bang theory, which explains the expansion of the Universe through the Friedmann equation:

H(a) = \dot{a}/a = H_0 \sqrt{(\Omega_c + \Omega_b)\,a^{-3} + \Omega_{rad}\,a^{-4} + \Omega_k\,a^{-2} + \Omega_{DE}\,a^{-3(1+w)}}   (1)

with Ω_i representing the density parameter of species i; that is, the ratio of the density ρ_i to the critical density ρ_crit = 3H_0^2/(8πG) at the current epoch.
Experiments find a component of ordinary (baryonic) matter of Ωb = 0.049 and a
small component of radiation (photons and neutrinos) of Ωrad = 9·10^{-5}. The novelty
with respect to the old Big Bang theory is the dominant effect of two new species:
the Cold Dark Matter (Ωc = 0.266) and the Dark Energy (ΩDE = 0.685). In the
standard ΛCDM model, the curvature is negligible (Ωk = 0) and the equation of
state of Dark Energy is w = −1, corresponding to a Cosmological Constant Λ. The
measured current expansion rate is H0 = 67.3 (km/s)/Mpc (all quoted values from [4]).
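The expansion rate of Equation 1 is straightforward to evaluate numerically. The following is a minimal sketch in Python, assuming the parameter values just quoted; the function and variable names are ours:

```python
import numpy as np

# Density parameters and expansion rate quoted in the text [4]
H0   = 67.3     # (km/s)/Mpc
Oc   = 0.266    # Cold Dark Matter
Ob   = 0.049    # baryons
Orad = 9e-5     # photons + neutrinos
Ok   = 0.0      # curvature (flat)
Ode  = 0.685    # Dark Energy
w    = -1.0     # equation of state: a Cosmological Constant

def hubble(a):
    """Expansion rate H(a) from Equation 1, in (km/s)/Mpc."""
    return H0 * np.sqrt((Oc + Ob) * a**-3 + Orad * a**-4
                        + Ok * a**-2 + Ode * a**(-3.0 * (1.0 + w)))

print(hubble(1.0))   # today: recovers H0
print(hubble(0.5))   # at redshift z = 1/a - 1 = 1
```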
The Cold Dark Matter is indistinguishable from ordinary matter in its gravitational
behaviour and, hence, in the way it affects the expansion of the Universe (Equa-
tion 1). However, we need a non-baryonic collisionless component to explain the
formation of the structures that we find in the Universe. Its presence has been
w = w_0 + (1 - a)\,w_a   (2)
Figure 2: Distance-redshift relation. Compilation of SNIa from [42] (2008) and BAO
measurements from [29–31, 33]. Figure from C. Blake in [43].
Cosmological Revolution
represented in the middle panels. The top panels used to appear in any presentation remotely related to Cosmology; nowadays, Planck data have taken over that role. But there is still much work to be done in the future, as we keep advancing towards Precision Cosmology. The two bottom panels show constraints forecast for the next decade: on the left for the completed Dark Energy Survey [19] and on the right for the future survey Euclid [20]. The main target of both of these experiments is to set unprecedented constraints on the time-dependent equation of state of Dark Energy (Equation 2). This will help us understand the nature of this mysterious force that dominates the energy density of the Universe.
As seen at the top of Figure 1, three main types of experiments contributed to the early stages of Precision Cosmology. I will briefly review them below.
Type Ia Supernovae (SNIa) are violent explosions that can be used as standard candles and detected at cosmological distances. Measuring their redshift z, we can study the luminosity distance-redshift relation

d_L(z) = (1 + z) \int_0^z \frac{c\, dz'}{H(z')}   (3)

which depends strongly on the cosmological parameters that determine the evolution of the late Universe ([21, 22]).
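Equation 3 lends itself to direct numerical integration. A minimal sketch, assuming the flat ΛCDM parameters quoted earlier and neglecting radiation at low redshift (function names are ours):

```python
import numpy as np
from scipy.integrate import quad

H0, c = 67.3, 299792.458   # (km/s)/Mpc and km/s
Om, Ode = 0.315, 0.685     # (Omega_c + Omega_b) and Omega_DE

def H(z):
    """H(z) for flat LCDM with w = -1 (radiation neglected at low z)."""
    return H0 * np.sqrt(Om * (1 + z)**3 + Ode)

def d_L(z):
    """Luminosity distance of Equation 3, in Mpc."""
    chi, _ = quad(lambda zp: c / H(zp), 0.0, z)   # comoving distance
    return (1 + z) * chi

for z in (0.1, 0.5, 1.0):
    print(f"d_L({z}) = {d_L(z):.0f} Mpc")
```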
At the end of the last century (1998-1999), the High-Z Supernova Search Team and the Supernova Cosmology Project measured d_L(z), determining that instead of decelerating, as was expected, the expansion of the Universe was accelerating [23, 24]. This was the first evidence for Dark Energy.
[...] the fluid following sound waves caused by the pressure gradients. Eventually, the
Universe becomes neutral at recombination and baryonic matter and photons stop
interacting shortly after: at decoupling. At this moment, oscillations also freeze and
leave imprinted the scale of the sound horizon χBAO in the distribution of matter
[25, 26]:
\chi_{BAO} = \frac{c}{\sqrt{3}} \int_0^{a_{dec}} \frac{da}{a^2 H(a) \sqrt{1 + 3\Omega_b a/(4\Omega_\gamma)}}   (4)

where a_dec is the scale factor at decoupling.
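Numerically, the sound horizon follows from a single quadrature. A minimal sketch under the same illustrative parameters; the photon-only density Ω_γ and the value of a_dec are assumptions here:

```python
import numpy as np
from scipy.integrate import quad

H0, c = 67.3, 299792.458             # (km/s)/Mpc and km/s
Om, Ode = 0.315, 0.685               # matter (baryons + CDM) and Dark Energy
Orad, Og, Ob = 9e-5, 5.4e-5, 0.049   # radiation; photons only (assumed); baryons

def H(a):
    return H0 * np.sqrt(Om * a**-3 + Orad * a**-4 + Ode)

def integrand(a):
    R = 3.0 * Ob * a / (4.0 * Og)    # baryon loading of the photon fluid
    return 1.0 / (a**2 * H(a) * np.sqrt(1.0 + R))

a_dec = 1.0 / 1090.0                 # decoupling at z ~ 1089 (assumed)
chi, _ = quad(integrand, 1e-8, a_dec)
print(c / np.sqrt(3.0) * chi)        # of order 145 Mpc comoving
```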
This scale can be found at different cosmological times in the distribution of galaxies
(and other tracers of matter) as a bump in the correlation function at large scales.
We can use it as a standard ruler to determine the angular distance-redshift relation
d_A(z) = \frac{1}{1+z} \int_0^z \frac{c\, dz'}{H(z')}   (5)

which is related to Equation 3 through d_M(z) = d_A(z)·(1 + z) = d_L(z)/(1 + z). Both BAO and SNIa measurements are shown together in Figure 2.
Even though disentangling the highly non-linear physics involved in galaxy and structure formation might be arduous, BAO measurements from the Large Scale Structure (LSS) soon became a very powerful tool to constrain Cosmology. This is partly because the size of χ_BAO relies on more basic principles. The BAO was first detected in the galaxy distribution by the 2dFGRS collaboration^21 [27] and SDSS^22 [28]; later, more precise measurements were performed by 6dFGS^23 [29], WiggleZ^24 [30] and BOSS^25 [31]. Additionally, BOSS measured the BAO from Lyman-α forests: a 3D reconstruction of the intergalactic blobs of neutral hydrogen that imprint absorption lines in the spectra of distant quasars [32, 33].
^21 http://www.2dfgrs.net/
^22 http://www.sdss.org/
^23 http://www.6dfgs.net/
^24 http://wigglez.swin.edu.au/site/
^25 http://Cosmology.lbl.gov/BOSS/
Currently, the strongest constraints on Cosmology are set by the CMB. This is in part due to the fact that CMB physics can be easily understood and modelled from linear perturbation theory; this boosted this type of experiment, pioneering Precision Cosmology. But the exploitation of its information has reached a maximum, and attention now focuses on higher-order and more subtle effects (polarization, spectral distortions, etc.).

On the other hand, our understanding of the Large Scale Structure (LSS) is improving notably with time. This is due to better control of the systematic experimental errors and more precise and specialised instruments, but also to a better understanding of the astrophysics involved and a more precise knowledge of the underlying Cosmology.
^26 http://lambda.gsfc.nasa.gov/product/cobe/
^27 http://map.gsfc.nasa.gov/
[...] (a scheme representing the history of halos). These are analysed by means of the halo
age, merger rate and mass evolution.
Additionally, measurements from LSS need error bars and covariance matrices accounting for systematic errors, cosmic variance and their interplay. In order to estimate them we need not just one simulation but hundreds or even thousands.
Precise N -Body simulations are very costly in terms of computing resources, and
running that many is prohibitive (see Section 2.1 and references therein). Hence,
we need a new generation of simulating tools to generate approximate synthetic cat-
alogues. In Chapter 2 I present halogen, a technique designed to generate halo
catalogues with the correct 2-point correlation function at large scales, reducing the
CPU-hours required by a factor of ∼ 10^{3-5} compared to an N-Body simulation, and the memory by a factor of ∼ 10^{1-2} (Table 2.5). Other approximate methods for fast gener-
ation of halo mock catalogues from the literature are also presented and compared
in Section 2.6.
Finally, in Chapter 3, I present the application of halogen in the context of the
Dark Energy Survey data analysis. Firstly, the catalogues are adapted with three
additional observational features: construction of a lightcone, simulation of photo-
metric redshift and the implementation of a Halo Occupation Distribution scheme
(HOD, see Section 3.2.3 and references therein) fitted to reproduce the observed
galaxy clustering. Then, a batch of mock catalogues is generated and we show its applicability to gaining insight into the modelling, optimising the analysis methodology and computing error bars and covariance matrices for the Large Scale Structure analysis.
Chapter 1

Merger Trees and Halo Finder Comparison
In the early stages of evolution of the Universe, the homogeneous and isotropic as-
sumption represents a good approximation. For studies of CMB where perturbations
are very small (δ ∼ 10^{-5}), we can use linear perturbation theory to model the dis-
tribution of matter in the Universe. However, these small fluctuations continue to
grow due to gravitational collapse forming the complex cosmic web (with filaments,
knots and walls) that we find around us at the present epoch (Figure 3).
The details of structure formation cannot be properly modelled from perturbation theory because collapsed objects enter the highly non-linear regime of gravity. In this regime, we can only rely on N-Body simulations, where particles are evolved under gravity step by step. We will briefly review the methods to perform N-Body simulations in Section 1.1.1.
The final outcome of an N -Body simulation represents the dark matter density field
as shown in Figure 1.1, which is not a direct observable. Hence, for each type of
N -Body simulation and associated observation, a post-processing is needed (Sec-
tion 1.1.2). Halo finders and merger tree builders are part of the analysis pipeline of
N -Body simulations. The former finds collapsed objects called halos within the dark
matter distribution at a given time-step or snapshot (Section 1.2). The latter links
those halos across different time-steps and identifies the mergers of halos, generating a scheme called a merger tree (Figure 1.2 and Section 1.3).
There is a wide variety of methods used by the community for halo finding and merger tree building. In this chapter we analyse the differences and similarities in the resulting merger trees for the different combinations of methods; see Section 1.1.3 for a more detailed description of the context and motivation of this study. This analysis is done on the one hand from the geometrical point of view (Section 1.4), and on the other hand by studying the halo mass evolution (Section 1.5). Finally, conclusions are presented in Section 1.6.
N-Body simulations are used in many areas of Astrophysics and in other fields of physics. Depending on the area, different physical processes may be relevant, and the simulations will have different requirements. For Large Scale Structure, we only simulate dark matter particles, for which only gravity and the expansion of the Universe are relevant. These particles are not fundamental particles, but collisionless tracers of the phase-space, with very high masses exceeding a million (and often a billion) solar masses. Even though baryons – which represent a relevant fraction of the matter of the Universe – are not collisionless, in these simulations the hydrodynamics of baryons is neglected, since its effects are only relevant at small scales (≲ 2 Mpc) and would severely increase the computing time. See [45] for a thorough study of N-Body simulations in different fields and a detailed derivation of the computations presented below. A review more specialised in N-Body methods in Cosmology is [46], and [47] is a more recent review better contextualised with experiments.
Cosmological simulations represent the Universe in a box of constant comoving vol-
ume sampled with N particles. In order to model an infinite and boundless Universe,
we impose periodic conditions, i.e. a particle leaving one side of the box appears
on the opposite side, and the gravitational potential is generated not only by the
particles in the box, but also by an infinite number of replicas of the same box.

Figure 1.1: MICE Grand Challenge N-Body simulation dark matter distribution; brighter parts represent denser regions (Table 2.1). This figure explores the very large scales up to 3072 Mpc/h, which is the simulation size (the extension beyond it being replicas following the periodic conditions, see text), as well as intermediate scales (100 Mpc/h), where the deviations from homogeneity are more pronounced.

Each
particle is represented by its comoving coordinate \vec{x} = \vec{r}/a and comoving velocity \vec{u}, where \vec{r} is its physical position. In this comoving frame, the equations of motion become
d\vec{x}/dt = \vec{u}
d\vec{u}/dt = -2H\,\vec{u} - \frac{1}{a^3}\nabla_x \phi   (1.1)
where φ(\vec{x}) is the Newtonian potential, determined by the density perturbations δ = (ρ − ρ̄)/ρ̄ with respect to the mean ρ̄ through the Poisson equation (written here in the convention matching Equation 1.1):

\nabla_x^2 \phi = 4\pi G\, a^3 \bar{\rho}\, \delta   (1.2)
Having all the equations and physics that govern the system, we now need to specify the method to generate the initial conditions, numerically compute the potential, and integrate the equations of motion.
Initial Conditions
The initial power spectrum of matter is well known, as it is measured from the CMB. Given the cosmology, it can be easily calculated with codes such as camb [48]. Starting from a completely uniform Universe, we use Lagrangian Perturbation Theory to perturb the field and generate a density field with the correct power spectrum.

Lagrangian Perturbation Theory (LPT) studies how particles (fluid elements) move across the fixed coordinate space, unlike Eulerian theory, where the object of study is the variation of the density and velocity fields at a given position [49, 50]. At z = ∞, particles are distributed on a regular grid with coordinates \vec{q} (Lagrangian position). As the Universe expands, particles are displaced to their Eulerian position \vec{x}:
\vec{x}(t) = \vec{q} + \vec{\Psi}(t, \vec{q})   (1.3)

\vec{\Psi} = \vec{\Psi}^{(1)} + \vec{\Psi}^{(2)} + \dots   (1.4)
In the Zel'dovich Approximation (ZA [51], the traditional name given to 1st-order LPT), we only keep the first term, whereas for 2nd-order LPT (2LPT) we keep terms up to 2nd order. Solving the corresponding equations of motion, one arrives at:

\nabla_q \cdot \vec{\Psi}^{(1)} = -D_1(t)\,\delta(\vec{q})
\nabla_q \cdot \vec{\Psi}^{(2)} = D_2(t) \sum_{i>j} \left[ \Psi^{(1)}_{i,i}\Psi^{(1)}_{j,j} - \left(\Psi^{(1)}_{i,j}\right)^2 \right]   (1.5)

where D_1 and D_2 are the 1st- and 2nd-order growth factors, respectively.
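In Fourier space the first of these equations inverts trivially, \vec{\Psi}^{(1)}(\vec{k}) = i\,\vec{k}\,D_1\,\delta(\vec{k})/k^2, which is how ZA displacements are produced in practice. A minimal sketch on a periodic grid; the grid size, box size, growth factor and the white-noise stand-in for δ are all illustrative assumptions:

```python
import numpy as np

N, L, D1 = 64, 100.0, 0.02                 # grid cells, box size, D_1 (assumed)
delta = np.random.normal(size=(N, N, N))   # stand-in for a proper density field

k1d = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
k2 = kx**2 + ky**2 + kz**2
k2[0, 0, 0] = 1.0                          # avoid dividing by zero at k = 0

delta_k = np.fft.fftn(delta)
psi = []
for ki in (kx, ky, kz):
    psi_k = 1j * ki / k2 * D1 * delta_k    # solves div Psi^(1) = -D1 delta
    psi_k[0, 0, 0] = 0.0                   # no mean displacement
    psi.append(np.fft.ifftn(psi_k).real)

q = (np.indices((N, N, N)) + 0.5) * (L / N)   # Lagrangian grid positions
x = (q + np.array(psi)) % L                   # Eulerian positions (Eq. 1.3)
```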
Initial conditions have to be generated at a redshift high enough that perturbations are still in the linear regime. But if we set them at too large a redshift, the gravity solver will integrate numerical noise. The standard method for many years has been to use the ZA at the redshift when density perturbations reach δ ∼ 0.1. However, it has been shown that transients may appear with that method [52], and using 2LPT is
Force computation
Once the initial conditions are set, particles move according to gravity via the force
\vec{F} = \frac{1}{a}\nabla\phi   (1.6)
There are different ways to compute it, depending on the N-Body code. The most naive way to compute this force is using Newton's law for each particle i, under the so-called Particle-Particle (PP) approach:

\vec{F}(\vec{x}_i) = -\sum_{j \neq i} \frac{G m_i m_j}{r_{ij}^2}\, \hat{r}_{ij}   (1.7)

where \vec{r}_{ij} = \vec{x}_i - \vec{x}_j, with r_{ij} = |\vec{r}_{ij}| and \hat{r}_{ij} = \vec{r}_{ij}/r_{ij}.
This method is very accurate, but very slow, since it scales as O(N²). Tree solvers [53–55] transform it into an O(N log N) problem by arranging particles in a tree (a scheme where particles are hierarchically grouped by proximity) and only resolving groups of particles that subtend an angle θ > θ_0 from the position \vec{x}_i.
Another problem associated with both of these methods is that we have to manually add a softening to the force, to avoid the strong accelerations caused by 2-body interactions at small distances (recall that we are simulating a collisionless fluid). This softening arises naturally in the Particle-Mesh (PM) approach [56], where Equation 1.2 is solved on a grid in Fourier space and the force is derived from Equation 1.6. This method is really fast, but it lacks accuracy at small scales.
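A minimal sketch of the PP sum of Equation 1.7 with a Plummer-type softening ε, as just described; the units, values and function names are illustrative assumptions:

```python
import numpy as np

G = 1.0   # gravitational constant in internal units (assumed)

def pp_forces(pos, mass, eps=0.01):
    """Direct O(N^2) pairwise forces (Eq. 1.7) with Plummer softening."""
    F = np.zeros_like(pos)
    for i in range(len(pos)):
        d = pos - pos[i]                  # vectors from particle i to all j
        r2 = (d**2).sum(axis=1) + eps**2  # softened squared distances
        r2[i] = np.inf                    # exclude the self-interaction
        F[i] = G * mass[i] * (mass[:, None] * d / r2[:, None]**1.5).sum(axis=0)
    return F

pos = np.random.random((100, 3))
F = pp_forces(pos, np.ones(100))
```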
Combining the efficiency of PM and the accuracy of PP, we find the P³M method [57, 58]. It computes the large-scale part of φ with the PM approach and the small-scale contributions with Equation 1.7. This can still be computationally costly in very clustered regions, and another solution is the TreePM method [59, 60]. In this method, small-scale forces are computed using the Tree approach, whereas large scales are computed with the Fourier transform as in PM. Nowadays, one of the
Time integration
At every time-step, after calculating the force, we need to move the particles according to their velocities and accelerate them according to their forces. It turns out to be desirable to do this in an alternating fashion, using a leap-frog scheme. That is, position and momentum are not updated simultaneously, but with a delay of half the time-step
Δt:

\vec{x}^{\,k+1/2} = \vec{x}^{\,k} + \frac{\Delta t}{2}\,\frac{\vec{p}^{\,k}}{a^2 m}
\vec{p}^{\,k+1} = \vec{p}^{\,k} + \Delta t \cdot \vec{F}^{\,k+1/2}   (1.8)
\vec{x}^{\,k+1} = \vec{x}^{\,k+1/2} + \frac{\Delta t}{2}\,\frac{\vec{p}^{\,k+1}}{a^2 m}
Note that the third part of step k = l is identical to the first part of step k = l + 1, so they can be applied together, forming the leap-frog scheme. It is in the second part where we need to compute the force at every step, as explained before.
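A toy version of one such drift-kick-drift step, with the force supplied as a callback; the harmonic test force and all values are illustrative, and a real cosmological code would also advance a between steps:

```python
import numpy as np

def leapfrog_step(x, p, a, dt, m, force):
    """One drift-kick-drift step of Equation 1.8."""
    x = x + 0.5 * dt * p / (a**2 * m)   # drift half a step
    p = p + dt * force(x)               # kick using the mid-step force
    x = x + 0.5 * dt * p / (a**2 * m)   # drift the remaining half step
    return x, p

# toy usage: a harmonic force in a static box (a = 1)
x, p = np.array([1.0]), np.array([0.0])
for _ in range(1000):
    x, p = leapfrog_step(x, p, a=1.0, dt=0.01, m=1.0, force=lambda x: -x)
```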
The N-Body simulations give us the distribution of dark matter in the Universe. However, this is not a direct observable, and we need to include galaxies if we want to compare with observations. There are different methods to do so, relying on different techniques that will be explained below:
^1 http://wwwmpa.mpa-garching.mpg.de/gadget/
• Halo Finder. Galaxies live in dark matter halos: self-bound, virialised and
very dense objects with spheroidal shape. Halo finders are codes that identify
these objects from the distribution of particles in the simulation at a given
time-step or snapshot. Some halo finders also identify sub-halos (halos that lie
in another halo). See Section 1.2 and [63] for a review.
• Merger Tree builders. A merger tree is a scheme that traces back a halo from the latest snapshot to the origin of all its progenitors; it tells us about the history of halos, including their age, merger rate, etc. A merger tree builder is a code that links halos across different snapshots. See Section 1.3 and [64] for a description of most methods.
Figure 1.2: Merger Trees representation. On the left panel we see the particles of an
N -Body Simulation as red dots, the halos as blue circles with radius R200c (defined by
Equation 1.9) and merger trees as arrows linking halos across snapshots (the green
one represents a merger). On the right panel (from [80]) we find a chart representing
a merger tree following the mergers of halos (circles) through different snapshots ti .
We just saw in Section 1.1.2 that many techniques and prescriptions implemented in codes are used to analyse simulations. Different codes are used by different research groups in the community for the same purpose. Each of these codes makes some approximations and assumptions but, do they lead to the same results? This is the question posed by the Mocking Astrophysics programme^2. During a series of workshops and subsequent studies, this programme has been analysing and validating the post-processing pipeline used by the community. Among the targets of study we find the halo finders [81], merger tree builders [64] and Semi-Analytical Models [65]. The analysis presented in the remainder of this chapter was done as part of this programme and as a consequence of the SussingMergerTrees workshop^3. The aim is to address the question of how the combination of halo finder and merger tree builder affects the properties of the final merger tree [1].
^2 http://www.nottingham.ac.uk/~ppzfrp/mockingastrophysics/
^3 http://popia.ft.uam.es/SussingMergerTrees
As already explained, halo finders search for dark matter halos within the particle distribution of a simulation snapshot. The exact definition of halos in simulations is actually set by the halo finder itself and can vary significantly from one finder to another, especially when it comes to subhalos (halos lying inside another halo). The subtleties of each code will be explained below, but we introduce here the two basic types of halo finding techniques at the main halo level: Friends-of-Friends (FoF), which links together particles separated by less than a given linking length, and Spherical Overdensity (SO), which grows spheres around density peaks until a given overdensity with respect to a reference density is reached.

While FoF can only give main halos, the SO method may also be used for subhalos. A minimum number of particles N_min must be required for halos to be valid; generally N_min = 20 is chosen.
The halo catalogues used for this study are extracted from 62 snapshots of a cosmological dark-matter-only simulation undertaken using the Gadget-3 N-body code [98], with initial conditions drawn from the WMAP-7 cosmology [99]. We use 270³ particles in a box of comoving width 62.5 h⁻¹Mpc, with a dark-matter particle mass of m_p = 9.31 × 10⁸ h⁻¹M⊙. We use 62 snapshots (000, . . . , 061) evenly spaced in log a from redshift 50 to redshift 0.
While in previous comparison projects [e.g. 81, 93, 96] the same mass definition was imposed (or a common post-processing pipeline was even used to ensure this), no such thing was requested this time, i.e. every halo finder was allowed to use its own mass definition.
On the one hand, AHF and Rockstar define a spherically truncated mass through

M_{ref}(< R_{ref}) = \Delta_{ref} \times \rho_{ref} \times \frac{4\pi}{3} R_{ref}^3 ,   (1.9)

adopting the values Δ_ref = 200 and ρ_ref = ρ_crit (we will call this mass M200c) and iteratively removing particles not bound to the structure. On the other hand, HBThalo and SUBFIND return arbitrarily shaped self-bound objects based upon initial Friends-of-Friends (FoF) groups, assigning them the mass of all particles gravitationally bound to the halo (i.e. with no spherical truncation).
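Inverting Equation 1.9 gives the radius corresponding to a given M200c. A minimal sketch, assuming an illustrative z = 0 value for ρ_crit:

```python
import numpy as np

rho_crit = 2.775e11    # critical density at z = 0 in h^2 Msun / Mpc^3 (assumed)
delta_ref = 200.0

def r200c(m200c):
    """R200c in Mpc/h for a mass M200c in Msun/h (Equation 1.9 inverted)."""
    return (3.0 * m200c / (4.0 * np.pi * delta_ref * rho_crit))**(1.0 / 3.0)

print(r200c(1e14))     # a cluster-sized halo: roughly 0.75 Mpc/h
```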
Furthermore, some halo finders include the mass of any bound substructures in the main halo mass, whereas others do not. Technically, finders for which particles can only belong to one halo are termed exclusive, while finders for which particles can belong to more than one halo are termed inclusive. As substructures can typically account for 10% of the halo mass, this choice alone can make a substantial difference to the halo mass function.
Given these definitions we can now describe the general properties of the halo finders
applied to the data:
• HBThalo [102] is a tracking algorithm working in the time domain that fol-
lows structures from one time-step to the next. It returns exclusive arbitrarily
shaped gravitationally bound objects. It uses FoF groups for the initial particle
collection.
Figure 1.3: Cumulative mass functions at redshift z = 0 (left panel) and z = 2 (right
panel) for the four halo finders. There are two lines for Rockstar corresponding to
the two mass definitions discussed in the text: one corresponding to M200c (Mass)
and one based upon the particle list (N × m_p, where N is the number of particles and m_p the particle mass). The upper set of curves in each panel is based upon main
halos whereas the lower set of curves in each panel refers only to subhalos.
[...] the 20 particle threshold). Given that some tree builders only use particle mem-
bership information for a halo whereas others combine this with a table of global
properties (including halo mass), this choice of mass definition will also contribute
to the differences in the final trees.
We find that, other than for the largest 100 main halos, the different mass definitions make little difference for the main halos at z = 0, unless the mass taken from the returned Rockstar particle membership is used. This mass is systematically higher than the other estimates (and Rockstar's own returned mass). The differences in mass for main halos are slightly more pronounced at z = 2.
For subhalos there are noticeably different mass functions: AHF is incomplete at the low-mass end, with a trend that appears to worsen as the redshift increases^4.
However, despite generally finding more subhalos the other finders do not appear
to have converged to a common set. Part of this relates to the rather ambigu-
ous definition of subhalo mass: whereas for main halos it simply appears to be a
matter of choice for ∆ref and ρref (or some other well-defined criterion for viriali-
sation/boundness/linkage), subhalos – due to the embedding within the inhomoge-
neous background of the host – cannot easily follow any such rule. Again, each finder
has been allowed to pick its favourite definition for subhalo mass. But please note
that the variations seen here are not the prime focus of this study; they should nev-
ertheless be taken into account when interpreting the results presented and discussed
below. Further, the scatter in subhalo mass functions seen in previous comparisons
was much reduced due to the use of a common post-processing pipeline that ensured
a unique subhalo mass definition [93, 94, 96].
All these differences should and will certainly leave an imprint and be reflected in
the outcome when building merger trees.
^4 It was checked that a more restrictive parameter set for AHF leads to the recovery of the missing low-mass subhalos at high redshift. As already shown by [101] (Fig. 5 therein), the number of low-mass objects found depends directly on the refinement threshold applied by AHF to construct its mesh hierarchy (upon which halos are based).
Figure 1.4: A summary of the main features and requirements of the different merger
tree algorithms. For details see the individual descriptions in the text.
In the first merger tree comparison paper [64], we can find an extensive description of most available merger tree builders and a terminology convention to describe them, which we also use here. A lot of the methodology is similar across the various codes used for this study; the main features and requirements have been captured in Figure 1.4. We first categorise tree builders into those using halo trajectories (JMerge and Consistent Trees) and those using individual particle identifiers (together with possibly some additional information; all remaining tree builders). Consistent Trees is the only method that utilises both types of approach. HBT constructs halo catalogues and merger trees at the same time, as it is a tracking finder that follows structures in time. A cautionary note regarding HBT: it can be applied both as a halo finder and as a tree builder and includes elements of both, so we will always specify whether we refer to one or the other by appending 'halo' or 'tree', as necessary.
The codes themselves are best portrayed as follows:
• HBTtree is built into the halo finder HBT. It identifies and tracks objects at
the same time using particle membership information to follow objects between
output times.
• JMerge only uses halo positions and velocities to construct connections between snapshots, i.e. halos are moved backwards/forwards in time to identify matches that comply with pre-selected thresholds for mass and position changes.
• SubLink tracks particle IDs in a weighted fashion, giving priority to the in-
nermost parts of subhalos and allowing branches to skip one snapshot if an
object disappears.
Two codes were allowed to modify the original catalogue: Consistent Trees and
HBTtree. Consistent Trees adds halos when it considers they are missing: i.e.,
the halo was found both at an earlier and at a later snapshot. Consistent Trees
also removes halos when it considers them to be numerical fluctuations: i.e., the
halo does not have a descendant and both merger and tidal annihilation are unlikely
due to the distance to other halos. For external halo finders (i.e. halo catalogues not generated by its own inbuilt routine), HBTtree takes the main halo catalogue and reconstructs the substructure. This produces an exclusive halo catalogue in which the properties of the main halos may also have changed.
In this section we present the geometry and structure of merger trees and the result-
ing evolution of dark matter halos. This includes the length of the tree (Section 1.4.1)
and the tree branching ratio (Section 1.4.2). Further, it is shown graphically how
halo finders and tree builders work differently, to illustrate the features found in the
comparison.
One of the conceptually simplest properties of a tree is the length of the main branch.
It measures how far back a halo can be traced in time – starting in this case at z = 0.
This property not only relies on the performance of the halo finder and its ability
to identify halos throughout cosmic history, but also on the tree builder correctly
matching the same halo between snapshots. [64] found that the different tree building
methods produced a variety of main branch lengths, ascribing some of the features
to halo finder flaws. We shall verify this now.
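Operationally, the main-branch length discussed below is a simple backward walk along main-progenitor links. A minimal sketch, assuming a toy dictionary representation of those links:

```python
def main_branch_length(halo_id, main_prog):
    """Number of snapshots a halo can be traced back through (the length l)."""
    length = 0
    while halo_id in main_prog:        # stop when no main progenitor exists
        halo_id = main_prog[halo_id]
        length += 1
    return length

# toy tree: halo 'c' at snapshot 061 <- 'b' at 060 <- 'a' at 059
main_prog = {"c": "b", "b": "a"}
print(main_branch_length("c", main_prog))   # -> 2
```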
Figure 1.5 shows a histogram of the main branch length l, defined as the number of
snapshots a halo main branch extends backwards in time from snapshot 61 (z = 0)
to snapshot 61 − l. This is roughly equivalent to an age, given that the last 50
snapshots are separated uniformly in expansion factor, a = 1/(1 + z). On the left,
we selected the 1000 most massive main halos, whereas on the right we see the results
for the 200 most massive subhalos. The main halo population coincides from one
halo catalogue to another in at least 85% of the objects. The subhalo population
is more complicated and, in some cases, they only agree in 15% of the objects from
one finder to another. However, if we focus on comparing AHF with Rockstar or
HBThalo with SUBFIND, we find a better agreement between catalogues, rising
to ∼ 95% for main halos and ∼ 70% for subhalos. Due to these differences, the
applied number threshold translates to mass thresholds Mth that are different from
finder to finder (see also Figure 1.3); we therefore list the corresponding values in
Table 1.1. Furthermore, when using HBTtree, the individual masses of the halos
can change and so does the mass threshold. In what follows we will consistently use
Figure 1.5: Histogram of the length of the main branch. The length l is defined as
the number of snapshots a halo can be traced back through from z = 0. The left
group of panels show the 1000 most massive main halos. The right group of panels
show the 200 most massive subhalos. These number selections are equivalent to the
mass cuts shown in Table 1.1. Different panels contain results from different tree
building methods (as indicated), while within each panel there is one line for each
halo finder (as marked in the legend).
Table 1.1: Mass threshold in units of 10¹¹ h⁻¹M⊙ needed to select at z = 0 the 1000 most massive main halos (rows 1 and 2) and the 200 most massive subhalos (rows 3 and 4) for the different halo finders (columns). Odd rows show the threshold for a general tree builder, whereas even rows show the threshold for HBTtree.
Figure 1.6: Distance to the centre of the host halo vs. length of the tree l for the 200
most massive subhalos. We show the results for the four halo finders (see legend) for
the MergerTree builder.
Subhalo finding becomes especially difficult as the subhalo approaches the centre of
the host halo, as has been shown in Fig.4 of [106] and Fig.7 of [93]. In particular,
SUBFIND underestimates the mass of subhalos close to the centre of their host
halo. Given that the 200 most massive subhalos are not the same for all finders, the
subhalos selected for SUBFIND tend to be further from the host halo centres (see
Figure 1.6), and therefore they are easier to trace. AHF and especially Rockstar
find many (massive) subhalos near the centre but, due to the difficulties in that
region, a fraction of them cannot be provided with a credible progenitor in an earlier
snapshot, resulting in early tree termination. Finally, the HBThalo selection is
composed of subhalos at short, medium and large distances from the host halo centre
but, by construction, they are always required to be traceable.
On the tree builder side, JMerge only allows halos to shrink in mass by a factor of up to 0.7 and to grow by a factor of up to 4 in one snapshot, and it estimates their trajectories from global quantities (Section 1.3). This artificially truncates main branches too early for massive objects when it loses track of halos. This effect is enhanced for subhalos, whose trajectories are difficult to estimate due to the non-linear environment and the fact that their mass is more likely to grow or shrink
• AHF considers one of the merging halos to be the main halo (blue) and the
other to be a subhalo (red). In snapshot 060 the subhalo found is quite small,
so that most of the tree building codes do not link it with the (much larger) halo
in the next snapshot (061). In simple codes (JMerge, MergerTree... ) this
leads to an artificial truncation of the tree. Consistent Trees artificially
adds one halo to snapshot 060 to replace the small subhalo whereas SubLink
jumps snapshot 060 for this object. In this way both codes continue the tree.
HBTtree recomputes the substructure, creating a more traceable subhalo.
• HBThalo is able to identify at snapshot 060 two big and well defined halos
of almost the same size (only possible for exclusive halo catalogues). This is
due to the tracking nature of the finder and ensures the correct follow-up by
most tree builders. Only JMerge encounters problems due to the non-smooth
trajectories of the halos.
• Rockstar uses phase-space information so that even when the halos are over-
lapping (snapshot 060) it is able to distinguish them by their velocities. This
allows almost all tree codes (besides JMerge) to follow the evolution of the halos.

Figure 1.7: Projected image of a 1.2 Mpc/h-side cube from the N-Body simulation. Halos are represented by circles of radius corresponding to R200c (defined by Equation 1.9). This is an example of a merger between two halos that are found at z = 0 (snapshot 061) and linked across snapshots by the tree builders: the blue and red colours represent the two trees. Other halos found are represented in green. Each subfigure presents a single halo finder, with each row representing the indicated tree builder. In each row time evolves from left to right, with each cell a different snapshot.
This example neatly illustrates the difficulties that arise when dealing with subhalos.
However, the left panel of Figure 1.5 tells us that there are also situations in which
the main halo branch is truncated. We studied several of these cases and found two
main types: in the first type the main halo lies in the vicinity of a bigger halo, and is
likely to have entered it and become a subhalo a few snapshots before. In this case
the problems encountered are similar to those illustrated in the subhalo example
above, but here the in-falling halo has been classified as a main halo at z = 0.
The other type occurs when at some point the halo was wrongly associated with some other smaller halo, as happened with the red halo in Figure 1.7 for the combination JMerge-HBThalo. In this case the incorrect halo assignment never gets corrected, and typically the much smaller halo has a much shorter prior history.
Already at this stage of the analysis we can draw some conclusions from this sub-
section:
• In general, the influence of the halo finder is at least as (if not more) important
than the tree building algorithm.
• The way the halo finder deals with substructure is crucial for merger trees.
• Tree building tricks such as the creation of artificial halos or omitting snapshots
help in some cases, but are not infallible.
• AHF and Rockstar catalogues lead to earlier tree truncation for most tree
builders. This is especially true for subhalos, because they try to find subhalos
close to the host halo centre and are not able to provide them with credible
progenitors.
• SUBFIND tends to find more subhalos in the outer regions of the host, which
are easier to track.
• HBT appears to be very well designed not to truncate a tree too early, both as a halo finder and as a tree builder (as seen in Figures 1.5 and 1.7).
• Consistent Trees also stands out in avoiding low-l cases (Figure 1.5).
Another simple tree property, which is nevertheless very important for characterising
the structure or geometry of a tree, is the number of direct progenitors Ndprog (or
local branches) that a halo typically has. Figure 1.8 shows the normalised (divided by
the total number of events) histogram of Ndprog for all halos in the range 0 ≤ z ≤ 2.
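Operationally, N_dprog is just a count of progenitor links pointing at each descendant; a minimal sketch with toy links:

```python
from collections import Counter

# toy (progenitor, descendant) links between two snapshots
links = [("a", "d"), ("b", "d"), ("c", "e")]
ndprog = Counter(desc for _, desc in links)
print(ndprog["d"], ndprog["e"])   # 2 progenitors merge into 'd', 1 into 'e'
```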
For all the various combinations of tree building method and halo finder the most
common situation is to have just one single progenitor, corresponding to a halo
having no mergers on this step (which can happen multiple times during a halo
lifetime). The second most common situation is for a halo to have no progenitors,
which corresponds to a halo passing above the detection threshold and appearing for
the first time, which can happen only once. As with other properties examined in this study, our results would certainly change if we were to use a different set of output times, so the importance lies not in the individual tree results, but in their differences. For an elaborate study of the optimal choice for the temporal spacing of
snapshots to construct merger trees see [107] or [86].
It is noticeable that the Rockstar catalogue (blue dotted line) yields a tree with a significantly larger branching ratio for the tree builders SubLink, TreeMaker, and VELOCIraptor. Also, despite using a very similar technique, MergerTree shows a more moderate branching ratio. By removing objects with mass lower than
20mp (cyan dash-dotted line), we verified that this high branching ratio is caused
by objects with very low mass, as these high-N_dprog cases disappear. Recall that, even though all the halo finders cut their catalogues at 20 particles, for Rockstar the mass M200c can be lower if some of those particles lie outside R200c. This small
change, in general, moves the curves for Rockstar from the highest branching ratio
to the lowest one. Note that the mass limited tree shown in cyan is not equivalent
to the other trees because the catalogue was reduced after running the tree building
algorithm on it, hence giving non-self-consistent trees. Nevertheless, we do not expect
great variations in Figure 1.8 between the cyan line and a fully self-consistent tree
with the same mass limit. This serves as an illustration of the great influence of the
lower mass limit, pointing out again the importance of the input halo catalogue in
the resulting tree construction. To illustrate a high branching ratio case we have
selected one of the extreme cases with Ndprog > 30 in Figure 1.9. It corresponds to
one of the two most massive halos (depending on the halo finder) at snapshot 050
(z=0.32). Figure 1.9 shows all the direct progenitors of that halo and other halos
found in the area. The blue halo is the main and most massive progenitor in the
plot. The red and magenta circles represent other direct progenitors at snapshot 049
while green circles represent other (sub)halos detected in the same region. Magenta
is used for halos whose mass is below 20mp (only possible for Rockstar), while red
halos have larger mass. SubLink also links to the big halo at snapshot 050 some halos that were found at snapshot 048 but not at snapshot 049; these are marked as crosses.
Figure 1.9 tells us that, when comparing different halo catalogues, N_dprog tends to be correlated with the number of (small) halos available to be absorbed, i.e. the more green halos we find, the more merging (red and magenta) halos we find. We further
confirm that most secondary progenitors (red and magenta circles) are subhalos of
the main progenitor (blue circle) and lie within R200c . However, in some cases sec-
ondary progenitors were found outside the volume displayed (e.g. the halos missing in Consistent Trees with AHF). But in general, the properties of these halos fit into the standard merging picture, in which halos approaching a bigger one become satellites (subhalos), lose mass via tidal stripping and are eventually totally absorbed.
Figure 1.8: Normalised histograms of the number of direct progenitors N_dprog for all halos from z = 0 to z = 2 (snapshots 061 to 031). Each panel corresponds to a single tree building method; within each panel each line represents a halo catalogue as indicated. For Rockstar we show two lines, one with all the halos ('Rockstar all') and one where halos with mass lower than 20 m_p were removed ('Rockstar cut').
Figure 1.9: Projected image of a 3 Mpc/h-side cube from snapshot 049 centred on one of the most massive objects (M > 10¹⁴ h⁻¹M⊙) for all the combinations of halo finder (column) and tree builder (row). Symbol and colour coding are explained in the text. VELOCIraptor (omitted) gives the same results as TreeMaker. The label N_dprog indicates the number of progenitors (some of which might be outside this volume). For Rockstar we show a second value in which only those with mass larger than 20 m_p are considered.
If all the available halos are considered, Rockstar is the catalogue with the most small halos, leading to a higher branching ratio, which drops when removing the low-mass halos. HBThalo is also able to discern more substructure, yielding a slightly higher N_dprog than SUBFIND and AHF.
From the tree building point of view we remark that SubLink, with the possibility of
omitting one snapshot, increases Ndprog considerably for the two catalogues with more
substructure: Rockstar and HBThalo. HBTtree, in modifying the catalogue,
tends to recover the halo set generated by HBThalo. This effect is more noticeable
in the case of SUBFIND because it is also based on FoF catalogues (Section 1.2).
JMerge shows very little branching (Ndprog = 1 or 2) because by construction it
never associates a small merging halo with a much bigger one. It rather associates
the in-falling halo with another small halo.
Note, however, that this was a very extreme case and that Figure 1.9 is not necessarily
representative of the statistics seen in Figure 1.8, rather it helps to understand the
kind of factors that influence the branching ratio.
Mass growth can be characterised by the discretised logarithmic growth, defined as

\alpha_M(k, k+1) = \frac{(t_{k+1} + t_k)\,(M_{k+1} - M_k)}{(t_{k+1} - t_k)\,(M_{k+1} + M_k)}   (1.10)

where k and k + 1 are a halo and its descendant, with masses M_k and M_{k+1} at times t_k and t_{k+1}, respectively [64]. In order to reduce the range of possible values of this quantity, it is mapped onto the interval (−1, 1) through

\beta_M = \frac{1}{\pi/2} \arctan(\alpha_M)   (1.11)
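Both statistics are trivial to evaluate along a main branch. A minimal sketch, with toy output times and masses as assumptions:

```python
import numpy as np

def beta_M(t, M):
    """beta_M (Eqs. 1.10-1.11) for consecutive (halo, descendant) pairs."""
    tk, tk1 = t[:-1], t[1:]
    Mk, Mk1 = M[:-1], M[1:]
    alpha = (tk1 + tk) / (tk1 - tk) * (Mk1 - Mk) / (Mk1 + Mk)
    return np.arctan(alpha) / (np.pi / 2.0)

t = np.array([1.0, 1.1, 1.2, 1.3])            # output times (toy units)
M = np.array([1.0, 1.4, 1.3, 2.0]) * 1e12     # masses along the branch
print(beta_M(t, M))                           # growth (>0) and a dip (<0)
```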
Figure 1.10 shows the distribution of β_M for three populations: all halos (A, on the left), main halos (B, in the centre) and subhalos (C, on the right). All distributions have been normalised by the total number of events found in halo sample A in each case. Selection is done as follows: all the halos identified at z = 0 are traced back along the main branch, and at any snapshot, if both a halo and its descendant are main [sub] halos and have mass M > M_th^main [M > M_th^sub] (Table 1.1), they are added to population B [C]. Population A is compiled similarly, but taking all pairs of halos satisfying M > M_th^main, regardless of whether they are main or subhalos. Note that distribution A is dominated by main halos, since they are more numerous.
Within the hierarchical structure formation scenario one expects halos to grow over
time. This can be appreciated in column A, where the distribution of βM is skewed
towards values βM > 0. However, there is a non-negligible number of cases (∼
15 − 30%) where it decreases (βM < 0). While mass loss could be associated with
tidal stripping of subhalos, column B shows that this is not the sole explanation
within this simulation: while subhalos have an important contribution at the very
far end of the distribution (corresponding to large mass losses), there are also many
instances leading to βM < 0 for main halos. Nevertheless, there are physical ways
for main halos to lose mass: when two main halos approach each other, the effective
radius for tidal stripping extends beyond the virial radius of the larger halo [see
108, for an elaborate discussion of exactly this phenomenon], thus, the small one
can experience mass loss before becoming a satellite. Also, when halos change their
shape, the specific halo mass definition (e.g. M200c for AHF/Rockstar) of a halo
finder can lead to an apparent mass loss.
The plot clearly shows that the differences across halo finders are greater than the variations introduced by the tree building method, with the exception of HBTtree (which modifies the input halo catalogue). There are two distinct classes of distribution for main halos (B): on the one hand, Rockstar and AHF, and on the other hand, SUBFIND and HBThalo, which have a more skewed distribution.
Figure 1.10: Mass growth distribution between two snapshots, β_M, related to the logarithmic mass growth through Equation 1.11, for halos that can be identified at z = 0, with mass M > M_th at both output times. We distinguish 3 populations: A, which contains all halos with M_th = M_th^main; B, with only main halos and M_th = M_th^main; and C, with only subhalos and M_th = M_th^sub. M_th is tabulated in Table 1.1 for the different halo finders. Each row displays a different tree building algorithm (as indicated). Each halo finder has its own line style as indicated in the legend. The distribution is computed as a histogram, normalised by the total number of events found by the corresponding halo finder for population A.
Recall from Section 1.2 that the former use an inclusive mass definition; thus, for a subhalo that has just crossed the centre and is moving away, the total (inclusive) mass of the host halo can decrease if part of that subhalo crosses R200c.
We finally remark that while subhalos are present in our somewhat low-resolution
simulation (when compared to the state-of-the-art), they contribute significantly to
neither the shape nor the amplitude of the mass growth distribution shown in column
A (all halos). However, their own distribution (column C) is interesting in its own
regard: we primarily observe mass loss due to tidal stripping, i.e. an imbalance
of the distribution towards negative βM values. In this case we find that whereas
HBThalo follows one distribution, the other three follow their own. This reflects
the inconsistency in subhalo mass functions already seen in Figure 1.3.
In conclusion, most of the differences in the mass growth βM can be accounted
for by the choices made by the respective halo finder when defining quantities. In
particular, HBThalo and SUBFIND agree best with the a priori expectation from
hierarchical structure formation.
Abrupt changes along the main branch can be further quantified by the mass fluctuation

\xi_M = \frac{\beta_M(k, k+1) - \beta_M(k-1, k)}{2}   (1.12)
where k − 1, k, k + 1 represent consecutive time-steps. When far from zero, it implies
a growth followed by a dip in mass (ξM < 0) or vice versa (ξM > 0). Within the
hierarchical structure formation scenario this behaviour can be considered unphys-
ical and equates to a snapshot where the halo finder might not have assigned the
correct mass – though there are certainly situations where the definition of correct
mass remains arguable. Nevertheless, it provides another means of quantifying the
influence of the halo finder upon a merger tree.
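Given the β_M sequence of a branch, ξ_M follows immediately; a minimal sketch continuing the toy example above:

```python
import numpy as np

def xi_M(beta):
    """xi_M (Eq. 1.12) at each interior snapshot of a main branch."""
    return 0.5 * (beta[1:] - beta[:-1])

beta = np.array([0.3, 0.6, -0.4, 0.5])   # toy beta_M(k, k+1) sequence
print(xi_M(beta))   # strongly negative values flag growth followed by a dip
```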
The (normalised) distribution of ξM is presented in Figure 1.11 in the same way
as Figure 1.10, i.e. three distinct columns for all halos (A, left), main halos (B,
1.5. Mass Evolution 63
middle), and subhalos (C, right). It reconfirms most of the claims of Section 1.5.1.
We again find the distribution is essentially independent of the tree builder (besides
HBTtree) for all three populations. We find two types of distributions for main
halos (B): on the one hand, the SUBFIND and HBThalo catalogues give the
broadest distributions and on the other hand, Rockstar and AHF have a more
peaked distribution. This implies that the first pair of halo finders presents more mass fluctuations (ξ_M ≠ 0) than the second. Note that this pairing is identical
to the one reported in Section 1.5.1. And we also find (again) that subhalos (C)
do not provide an explanation for the wings of the mass fluctuation distribution in
column A, even though their own plot indicates that they predominantly undergo
abrupt changes, i.e. they have easily distinguished wings.
Given that subhalos often undergo fluctuations (column C of Figure 1.11), this could
cause fluctuations in main halos when the mass is defined exclusively (HBThalo
and SUBFIND). In order to study this effect, we selected a halo whose mass evo-
lution is characterised by a large ξM value (for the SUBFIND/HBThalo pair) in
Figure 1.12. We localised the same object (the blue halo) and surrounding ones
(red and green) in all four halo catalogues, showing the three consecutive snapshots
used for the calculation of ξM given at the very right hand side of each panel. The
halo undergoes a mass fluctuation for the finders HBThalo and SUBFIND, while
it keeps growing for AHF and Rockstar. Figure 1.12 shows that, although it is
true that for HBThalo/ SUBFIND the total mass of the subhalos increases when
the main halo decreases and vice versa, the fluctuation of subhalo mass is one order
of magnitude smaller than the main halo fluctuation and this cannot be the sole
explanation. The fact that the red halo changes from being a subhalo to a main
halo and then back to a subhalo again may be related (in a non-trivial way, since
masses are defined exclusively) to the mass fluctuation. For this simple (compared
to Figure 1.7 & Figure 1.9) configuration of halos, all the tree building algorithms
agree in the resulting trees. We also note that even small fluctuations (10% in mass)
are detected by this parameter ξM , in part due to an enhancement of ξM at late
times (cf. Equation 1.11 & Equation 1.12).
Figure 1.11: Distribution of mass fluctuations ξM (Equation 1.12), for halos found
in three consecutive snapshots along a main branch that can be identified at z = 0,
with mass M > Mth for each appearance of the halo. We distinguish 3 populations:
A, which contains all halos with Mth = Mth^main; B, with only main halos and
Mth = Mth^main; and C, with only subhalos and Mth = Mth^sub. Mth is tabulated
in Table 1.1. Comparison is made between different tree builders (each row as labelled)
and halo finders (line styles as in the legend). The distribution is computed as a
histogram normalised by the total number of events for the corresponding halo finder
for population A.
Figure 1.12: Projected 1 Mpc/h-side cube containing two halos (three for
HBThalo) evolving from snapshot 058 (left column) to 059 (central column) to
060 (right column). Each row shows a different halo finder. The radius of each circle
is drawn proportional to the mass of the object, with an extra factor of ×5 for
the small (red and green) halos. Dashed lines denote subhalos whereas solid lines
are used for main halos. The mass of each halo is also shown in units of 10¹⁰ h−1 M⊙.
At the right of each row we show the value of ξM for the big halo, which quantifies
the mass fluctuation as defined by Equation 1.12.
Figure 1.13: Summary of Figure 1.10 and Figure 1.11. On the abscissa we show the
fraction of halos for which the mass grows; on the ordinate we show the standard
deviation of the mass fluctuations. Only main halos satisfying M > Mth^main (Table 1.1)
are taken into account. Every point represents a combination of a tree builder (size-
and colour-coded) and a halo catalogue (symbol-coded, see legend).
To better draw conclusions from our study of the mass evolution of main halos,
we summarise the results of the βM and ξM statistics (Section 1.5.1 & Section 1.5.2)
in Figure 1.13: the x-axis shows the fraction fβM>0 of objects for which βM > 0,
whereas the y-axis shows the standard deviation σξM of ξM. Different sizes (or
colours) now represent different tree building methods, whereas the symbols stand
for the input halo catalogue. A priori, the desirable features of a tree describing
hierarchical structure formation are small mass loss for main halos (high fβM>0)
and small mass fluctuations (low σξM), although we have also discussed physical
causes for both phenomena. Note also that the quantities plotted here do not
substitute for the whole curves shown in Figure 1.10 & Figure 1.11, but they capture
the features of interest well. This summary plot illustrates very well how sensitively
the mass evolution depends on the choice of the halo finder:
• Points for the same halo finder (symbol) group together. The small scatter
amongst those groups reflects the small influence of the tree building method
on these quantities.
• HBTtree points deviate from the group, approaching the area of the HBThalo
finder (crosses).
We have verified that mass growth and fluctuations are intrinsically related to the
mass definition. A simple change from an inclusive to an exclusive halo catalogue,
or from M200c to arbitrarily shaped halos, would change the shape of the curves seen
in Figure 1.10 & Figure 1.11 and the position of the points in Figure 1.13. But
other fundamental properties of the halo finder also leave their imprint; the evident
differences between HBThalo and SUBFIND in Figure 1.13 are proof of this.
1.6 Conclusions
We investigated the influence of the input halo catalogue on the quality of the resulting
merger trees. 'Quality' in this regard has been identified with the length of the main
branch, the number of direct progenitors and, as quantities highly relevant for
semi-analytical modelling, the mass growth and mass fluctuations of halos. We also
showed some specific examples of cases that aided our understanding of the influence
of the halo finder and tree builder on the resulting properties of the trees.
In total, seven different tree building methods have been applied to the halo cata-
logues produced by four different halo finding algorithms which examined the same
cosmological simulation. This produced 28 merger trees to be analysed. The influ-
ence of both groups of codes is summarised below, and the particular achievements
and difficulties of the different methods discussed.
The primary conclusion of all the studies presented here is that the influence of the
input halo catalogue is greater than the influence of the tree building method employed.
This is especially clear in the mass evolution studies (Section 1.5), although
it is also noticeable in the results for the main branch length (Section 1.4.1), and
the studies on the branching ratio also suggest it (Section 1.4.2). Part of these differences
is due to the fact that for this comparison we allowed the halo finders to
choose their own definitions instead of unifying them as done in previous halo finder
comparison projects. However, in this way we find the real impact a user will encounter
when choosing one or the other halo finder for his/her analysis.
Another pattern encountered along our studies is the pairing AHF/Rockstar vs.
HBThalo/SUBFIND. This is very clear in the mass evolution of main halos (cen-
tral columns of Figure 1.10 and Figure 1.11, summarised in Figure 1.13) and can
also be seen in the main branch length distribution (Figure 1.5). We interpret this
pairing to be caused by the fundamental construction of the halo catalogues, namely
spherically truncated M200c inclusive masses (Equation 1.9) for the former pair vs.
self-bound exclusive objects starting from FoF groups for the latter. These differences
can already be seen in the main halo mass function shown in Figure 1.3.
The studies on the length of the tree (Section 1.4.1) are the cleanest test, since they
do not rely on arbitrary choices such as the lower mass cut (which makes a significant
difference for the branching ratio) or the mass definition (which is of great influence
in the mass evolution). The tracking nature of HBThalo showed excellent results
in this section, with no early truncation of (sub)halos. Rockstar and AHF showed
early truncation of trees, especially for subhalos near the centre of their host, whereas
SUBFIND did not show much early truncation of subhalos, because they are
systematically missing in the centres of the hosts. AHF led to the shortest main
branches: halos disappear due to the high-z low-M incompleteness, and the main
branches tend to end early.
The relevance of the lower mass cut was also seen in the study of the branching ratio
(Figure 1.8 in Section 1.4.2). In particular, for Rockstar a cut in mass was not
equivalent to a cut in the number of particles. Because of this, applying the same
particle-number cut as for the other catalogues made the branching ratio of
Rockstar too high.
The mass evolution of halos was found to be mostly dependent upon the mass def-
inition employed by the halo finder. However, it is not clear which finders perform
best: HBThalo/SUBFIND show less mass loss whereas AHF/Rockstar show
fewer mass fluctuations. Mass evolution is intrinsically related to the way the mass
is defined, and the choice of a different mass definition within the same halo finder
would lead to different results.
Along these lines, note that some properties of the halo finders are simple choices that
are relatively easy to change, as for example the exclusive/inclusive mass assignment
or the choice of spherical halos vs. self-bound objects. However, we have seen in [96]
that other, more fundamental, details of each halo finder (such as the initial particle
collection) leave their own unique signature in the catalogue. These are practically
unavoidable and hence users have to decide upfront which halo finder best suits their
needs.
Although we found a greater dependence on the halo finder than on the tree building
method, each of the tree codes also has its own peculiarities:
Along these lines, and confirming the results of [64], being able to skip snapshots or having
a tracking nature is found to be crucial to properly trace the history of halos.
Outlook
The main outcome of the present study is that the fundamental properties of halo
finders have a major impact on the merger trees constructed from them, and that
some tree building techniques can help improve those trees by correcting for halo
finder defects. We pointed out the repercussions that several properties of the halo
finders and tree building codes can have on the final trees. This should help the
community choose, design or modify their pipelines to construct merger trees
tailored to their specific purposes.
It is worth mentioning that, although here we focused on the differences among the
resulting merger trees, the agreement among them is nevertheless remarkable. The
general features of the trees come out as one would expect, and are similar
from one tree to another. Often the differences between trees are only seen
when plots are drawn on a logarithmic scale, since those differences are of the order
of a few cases per thousand objects plotted.

The series of workshops and studies within the Mocking Astrophysics programme
is helping us to quantify the degree of understanding that we actually have of
structure formation. It helps the community validate and improve the algorithms
used in the simulation pipeline, whose outcome we compare with observations
to learn about the physics of the Universe. This process will continue, since for every
milestone reached in Cosmology a few more arise on the horizon.
Chapter 2
HALOGEN: an approximate halo catalogue generator
2.1 Introduction
In this study we seek to abstract this pattern, providing a framework in which each
step is highly modular. Whilst modular, halogen implements default behaviour
with very simple (and rapid) components – using 2nd-order Lagrangian Perturbation
Theory (2LPT) as the gravity solver, theoretical mass functions, a single-parameter
bias prescription (as opposed to 2 or more parameters for other statistical-type meth-
ods) and a direct linear transformation of the velocities. As such, halogen can be
rapidly calibrated, and easily extended. In addition, we introduce physically moti-
vated constraints for halo exclusion and mass conservation, which tie the individual
steps together.
We will compare the results from halogen to the reference N -body simulations pre-
sented in Section 2.1.2. We introduce the general ideas of the method in Section 2.2,
leaving a more detailed explanation of the spatial placement of halos – which we
consider the essence of halogen – for Section 2.3. Section 2.4 demonstrates the
effects of each parameter of halogen and how to optimise them. We present some
applications and results of halogen in Section 2.5 and compare it to other methods
in Section 2.6.
Goliat Simulation This simulation was run with the Gadget2 code [60] from
initial conditions generated by 2LPTic at z = 32. It uses N = 512³ dark matter
particles in a box with side length Lbox = 1000 h−1 Mpc. The cosmological parameters
used in this simulation are ΩM = 0.27, ΩΛ = 0.73, Ωb = 0.044, h = 0.7, σ8 = 0.8,
ns = 0.96, yielding a mass resolution of mp = 5.58 × 10¹¹ h−1 M⊙. In this catalogue we
use a reference halo number density of n = 2.0 · 10⁻⁴ (Mpc/h)⁻³. The halo catalogue
was obtained from a z = 0 snapshot and has been generated with the halo finder
AHF [123], a spherical-overdensity (SO) algorithm (see Section 1.2). Though AHF
identifies subhalos, they have been discarded for the present analysis as these scales
are too small for 2LPT to resolve. We show in Section 3.2.3 how to add substructure
in a phenomenological way following a Halo Occupation Distribution.
halogen requires an input density field obtained from 2LPT (see Section 1.1.1). For
this purpose, we run a 2LPTic snapshot at z = 0 with the same initial condition
phases as those used in goliat.
Table 2.1: Properties of the three reference N -body halo catalogues. From left to right: Side-length of the simulated
cubic volume (in h−1 Mpc), number of particles (for N -body and halogen), redshift of the snapshot, cosmological
parameters (density of baryons, total matter and dark energy, Hubble parameter, power spectrum normalisation and
spectral index), halo finding technique, halo number density (in (Mpc/h)−3 ), method used to generate the initial
conditions and redshift at which they were generated.
This simulation will also be the reference for the comparison of the methods in
Section 2.6.
In this section we briefly outline our method, leaving a more detailed presentation
of the actual modus operandi of halogen for Section 2.3. The general algorithm
consists of four (major) steps:
1. Generate a dark matter density field at the target redshift, sampled by particles
(by default using 2LPT).

2. Sample a theoretical halo mass function n(> M) to obtain a list of halo masses,
ordered in descending mass.

3. Place the halos on a subset of the particles, selected with a mass-dependent
stochastic bias.

4. Assign a velocity to each halo from the velocity of its selected particle.
Figure 2.1: Here we show the difference between performing an actual N-body simulation
(left) and using 2LPT (right) to generate a particle distribution at z = 0.5,
with the same initial conditions. The image shows a slice of the density contrast δ
distribution in a (1 h−1 Gpc)³ box.
We aim to de-couple each of these steps from the others as far as possible so that
different algorithms may be used at each point. The first two steps are relatively
trivial, as they use pre-developed prescriptions from the literature, and we discuss
these, and basic outlines of the last two steps, in this section.
The basic scaffolding of halogen is an appropriate dark matter density field realised
at the desired redshift, sampled by N particles. For simplicity we choose to use
2nd-order Lagrangian Perturbation Theory (2LPT) (see Section 1.1.1 or [49, 50]) to
produce this field, which can be obtained with the public code 2LPTic.
We show in Figure 2.1 the density distribution of an N-body simulation (left panel)
and a 2LPT representation (right panel) at z = 0.5. Notably, the 2LPT distribution
appears to be blurred in comparison to the N -body simulation. This is due to
the fact that 2LPTic – as the name suggests – was originally designed only to
generate initial conditions [129], since even 2nd -order perturbation theory breaks
down at low redshift when over-densities become highly non-linear. The small-scale
difference in Figure 2.1 can be explained by shell crossing, an effect in which particles
following their 2LPT trajectories cross paths and continue rather than gravitationally
attracting each other in a fully non-linear manner [130, 131]. In order to compensate
for shell-crossing, [113] advocates the use of a smoothing kernel over the input power
spectrum. We tested the effect of this smoothing in halogen but did not find any
improvement in the final catalogue.
Nevertheless, 2LPT provides a suitable approximation of the large scale distribution
of matter, where perturbations have not yet entered into the highly non-linear regime
and this is sufficient for halogen. Note that halogen is in principle agnostic about
the method by which this density-field snapshot is produced. Other methods, for
instance the 'Quick-PM' code cola [119] or 3LPT, could equally be employed by the user.
A different choice of density field will yield somewhat different results, especially at
smaller scales. As long as the chosen method reconstructs large scales correctly, the
remaining steps of halogen should be unmodified.
Despite this, we have by default incorporated 2LPTic as part of the halogen code
(which bypasses the costly I/O of writing the snapshot to disk), but also allow the
user to provide an arbitrary snapshot with a distribution of N particles in a cosmo-
logical volume. Our choice for 2LPT was mainly driven by its low computational
cost and success in the distribution of matter at large scales. We use this approach
for all results in this study.
The halo mass function (HMF) n(> M ) measures the number density of halos above
a given mass scale. It is required to generate mass-conditional clustering, which in
turn is a pre-requisite for extension to HOD-based galaxy mock generation.
The most accurate HMF for a given cosmology, over a range of suitable scales, may
be obtained from an N-body simulation via a halo-finding algorithm – although there
are notable variations depending on the technique [81]. Since we require a full N -
body simulation for the tuning of halogen, it would be perfectly acceptable to use
this simulation to generate the HMF. However, in the hope of future improvements,
we wish to avoid using the full simulation as far as possible. Fortunately, there is a
wealth of literature concerning accurate predictions of the HMF for widely varying
cosmologies and redshifts using Extended Press-Schechter theory [39, 132].
The mass function may be calculated by any means, as long as a discretised
function of n(> M) is provided. For simplicity, we decided to use the online halo
mass function calculator HMFcalc10 [133] to obtain the halo mass distribution
in this study. We produce a sampled mass function from an arbitrary input HMF by
the standard inverse-CDF method: we draw uniform random numbers yi and invert
the cumulative mass function n(> M) to obtain the halo masses Mi = n−1(yi).
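As an illustration, a minimal sketch of this inverse-CDF sampling for a tabulated cumulative mass function (the table would come from e.g. HMFcalc; the function and argument names are ours, not halogen's):

import numpy as np

def sample_halo_masses(masses, n_gtM, volume, seed=0):
    """Draw halo masses M_i = n^{-1}(y_i) from a tabulated n(>M).

    `masses` (ascending, numpy array) and `n_gtM` give the cumulative
    mass function; `volume` is the box volume, so roughly
    n_gtM.max() * volume halos are drawn.
    """
    rng = np.random.default_rng(seed)
    n_halos = int(n_gtM.max() * volume)               # expected number of halos
    y = rng.uniform(n_gtM.min(), n_gtM.max(), n_halos)
    # n(>M) decreases with M: interpolate log y -> log M on reversed tables
    logM = np.interp(np.log(y), np.log(n_gtM[::-1]), np.log(masses[::-1]))
    return np.sort(np.exp(logM))[::-1]                # descending mass order

Drawing the yi uniformly over the range covered by the table guarantees, by construction, that the sampled masses follow the input n(>M) up to shot noise, which is the behaviour seen in Figure 2.2.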
In Figure 2.2 we demonstrate how well the input HMF is reproduced, only differing
from the mass function fit [Watson, 134] at high mass due to Poisson shot-noise
controlled by the volume V (where expected numbers are of order unity). Further,
the HMF of BigMultiDark shows similar behaviour, indicating that the chosen
fit is appropriate for this simulation.
10 http://hmf.icrar.org
[Figure 2.2: cumulative halo mass function, shown as the number density n(>M) and the counts N(>M), for the Watson fit, the FoF reference catalogue and halogen.]
The crucial step in the generation of approximate halo catalogues is the assignment
of halo positions. In keeping with the philosophy of modularity, the halo-placement
step is de-coupled from the rest. Any routine which takes a vector of halo
masses and an array of dark matter particle positions and returns a subset of those
positions as the halo locations is acceptable. However, we consider this step to be
at the heart of the halogen method, as it is responsible for generating the correct
mass-dependent clustering.
To achieve an efficient placement that reconstitutes the target two-point statistics,
we recognise the validity of the clustering on large scales from the broad-brush 2LPT
field. We place halos on 2LPT field particles, essentially using the estimated density
field as scaffolding on which to build an approximate halo field. We will follow a
series of steps in the construction of the method of spatial placement to be presented
in Section 2.3 below.
The most obvious way to assign velocities to each halo would be to use the velocity
of the particle on which it is centred. However, halos are virialised systems whose
velocities tend to be lower than that of their constituent particles. This is potentially
mitigated by using the average velocity of all particles within a defined radius of the
artificially placed halo. However, this is not robust as there are often very few
particles inside the halo radius. Additionally, the 2LPT particle velocities will differ
from their N -body counterparts due to shell-crossing, especially on the small scales
associated with halos.
Thus, we prefer to take a phenomenological approach, and assume that a simple
mapping via a factor fvel, vhalo = fvel · vpart, can be applied to the collection of halo
velocities to recover the results of the N-body distribution.
This factor could a priori depend on the velocity (i.e. a non-linear mapping) and
the mass of the halo fvel (vpart , Mhalo ). However, we will show in Section 2.4.2 that a
linear mapping is sufficient and present a way to compute fvel (Mhalo ).
Though halogen is a four-stage process, the most crucial aspect is the assignment
of halo positions, which this section describes in some detail. The general concept is
to specify a sample of particles from an underlying density field as halos.
The motivating philosophy of halogen is to start from the simplest idea and im-
prove if necessary. In this vein, we present here successive stages of evolution of
the halogen method, which we hope will show satisfactorily that the method as
it stands is optimal. Figure 2.3 will serve as the showcase for the various stages of
halogen. In it we present the 2-point correlation function (2PCF) for each stage of
development to verify that the method approaches the goliat reference catalogue
as new characteristics are added.
Note that the 2PCF is computed with the publicly available parallel code CUTE11
[135]. In the fitting routine that is included in the halogen package and described
in Section 2.4.1 we also use the same code.
We start with the simplest approach: using random particles from the 2LPT snapshot
as the sites for halos. We expect to recover the large-scale shape of the 2PCF in this
way, as this is encoded in the 2LPT density field which we trace.
However, it is clear from Figure 2.3 that this method ('random no-exc') consistently
underestimates the 2PCF on all scales except r < 1 h−1 Mpc, where it should sharply
drop to −1 but instead remains positive.
The consistent under-estimate is a manifestation of an inaccurate linear bias, b, defined
as the scaling factor between the 2-point function of the halos and that of the
underlying matter density field:

ξhalo(r) = b² ξdm(r).    (2.2)

11 http://members.ift.uam-csic.es/dmonge/CUTE.html
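As an aside, given measured correlation functions this linear bias can be estimated in a few lines (a sketch; the averaging window corresponds to the quasi-linear fitting range used later in Section 2.4.1):

import numpy as np

def linear_bias(xi_halo, xi_dm, r, rmin=15.0, rmax=47.0):
    """b = sqrt(xi_halo / xi_dm), averaged over large, quasi-linear scales."""
    sel = (r > rmin) & (r < rmax)
    return np.sqrt(np.mean(xi_halo[sel] / xi_dm[sel]))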
The simplest improvement to the random case is to eliminate the artificial small-scale
correlations. Though the primary application of halogen will be for large scales, a
simple improvement at small scales is useful.
As we have noted, the artificial clustering at small scales arises from the fact that
particles can be arbitrarily close, whereas simulated halos have a minimum sepa-
ration. The radius of a halo is a rather subjective quantity, and its definition is
modified in various applications and halo finders. However, we may parametrise it by

R∆ = [3 Mhalo / (4π ∆h ρcrit)]^{1/3},    (2.3)
where ∆h is the overdensity of the halo with respect to the critical density of the
Universe. For the work presented here we used ∆h = 200.
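In code, Equation 2.3 is straightforward; a sketch assuming ∆h = 200, masses in 10¹⁰ M⊙/h and lengths in Mpc/h (the constant below is the critical density in these units; the function name is ours):

import numpy as np

RHO_CRIT = 27.755  # critical density in 10^10 (Msun/h) per (Mpc/h)^3

def exclusion_radius(M_halo, delta_h=200.0):
    """R_Delta = [3 M / (4 pi Delta_h rho_crit)]^(1/3)  (Equation 2.3)."""
    return (3.0 * M_halo / (4.0 * np.pi * delta_h * RHO_CRIT)) ** (1.0 / 3.0)

# e.g. a 10^14 Msun/h halo gives R_200c ~ 0.76 Mpc/h
print(exclusion_radius(1.0e4))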
Using this scale, we introduce exclusion: a modifiable option controlling the degree
to which halos can overlap, set to mimic the halo finder's specification.
For example, in this work we use both AHF and FOF (see Section 1.2). For the
latter we do not allow any overlap, whereas for the former halogen's halo centres
are not allowed to lie inside another halo's radius.
The effect of exclusion is presented in Figure 2.3 ('random exc'). As expected, scales
of r < 1 h−1 Mpc show a turnover while larger scales are unaffected. We note that
the turnover is at smaller scales for halogen than for AHF. This is to be expected,
as it is unlikely to find two AHF halos separated by a distance slightly exceeding
R∆ , due to reasons akin to the FOF over-linking problem. In such cases, there is an
increased likelihood of the two halos being subsumed into one, or one becoming a
subhalo of the other. It is conceivable that one could empirically model these effects
by tuning the value of ∆h by some factor which captures this suppressed probability.
However, as we are more interested in large scales and these considerations touch
upon the subtleties of halo definition, we consider these exclusion criteria sufficient
for present purposes. We will use this form of exclusion (in an appropriate form) for
all following work.
The effect of introducing a scale length, lcell , is also clearly seen in this result. There
is a turnover in the 2PCF below lcell , which corresponds to a significant reduction of
bias on these scales since a random particle is chosen within the cell.
2.3.4 α approach

We find that selecting completely random particles yields too low a bias, whereas
the ranked approach is highly biased. We require an intermediate solution, which
has a higher probability of selecting dense areas than the random approach, and a lower
probability than the ranked approach.

The probability that a cell is chosen is a function of its density,

Pcell ∝ G(ρcell).    (2.4)

In the completely random case, we have G(ρcell) = ρcell. In principle we can tailor
G(ρcell) so that the probability of selecting a cell reproduces the appropriate bias.
We choose to constrain G(ρcell) to have a power-law form, i.e.

G(ρcell) = ρcell^α.    (2.5)
[Figure 2.3: 2PCF ξ(r) of the goliat reference catalogue compared to successive stages of halogen: 'random no-exc', 'random exc', 'ranked exc', 'α = 1.5 exc', 'α = 2 exc' and 'α(M) exc'. The scale lcell is marked.]
We do not update the value of the probability after every halo placement because it is
computationally very expensive (O(Ncell³)), and we have checked that doing so has a
negligible effect on the output statistics.
We note that a similar method was employed in QPM [120]. In fact, the physically
meaningful distribution is fhalo (ρ) – the fraction of halos in cells with density ρ. This
can be written as
fhalo (ρ) = P (cell|ρ)fcell (ρ), (2.6)
where P (cell|ρ) specifies the relative probability of choosing a cell given its density
(in our case, ρα ), and fcell (ρ) is the intrinsic distribution of cell densities given the
cell size and cosmology (heavily related to the cosmological parameter σ8 ). QPM
specifies the target distribution fhalo (ρ) directly, as a Gaussian. In halogen we
instead specify P (cell|ρ), which is more closely tied to our algorithm. In principle
one can convert from QPM-like methods to halogen with Equation 2.6.
The approach as it stands reproduces the 2PCF accurately down to the scale of lcell .
If the 2PCF of a sample of given number density is all that is required for a specific
application, then this will do well.
However, if we were to select a sub-sample of the most massive halos of our catalogues
and recompute the 2PCF, the bias would be incorrect, since more massive halos are
more biased [136]. For a truly representative catalogue, in which the halos are
conditionally placed based on their mass, the bias model is required to be mass-
dependent. Failing this, there is no physical meaning attached to the assignment of
masses in the second step (Section 2.2.2).
Mass-dependent halo bias is also crucial for implementing HOD models on the cat-
alogue, for use in galaxy survey statistics, as the number of galaxies associated with
a halo depends on its mass.
We incorporate this mass-dependence into the α parameter, so that we finally have

G(ρcell, M) = ρcell^{α(M)},    (2.7)
bin   Mth^i [h−1 M⊙]   n^i [(h−1 Mpc)−3]   αi
0     1.64 · 10¹⁴      0.05 · 10⁻⁴         3.54
1     4.80 · 10¹³      0.40 · 10⁻⁴         2.26
2     2.65 · 10¹³      0.90 · 10⁻⁴         1.77
3     1.86 · 10¹³      1.40 · 10⁻⁴         1.48
4     1.38 · 10¹³      2.00 · 10⁻⁴         1.41

Table 2.2: Properties of the selected mass bins for the goliat simulation: mass
threshold Mth^i, equivalent number density n^i = n(M > Mth^i) and best-fit αi in
Mth^{i−1} < M < Mth^i for the halogen α(M) approach.
2.3.6 Summary

In summary, for each halo of mass M (taken in descending order) the placement consists of:

1. selecting a cell with probability Pcell ∝ ρcell^{α(M)},

2. randomly selecting a particle within the cell and using its coordinates as the
halo position,

3. ensuring that the halo does not overlap (following an exclusion criterion) with
any previously placed halo in any cell, and re-choosing a different random
particle in that case,12

4. subtracting the halo's mass from the selected cell, mcell = mcell − M: if mcell ≤ 0
the cell is removed from selection.

A condensed sketch of this placement loop is given below.
12 If, after several iterations, all the particles are found to lie inside another halo, re-choose the cell (to avoid infinite loops).
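As announced above, a condensed sketch of this loop, assuming precomputed cell densities and cell masses, a mapping from each cell to the positions of its 2LPT particles, and a fitted α(M); all names are illustrative, and the overlap re-draw of step 3 is only indicated:

import numpy as np

def place_halos(halo_masses, rho_cell, mass_in_cell,
                particles_in_cell, alpha_of, seed=0):
    """Place halos (descending mass) on 2LPT particles, cell by cell."""
    rng = np.random.default_rng(seed)
    available = mass_in_cell.astype(float).copy()
    positions = []
    for M in halo_masses:
        # step 1: P_cell ~ rho^alpha(M), excluding exhausted cells
        w = np.where(available > 0.0, rho_cell ** alpha_of(M), 0.0)
        cell = rng.choice(len(rho_cell), p=w / w.sum())
        # step 2: random particle inside the chosen cell
        pts = particles_in_cell[cell]
        pos = pts[rng.integers(len(pts))]
        # step 3: the overlap test against previously placed halos goes here
        positions.append(pos)
        # step 4: conserve mass in the cell
        available[cell] -= M
    return np.asarray(positions)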
Note that the physically motivated nature of the process suggests that higher-order
statistics may also be recovered with some success.
We have mentioned several parameters of the halogen method, and these are of
particular importance in producing accurate realisations. In this section we will
discuss each parameter, its effects and how to optimise for it if possible.
There are three parameters in halogen (with other options and parameters being
expressly determined by the required output, such as the size of the simulation box
L): the two physical parameters of the model, α – controlling the linear bias – and
fvel – controlling the velocity bias – and the one parameter of the algorithm, lcell .
In the previous Section we used goliat as a reference. We now turn to BigMul-
tiDark and its FOF catalogue: this simulation has a larger volume, allowing us to
probe BAO scales. The increased volume also reduces cosmic variance on interme-
diate scales. halogen primarily aims at reproducing clustering statistics for even
larger volumes, hence it is beneficial to assess the performance of halogen and its
parameters in this regime. Furthermore, this demonstrates independence from the
underlying simulation and halo finding technique.
The halogen package includes a stand-alone routine which determines a best fit for α(M), which can then be passed
to halogen to generate any number of realisations. We describe this routine here,
and illustrate it with an application to BigMultiDark. The fitting of α(M) is based
on the standard χ²-minimisation technique. However, a few details are worth
mentioning.
13 Cosmic variance – strictly speaking – requires the study of the same volume, but in a different place in the Universe. What we do here is more appropriately called 'sampling variance', yet it is nevertheless the generally accepted technique for generating covariance matrices.
For each mass bin we minimise

χ² = Σi [ξH(ri) − ξNB(ri)]² / σH²(ri),    (2.8)

where ξH and ξNB are the 2PCFs of halogen and the reference catalogue, respectively.
We note that minimising this statistic is susceptible to systematic errors in
halogen in bins where the stochastic error (σH) is much smaller than the systematic
error (∆ξ). This is especially likely when the region of the fit approaches lcell.
To test whether the region is stable, we may choose a distance estimator to be minimised
that treats all scales with the same weight, e.g. ∆ = (ξH − ξNB)²/ξNB². We
have tried both definitions over our fitted range, and the results are left unchanged,
indicating that the range of the fit is stable.
We use a grid of α values covering the expected result for each mass bin, and a cubic-spline
interpolation over χ²(α) to locate a precise minimum for the best-fit α; a sketch of this procedure is given below.
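A sketch of this per-bin minimisation, assuming a callable returning the (averaged) halogen 2PCF over the fitting range for a trial α (all names are illustrative):

import numpy as np
from scipy.interpolate import CubicSpline

def best_fit_alpha(alpha_grid, xi_halogen, xi_ref, sigma_H):
    """chi2(alpha) on a grid (Equation 2.8), then a cubic-spline minimum."""
    chi2 = np.array([np.sum((xi_halogen(a) - xi_ref) ** 2 / sigma_H ** 2)
                     for a in alpha_grid])
    spline = CubicSpline(alpha_grid, chi2)
    fine = np.linspace(alpha_grid[0], alpha_grid[-1], 10000)
    return fine[np.argmin(spline(fine))]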
Number of mass bins. The number of bins to use in this procedure will depend
on the needs of the user, and on the size and resolution of the reference simulation. It
determines the reliability of the mass-dependent clustering. For BigMultiDark we
distribute the halos into 8 roughly equi-numbered bins with the mass thresholds Mth^i
shown in Table 2.4. In that table we also show the best-fit αi, and the equivalent
number density n^i for each mass threshold.
Fitting Range. We restrict the range of the fit to scales in which the shape of
ξH (r)/ξNB (r) is flat. This corresponds to mid-range scales of 15h−1 Mpc < r <
47h−1 Mpc, which avoids small-scale effects of halogen, and large-scale cosmic vari-
ance.
The 2PCFs for our 8 values of ni are shown in Figure 2.4, where we compare the
results from halogen against the BigMultiDark reference catalogue. The range
used during the fitting procedure and for the χ2 -minimisation is indicated by the
vertical lines.
[Figure 2.4: ξ(r) of halogen (lines) against the BigMultiDark reference (points) for the number densities n0–n7.]

We note that the choice of α finely controls the bias. This is demonstrated in
Figure 2.5, in which we show the resultant ξ(r) for the entire grid of α7 for this
fit (top figure). There is a ∼ 10 per cent deviation in ξH (r) over the grid range
(1% between consecutive lines). In the bottom figure, we show the χ² of each of
those curves and the cubic-spline interpolation used to find the minimum, which
corresponds to the α7 best-fit value shown in Table 2.4.
Following the phenomenological approach introduced in Section 2.2, the halo velocities
are obtained from the selected-particle velocities through a linear mapping,

vh = fvel(M) · vp.    (2.9)
Figure 2.5: Illustration of variations in α and their consequences for the 2PCF. Top
figure: correlation function of the target halo catalogue (BigMultiDark, crosses)
and the grid of ξH corresponding to the grid of α7 used for minimisation. The lower
sub-panel shows the ratios to the BigMultiDark result. The vertical dashed lines
mark the spatial r-range of the fit. Bottom figure: χ² (Equation 2.8) as a function
of α7 for the grid of values used in the top figure (red crosses) and the interpolated
curve (dashed blue line). In the inner box we zoom into the area near the minimum
(green circle).
bin   Mth^i [h−1 M⊙]   n^i [(h−1 Mpc)−3]   αi     fvel^i
0     1.64 · 10¹⁴      0.05 · 10⁻⁴         4.80   0.564
1     4.93 · 10¹³      0.45 · 10⁻⁴         2.79   0.672
2     2.95 · 10¹³      0.95 · 10⁻⁴         2.28   0.715
3     2.15 · 10¹³      1.45 · 10⁻⁴         2.00   0.743
4     1.70 · 10¹³      1.95 · 10⁻⁴         1.90   0.754
5     1.41 · 10¹³      2.45 · 10⁻⁴         1.84   0.760
6     1.21 · 10¹³      2.95 · 10⁻⁴         1.73   0.771
7     1.04 · 10¹³      3.50 · 10⁻⁴         1.73   0.771

Table 2.4: Properties of the selected mass bins for the BigMultiDark simulation:
mass threshold Mth^i, equivalent number density n(M > Mth^i), best-fit αi for the
interval of masses Mth^{i−1} < M < Mth^i, and fvel computed for the same interval (see
Section 2.4.2).
The factor fvel(M) is computed for each interval of mass M ∈ (Mth^{i−1}, Mth^i] while performing the fit for α. These
results are also listed in Table 2.4. There is a noticeable decrease in fvel towards
higher mass halos. We will see in Section 2.5.4 below how this affects the modelling
of Redshift Space Distortions.
We finally note that there may be other, more complex, models of velocity bias,
accounting for small-scale physics and adjusting other statistics beyond the overall
velocity distribution. However, the model presented here is very simple and capable
of reproducing the halo velocity distribution with great accuracy.

Figure 2.6: One-component (vx) velocity distribution of the halo catalogues. The
FOF halos from the BigMultiDark simulation are shown as a red solid line, the
vx,p of the particles selected by halogen as a green dashed line, and the corrected
vh halos from halogen as a blue dotted line. The correction provides a very closely
matching distribution, which has a generally lower velocity.
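Applying Equation 2.9 with a tabulated fvel, as in Table 2.4, amounts to a few lines (a sketch with ascending bin edges; all names are ours):

import numpy as np

def remap_velocities(v_particles, halo_masses, bin_edges, f_vel_bins):
    """v_h = f_vel(M) * v_p, with f_vel constant within each mass bin.

    `v_particles` is an (Nh, 3) array of selected-particle velocities;
    `bin_edges` are ascending mass edges delimiting the bins of `f_vel_bins`.
    """
    idx = np.clip(np.digitize(halo_masses, bin_edges) - 1,
                  0, len(f_vel_bins) - 1)
    return v_particles * np.asarray(f_vel_bins)[idx][:, None]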
Figure 2.7: Two-point correlation function on logarithmic (top figure) and linear
(bottom figure) scale of the FOF catalogue of the BigMultiDark simulation
(crosses) against the results from halogen (lines) for different values of lcell (differ-
ent line styles as indicated in the legend). Note that in the bottom figure the 2PCF
has been multiplied by r2 to increase the visibility of the BAO peak. The lower
sub-panels show the ratio with respect to the BigMultiDark curve.
The first effect is clearly noticeable in the top figure where the halogen 2PCF
detaches from the BigMultiDark curve at r ≈ lcell . This is expected, since particles
are chosen at random inside the cell, tending towards a bias of unity at these scales.
The second effect is more noticeable in the bottom figure. As lcell is decreased,
the broadening and damping (best seen in the lower sub-panel as the difference
between the artificial peak at r = 80 h−1 Mpc and the trough at r = 100 h−1 Mpc) are
reduced. The reason for this is that we introduce an uncertainty (on a scale lcell) in
the position of the halos that propagates to an uncertainty in the determination of
rBAO . In effect, the density field has been filtered by a quasi-top-hat function [137],
which has the known effect of peak-broadening.
Clearly, lcell should be set as small as possible to mitigate these effects. However, a
limit is enforced by the mean-interparticle-separation, dp , of the input density field.
We cannot hope to reliably probe scales smaller than dp , and even just above this
scale we run into the problem of having poor statistics within cells. We recommend
using a value of lcell ≥ 1.5dp (ensuring > 3 particles per cell on average), and in this
work we take lcell = 4h−1 Mpc ≈ 2dp as the reference.
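This recommendation is trivial to encode (a sketch; dp is the mean inter-particle separation of the input particle distribution, and the function name is ours):

def recommended_cell_size(box_size, n_particles, factor=1.5):
    """l_cell >= factor * d_p, with d_p = L / Npart^(1/3)."""
    d_p = box_size / n_particles ** (1.0 / 3.0)
    return factor * d_p

# goliat-like setup: L = 1000 Mpc/h with 512^3 particles -> ~2.9 Mpc/h
print(recommended_cell_size(1000.0, 512 ** 3))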
We comment here that the choice of lcell affects the optimal α(M ) relation. This is
unfortunate, because it would be useful to be able to perform the fit for α using a
lower resolution (since this is the bottleneck). The mechanism by which this effect
occurs is known, and we hope to be able to correct for it in the future.
Let us illustrate the mechanism with an example: suppose we take a cell of size
lcell^I and density ρcell^I from a volume (N lcell^I)³. For the same distribution, we could
also use lcell^II = lcell^I/2, which forms 8 sub-cells i with densities ρcell,i^II. For the same α,
the probability of choosing the cell in case I is

Pcell^I = (ρ^I)^α / Σj^{N³} (ρj^I)^α = [ (1/8) Σi^8 ρcell,i^II ]^α / Σj^{N³} (ρj^I)^α    (2.11)
Figure 2.8: Best-fit α(M ) functions for different values of lcell , as marked in the
legend (units of h−1 Mpc).
We thus expect the difference in the distributions to depend on α, on the two cell-sizes
and their ratio, and on the cosmology, via the mass variance σ(r). In future studies
we hope to be able to quantify this relationship to enable faster fitting.
Figure 2.8 shows the effect of changing lcell on the best-fit α(M), and we notice two
characteristics. Firstly, α(M) is an increasing function for all lcell, as expected since
b(M) is increasing. Secondly, low masses are less sensitive to lcell, which we expect
mathematically from Equations 2.11 and 2.12 with an increasing α(M) (the greater
α is, the greater the differences expected).
In Figure 2.7 we have re-fit the α(M ) relation for each value of lcell , ensuring proper
comparison between curves. Furthermore, we run 5 realisations of each and display
the average, to reduce the effects of halogen variance.
While previous sections were dedicated to the design and optimisation of halogen,
we have now defined the final method and fixed the optimal parameters. In this
section we discuss the performance of halogen in more detail, both in the clustering
statistics analysed so far and in other statistics that halogen is not constrained to
reproduce.
The driving motivation of developing fast methods for synthetic halo catalogues is
to accurately produce robust covariance matrices for large galaxy survey statistics.
Though halogen requires a full N-body simulation to calibrate its two parameter
sets, once these parameters have been established we are free to run as many
realisations of the halo catalogue (with different phases for the initial conditions,
but the same cosmological parameters, volume, mass resolution etc.) as we like.
This process is expected to purely simulate the effects of cosmic variance, and thus
is extremely valuable for deriving the covariance matrices.
In order to verify that the variance seen in the resulting data traces the expected
cosmic variance, we complemented the generation of the halogen catalogues with
several corresponding N -body simulations. Due to the computational time con-
straints, we were only able to run five simulations, which were based on goliat, and
in which only the seed for the random Initial Condition (IC) phases was changed.
The initial conditions for these runs were generated with 2LPTic at redshift z = 32
(for the N -body) and z = 0 (for halogen), using the same seed for each pair. The
N -body particle distributions were evolved to z = 0 using Gadget2 (and subse-
quently analysed with AHF).
In Figure 2.9 we present the 2PCF of those 5 pairs of catalogues (random seeds
are colour-coded, with halogen as solid lines and AHF as points). The halo-
gen lines are the average of 5 realisations of halogen placement (maintaining the
same phases) and the error bars show the halogen variance. Given that the go-
liat box size is rather small (1 h−1 Gpc), scales r ≳ 60 h−1 Mpc are dominated by
cosmic variance effects. This makes it easy to identify the signature of each set of
initial conditions. Though the realisations are significantly different, we note that
the halogen catalogue follows the N -body result, and maintains the correct nor-
malisation at intermediate scales (20h−1 Mpc < r < 50h−1 Mpc). We stress that the
fitting procedure has only been performed once; all five cases used fixed parameters.
The similarity of the goodness of fit in each case (compared to the realisation that was
directly fitted to) demonstrates that the fitted α(M) is universal with respect to the input seed. We
note also that the halogen variance is significantly sub-dominant to the cosmic
variance.
To better appreciate the dominance of the cosmic variance in a more applicable
scenario, we return to the BigMultiDark simulation. This has a reduced cosmic
variance due to the larger volume, but has the disadvantage that we cannot run
several N -body simulations of this magnitude. The blue line of Figure 2.10 shows
how the 2PCF of a single-run halogen (neither halogen nor cosmic variance has
been averaged out) compares to the reference BigMultiDark catalogue when they
have the same initial condition phases. We further show the halogen variance (σH )
and cosmic variance (σcosm ). The former has been computed as usual: running 5
realisations of halogen on the same 2LPT snapshot. For the latter we run five
2LPTic snapshots with different IC seeds. In order to avoid mixing σcosm and σH,
for each of them we first averaged out the halogen variance by running 5 realisations
of halogen; σcosm is then computed as the dispersion of the five resulting (σH-free)
lines. We find on all scales that the halogen variance is dominated by the cosmic
variance, σH < σcosm .
A simple but powerful statistic for point particles is the Probability Distribution
Function (PDF), which is the distribution of particles per cell on a given scale.
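A sketch of how such a counts-in-cells PDF can be measured from a catalogue of positions in a periodic box (all names are illustrative):

import numpy as np

def halo_count_pdf(positions, box_size, n_mesh):
    """Histogram of the number of halos per cubic cell of side L/N."""
    cells = np.floor(positions / box_size * n_mesh).astype(int) % n_mesh
    cell_id = np.ravel_multi_index(cells.T, (n_mesh,) * 3)
    counts = np.bincount(cell_id, minlength=n_mesh ** 3)  # halos per cell
    return np.bincount(counts)  # number of cells hosting 0, 1, 2, ... halos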
Figure 2.9: 2PCF of the halogen (lines) and AHF (points) catalogues for five
different 2LPTic random seeds (colour-coded). The first case corresponds to the
original goliat used to obtain the α(M) relation, whereas the others share the
same setup apart from the seed.
Figure 2.10: 2PCF of the FOF catalogue of the BigMultiDark simulation com-
pared to that of a single-run halogen (non-averaged) with the same initial condi-
tion phases. We also include error bars: in green the cosmic variance and in orange
the halogen variance (see text). The lower panel shows the ratio with respect to
BigMultiDark.
Figure 2.11: PDF of halo counts for both halogen (lines) and BigMultiDark
(points) catalogues from BigMultiDark. Several mesh numbers are used, as
labelled by colours, and these correspond to the physical scales of 2.5h−1 Mpc,
5h−1 Mpc, 10h−1 Mpc and 20h−1 Mpc respectively.
halogen has been designed to recover the 2PCF ξ(r) of a provided halo catalogue.
As the power spectrum P(k) is its Fourier transform, it theoretically contains the
same information. However, this information is distributed differently in the two
functions, and there is mode coupling when transforming from one to the other: an
error at a given scale in one of the quantities can propagate to an error at all scales
in the other. We therefore expect different strengths and weaknesses in P(k).
Figure 2.12: Power spectrum P(k) of halogen (blue line) and FOF (red line) for
BigMultiDark. The bottom panel shows their ratio. The power spectrum has
been computed using an N = 1024³ mesh and corrected for shot noise as explained
in [142].
In Figure 2.12 we compare the power spectrum of the BigMultiDark FOF catalogue
to the corresponding halogen realisation. We find agreement to 5% across the
scales 0.01 h Mpc−1 < k < 0.3 h Mpc−1, but note that smaller scales, k > 0.3 h Mpc−1
(r < 20 h−1 Mpc), are underestimated. This underestimation arises from the smallest
scales of the 2PCF, r < lcell, whose errors enter P(k) over a wide range of wavenumbers
through the transform.
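For reference, a simplified sketch of a shot-noise-corrected P(k) estimator (nearest-grid-point assignment and no window deconvolution; this follows the standard |δk|²/V convention and is not necessarily the exact estimator of [142]):

import numpy as np

def power_spectrum(positions, L, N, nbins=40):
    """Shot-noise-corrected P(k) of points in a periodic box of side L."""
    grid, _ = np.histogramdd(positions, bins=(N,) * 3, range=[(0, L)] * 3)
    delta = grid / grid.mean() - 1.0
    delta_k = np.fft.rfftn(delta) * (L / N) ** 3        # FT with cell volume
    pk3d = np.abs(delta_k) ** 2 / L ** 3                # raw P(k) on the mesh
    kf = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
    kz = 2 * np.pi * np.fft.rfftfreq(N, d=L / N)
    kmag = np.sqrt(kf[:, None, None] ** 2 + kf[None, :, None] ** 2
                   + kz[None, None, :] ** 2).ravel()
    shot = L ** 3 / len(positions)                      # Poisson shot noise 1/nbar
    edges = np.linspace(kz[1], kmag.max(), nbins + 1)   # from the fundamental mode
    which = np.digitize(kmag, edges)
    sums = np.bincount(which, weights=pk3d.ravel(), minlength=nbins + 2)
    cnts = np.bincount(which, minlength=nbins + 2)
    pk = sums[1:nbins + 1] / np.maximum(cnts[1:nbins + 1], 1) - shot
    return 0.5 * (edges[:-1] + edges[1:]), pk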
Observed galaxies are not located directly in 3D space: we measure two angular
coordinates (θ, φ), with the redshift z converted into a radial distance. However,
such distances are modified by the galaxies' peculiar velocities – velocity components
that are not due to the Hubble expansion. These modifications are encoded as
Redshift Space Distortions (RSD), and we can begin to account for them by assigning
correct velocities to halos.
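A sketch of the standard plane-parallel mapping that produces such distortions from a z = 0 snapshot (line of sight taken along z; with comoving Mpc/h and velocities in km/s, aH = 100 km s⁻¹ (Mpc/h)⁻¹ at z = 0; this illustrates the effect rather than reproducing the exact pipeline used here):

import numpy as np

def to_redshift_space(positions, velocities, box_size, aH=100.0):
    """s = x + v_los / (a H), applied along z with periodic wrapping."""
    s = positions.copy()
    s[:, 2] = (s[:, 2] + velocities[:, 2] / aH) % box_size
    return s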
Figure 2.13: 2PCF in redshift space (RS) for FOF (red points), and halogen (blue
line) of the BigMultiDark simulation. We also include in magenta the results of
our catalogue without applying the velocity bias (i.e. fvel = 1, ’selected particles’)
and find that a correct velocity bias is needed.
Using the halo velocities, we can mimic this effect when calculating the 2PCF. We
show the results of such an analysis in Figure 2.13, in which the monopole of the
2PCF in redshift space is compared for the halogen and BigMultiDark cata-
logues. To show the effect of our velocity transformation, we also include the 2PCF
of the 'selected particles', in which the velocities were not transformed. The nor-
malisation and shape are significantly improved by the simple linear transformation
(Equation 2.9), and we find agreement to below 5 per cent at intermediate scales.
So far we have devoted this chapter to the construction and analysis of halogen.
However, there are other approximate methods in the literature that also generate
fast halo mock catalogues. Within the Mocking Astrophysics programme described in
Section 1.1.3, the 'nIFTy cosmology' workshop14 arose, in which we compared nearly
all existing methods for approximate halo mock catalogues.

Table 2.5: Computing resources and related properties used by each code to generate
the halo catalogue analysed in this study. The CPU-hours can vary significantly from one
machine to another, but it is important to note their order of magnitude, which depends
on the algorithm and the particle-mesh size. The memory usage is mostly determined by
the mesh size, which sets the spatial and mass resolution. Whereas most codes use
the same particle and force mesh, cola needs 3 times more resolution in the latter. Codes
that need to resolve halos need more particles and, hence, more resources, though always far
fewer than a full N-body simulation (BigMD).
We briefly present here some of the results that emerged from that comparison [3].
As described before (Section 2.1.1 & Section 2.2), all methods can be seen as a four-step
process. The main differences among methods lie in the way they generate the
density field and how they apply a bias to generate a halo distribution. This idea is
graphically represented in Figure 2.14.
The methods presented here can be used in different contexts and each of them is
designed with different purposes. Some of them require less computing resources
at the price of having lower resolution, whereas others prefer to keep the resources
higher but gain accuracy. Table 2.5 compares the resources needed by each method
to generate the same halo catalogue (with the BigMultiDark N-body simulation
of Table 2.1 as the reference catalogue). Those that need to resolve halos (cola,
pinocchio and PThalos) have a predictive nature and typically require more
resources than those with a stochastic nature (EZmocks, halogen, patchy and
Log-Normal), which need to be fitted to a reference simulation.
14 http://popia.ft.uam.es/nIFTyCosmology
Figure 2.14: Scheme of the approximate methods. Most of them use a gravity solver
(2LPT, ALPT, 2LPT+PM, ZA) to generate a density field from which halos are
generated, either using a halo finder or a stochastic bias. Some methods additionally
need to modify the initial power spectrum (EZmocks and Log-Normal). pinocchio
computes halo formation and evolution via collapse times.
The main characteristics of the methods are shown in Table 2.6, and we briefly
describe them below:
2.6.2 Results
Here we study how the different methods perform on 1-, 2- and 3-point statistics.
Recall that the PDF is primarily a 1-point statistic, but with contributions from all
higher orders. This section focuses more on the 2-point functions, which are the most
studied in the literature (and the primary objective of halogen) because they contain
the most net information about cosmology (including the BAO). 3-point functions are
more difficult to measure with current surveys (although it has been done [147]) and
have high contributions from non-linearities that are more difficult to predict from
theory. Nevertheless, they are also a target for future surveys.
The PDF is shown in Figure 2.15, where we find two outliers: the
Log-Normal method and pinocchio. Note, however, that the scales explored here
(2.6 Mpc/h) are already highly non-linear.
Regarding the 2-point function, looking at ξ in the top part of Figure 2.16 we
find that most methods give similar results both in real (left) and redshift (right)
space. For the Log-Normal method velocities were not computed (although they
can be obtained with linear theory), so all its redshift-space results are missing.
The normalisation of PThalos is off by more than 20%; this is because
here we took the linking length b2LPT from a theoretical value and categorised
PThalos as a predictive method, but b2LPT could be left free and the bias fitted.
In Fourier space (bottom of Figure 2.16) the agreement at linear scales appears
similar, but these panels focus more on the non-linear scales (k > 0.1 h/Mpc, compare
Figure 2.12), where methods based on 2LPT (halogen, PThalos and pinocchio)
and Log-Normal start having problems. Only methods with an accurate density field
(cola and patchy) or many free bias parameters (EZmocks) can reproduce these
scales within 5% error.
For the 3-point function something similar occurs: we need more sophisticated methods.
In particular, the Log-Normal method does not even reproduce the shape of the functions,
whereas halogen and PThalos (and, to a lesser extent, pinocchio) have an offset in
the normalisation but reproduce the shape.
Figure 2.15: PDF of halo counts in a grid with N = 960³ cells for the different
approximate methods.

In conclusion, as far as the 2-point function is concerned, nearly all the methods presented
here give good results at large scales. This is particularly interesting for
BAO analysis. If we are also interested in higher-order statistics, we will need more
sophisticated methods that may require more computing resources or a more complex
bias model. Depending on the needs of a particular study, it will be more convenient
to use one code or another.
2.7 Conclusions
In this chapter we presented halogen, a method to generate approximate halo catalogues in four steps:

1. Generate a 2LPT density field at the target redshift, sampled by particles.

2. Sample a theoretical halo mass function n(> M) with a list of Nh halo masses
M and order them in descending mass.
Figure 2.16: Comparison of the 2-point functions for the different methods. Top
sub-figures show configuration space, whereas bottom sub-figures show Fourier space.
Left sub-figures show real space and right sub-figures redshift space.
Figure 2.17: Comparison of the 3-point functions for the different methods. Left: 3-point
function in real space with fixed r1 = 10 h−1 Mpc and r2 = 20 h−1 Mpc and free
r3. Right: bispectrum with k1 = 0.1 h Mpc−1 and k2 = 0.2 h Mpc−1, and a varying
angle θ12.
3. Place the halos at positions of particles, chosen stochastically with a probability
that depends on the cell density and halo mass, Pcell ∝ ρcell^{α(M)}. We select random
particles within cells, respecting the exclusion criterion and conserving mass in cells
(cf. Section 2.3).

4. Assign the velocity of the selected particle to the halo through a factor, vhalo =
fvel(M) · vpart.
Further, we noted the modularity of these steps and acknowledged alternatives for
each of them. The 2LPT in step (1) provides us with the correct large scale clustering
at a low computational cost, while step (2) reconstructs the halo mass function. The
heart of halogen is step (3) where the mass dependent bias is modelled through
the parameter α(M ) that stochastically places more massive halos in overdensities,
recovering the correct 2-point correlation function as a function of mass. We also
preclude halos from overlapping to match the small-scale behaviour of the 2-point
clustering. In the last step (4), we re-map particle velocities in order to obtain the
correct halo velocity distribution.
We studied how the parameters of the method – α(M ), fvel (M ) and lcell – can be
optimised and summarised the results in Table 2.3. Though halogen needs a
reference halo catalogue from an N-Body simulation to obtain α(M ) and fvel (M ),
once they have been optimised for a given setup, halogen can be used to generate
a multitude of halo catalogues, allowing the quantification of cosmic variance.
The halo mass function is recovered, by construction, to the theoretical value. The
2-point function at intermediate scales (10 h⁻¹Mpc < r < 50 h⁻¹Mpc, where the bias
is controlled by α(M)) can be obtained in a BigMultiDark-like simulation at the
∼2% level, and at the 15% level at BAO scales (80 h⁻¹Mpc < r < 110 h⁻¹Mpc)
(Figure 2.10). In redshift space, the error at intermediate scales rises to ∼4%
and remains at ∼15% at large scales (Figure 2.13). The clustering has a mass
dependence, whose accuracy is controlled by the number of bins in the α(M)
fit (Figure 2.4). The power spectrum can be recovered at the 5% level in the range
of scales 0.01 h Mpc⁻¹ < k < 0.3 h Mpc⁻¹ (Figure 2.12). The halo PDF is accurately
reproduced at low Nhalo/cell, but its high-Nhalo/cell tail, where the contribution
of non-linearities is larger, is overpredicted (Figure 2.11).
halogen was designed in favour of simplicity and adaptability. Even though
goliat and BigMultiDark have different characteristics (see Table 2.1),
halogen can be used for both with little recalibration effort. In Section 3.2.1
we will also fit it to the mice simulation, with yet another very different setup.
This shows that halogen is not tied to one specific box size, redshift or cosmology,
which makes it a powerful tool for exploring the statistics of varying cosmologies.
We have also verified that changing the initial phases in 2LPTic for halogen leads
to changes in the correlation function (due to cosmic variance) that follow the N-body
simulation both in shape and normalisation. This implies that doing so will
yield robust estimates of cosmic variance over potentially hundreds to thousands of
realisations. Hence, halogen has been demonstrated to be a powerful tool for
modelling the statistics of halo catalogues and for quantifying the effects of cosmic
variance on them.
Comparing halogen with other methods, we find that the 2-point correlation function
at large scales is well recovered by nearly all methods, including halogen.
halogen is also found to be well suited for PDF statistics. If we also want to recover
non-linear scales or 3-point functions, a more sophisticated method is required.
Such a method could either have a very accurate density field, like cola, whose
computational cost is high compared with a statistical approach like halogen,
or a complex bias model with many free parameters that need to be tuned to
recover all the different statistics (like patchy and EZmocks), losing adaptability
and simplicity. This links with the idea of modularity highlighted throughout the
chapter: we could change the density field (step 1) or the way we place halos (step 3).
For example, for BAO physics, where only the large scales of the 2-point function are
relevant, or for Counts-in-Cells (the observational counterpart of the PDF), halogen
has been demonstrated to be a powerful tool, able to generate fast mock catalogues with
low computing resources and simple algorithms. In the next chapter we will see an
example of exactly this: how halogen is used to study the systematics and account
for the cosmic variance of an experiment, and how it will eventually be used to
determine the error bars of a BAO measurement.
Approximate halo mock generation is an emerging field that will have a great impact
in the coming years, with the increasing volumes surveyed by the experiments. For
different studies there will be a different optimal method, depending on the accuracy
needed, the computing resources available, the adaptability to different cosmologies
required, the number of catalogues needed, etc. Having a variety of methods available,
and knowing the strengths and weaknesses of each of them, will be crucial for the
experiments. The new era of observational cosmology is moving forward fast, and
cosmology modelling must keep pace with the new times.
Table 2.6: Main technical features of the methodologies. From top to bottom: whether they provide masses and velocities;
how they generate initial conditions; whether they used the same initial random seeds as BigMultiDark (cola did not,
and its large scales could be affected by cosmic variance); whether they generate or assume a halo mass function (EZmocks
and patchy generate it with a post-processing procedure explained in [146]); whether they assume a bias model; whether
they provide substructure and merger trees; and the number of free parameters introduced in total, for the RSD, for the
HMF and for the bias.
Chapter 3

Dark Energy Survey Galaxy Mock Catalogues

The Dark Energy Survey (DES) [19] is a photometric survey designed to observe the
southern-hemisphere sky. In particular, DES aims at constraining the equation of
state w(a) of Dark Energy in order to shed light on its nature. For that, it combines
four main probes: Baryonic Acoustic Oscillations, Weak Lensing, galaxy cluster
counts and type Ia Supernovae, each of which is described below.
Observations are performed with the 570-megapixel digital Dark Energy Camera
(DECam), mounted on the 4-meter Victor Blanco Telescope in Chile. DECam was
specifically designed for this experiment. Its main peculiarity is its high sensitivity at
the red end of the visible spectrum and in the near infrared, crucial for the detection
of objects at high redshift. The survey will cover 5000 deg², using a field of view of
2.2 deg in diameter with five different filters (the traditional g, r, i, z and the infrared
Y), over 5 years, reaching a magnitude limit of 24 in the i band. DES will observe
∼200 million galaxies up to z ∼ 1.4, determining their angular positions, photometric
redshifts (photo-z) and shapes.
As opposed to spectroscopic redshift surveys, where the redshift can be measured
accurately (σz ∼ (0.001 − 0.0001)(1 + z)) with a spectrograph, DES is a photometric
survey, where redshifts are estimated from the combination of the fluxes obtained in
the 5 filters (see some techniques in [148–150]), with a typical accuracy of σz ≈ 0.03(1 + z).
This decreases our knowledge of the galaxy radial positions but, at the
same time, allows us to increase significantly the number of observed galaxies (as
spectral measurements are very time-consuming), obtaining a complete magnitude-limited
survey. Additionally, the problems with fibre collisions and apertures associated
with spectroscopy do not appear here.
Baryonic Acoustic Oscillations (BAO) of the primordial photon-matter plasma leave
their imprint on the large-scale distribution of matter. From galaxy positions, we can
measure the correlation function and find the BAO feature. Detecting BAO with
a photometric survey will be an arduous task, since most of the radial information
is lost and the density field is effectively smoothed. But, by carefully analysing the
data, DES will be able to detect the evolution of the BAO scale with redshift in the range
0.6 ≲ z ≲ 1.4 and, consequently, measure the evolution of the expansion of the Universe.
More importantly, this is a range not explored before with BAO, and it will tighten
the constraints on the distance-redshift relation shown in Figure 2.
Galaxy shapes are distorted by gravitational lensing. Whereas in some cases this
effect is so strong that we see multiple images of the same galaxy (strong lensing),
it is generally much milder and cannot be seen in individual galaxies (as the intrinsic
dispersion of shapes is larger), but only studied statistically. This phenomenon is
known as Weak Lensing, and it tells us about the amount and clustering state of dark
matter. Galaxy cluster counts are another means of measuring dark matter and its
state of clustering, as they are tightly related to the abundance of high-mass halos seen in
simulations. In the standard ΛCDM model we expect to detect over 100,000 clusters
with DES (being sensitive to clusters with ∼10 red-sequence galaxies). Studying
these two effects as a function of time will be another probe of the expansion of the
Universe.
SNIa are used as standard candles in Cosmology to study the evolution of the Universe,
and they were the first evidence of Dark Energy. DES has 4 special fields for the SNIa
search, different from the galaxy fields (Figure 3.1), since for SNIa we need to target the
same field periodically to search for newly appearing objects and characterise their light
curves (flux as a function of time). Each DES supernova field is revisited ∼5 times
every month, and DES will discover ∼4000 SNIa up to redshift z ∼ 1.
All the probes combined will tightly constrain the time-dependent equation of
state of Dark Energy, parametrised as w = w0 + (1 − a)wa, as already indicated at
the bottom of Figure 1. But DES is well suited for many more astrophysical studies.
From DES early data there have been many remarkable discoveries [151]: 17 (out of
48 known) Milky Way satellite galaxies, a new type of object termed super-luminous
SN, high-redshift (z ∼ 6) and lensed quasars, 34 new trans-Neptunian objects, etc.
DES is also relevant for the two major astrophysical events of the last few months,
which even reached the general public: the evidence for a ninth planet in the Solar
System [152] and the direct detection of gravitational waves by the LIGO experiment
[153]. DES has an agreement with LIGO to search for an optical counterpart of any
triggered detection of gravitational waves. No optical counterpart was found for the
LIGO event GW150914 [154, 155], caused by the merger of two massive black holes.
This is not surprising, since this type of merger is not expected to emit in the optical,
but the search can be really useful for other types of events or to find unexpected
physics. As for the ninth planet, the predicted trajectory [156] passes through the
area observed by DES, so a detection may be possible in the future.
The DES data are split by seasons into Science Verification (SV), Year-1 (Y1), Year-2
(Y2), etc. The Science Verification observations were taken in 2012 and 2013 and
provided data over more than 250 deg² at nearly the nominal depth of DES. They were
used to test the science potential of the 5-year survey, finding promising results for
cosmology [157–162]. While SV has been widely analysed, the DES collaboration is now
analysing the Y1 post-processed data, taken between August 2013 and February 2014. Y1
covers a large fraction of the targeted area, but at a shallower flux limit (Figure 3.1). Y2
is currently being post-processed, although some of the results quoted above have
already emerged from it.

Figure 3.1: DES observing strategy footprint, from [151]. We show the supernova fields,
the SV regions, and the Y1, Y2 and Y5 masks in equatorial coordinates. Dashed and
dotted lines represent, respectively, the galactic and ecliptic planes.
This chapter is part of the work done within the Large Scale Structure Working
Group with the aim of detecting the BAO in the Y1 data. It focuses particularly on the
creation of galaxy mock catalogues matching the overall statistics of the selected
LSS-Y1 sample (see Section 3.2), to be used to compute the covariance matrices and
error bars of the large-scale clustering. We also present preliminary results of their
application to the data analysis (Section 3.3).
3.2 HALOGEN lamps: observational galaxy mock catalogues
• Photo-z. DES is a photometric survey, for which zrsd is estimated by zph with
low precision. This effect mixes galaxies of different z in the same zph-bin; we
will see how to implement it in Section 3.2.2.
In this section we simulate these three effects with the aim of creating galaxy
mock catalogues with the same statistical properties as the selected LSS-Y1 sample:
namely, the same galaxy number density as a function of redshift, n(zph); the same
angular correlation function in zph-bins, wi(θ); and the same P(zrsd|zph) distribution.
The selection of the LSS-Y1 sample has been optimised to yield a BAO detection with
an error below 5%. It consists of a sub-sample of the full Y1 data, to which we apply
three main cuts in the different filter magnitudes: completeness 17.5 < mi < 22,
brightness mi < 19 + 3 zph, and red selection (mi − mz) + 2(mr − mi) < 1.7.¹ The
selection has been done balancing the trade-off between having a sample with a higher
bias and better photo-z (a brighter and redder sample) and reducing the shot-noise
(which increases if we reduce the number density). This is optimised together with
the selection of the mask, based on the quality of the different areas (see [163] for
the details).

¹ From now on we will use zrsd for the ideally measured redshift, with no error but with
redshift-space distortions included. We introduce this notation to distinguish it from z = ztrue,
which represents cosmological time and has been used so far, and also from zph.
This links with a fourth observational feature: the application of a mask. A mask
consists of a list of pixels (we use the healpix pixelisation, http://healpix.sourceforge.net/)
telling us which regions of the sky can be used and which cannot. Regions can be
excluded due to no observation, insufficient observation (for magnitude-limited samples),
bad seeing, foreground (mainly stellar) contamination, or other causes of large
systematics. This leads to a somewhat patchy footprint that depends on the selected
sample. The red region in Figure 3.1 represents approximately the Y1 mask. More
specifically, the Y1-LSS mask has an area of ∼1426 deg². This mask does not fit in
an octant (it is around 150° wide in ra), so we need to cover it with three different
patches of the lightcone generated in Section 3.2.1. We will not enter into the details
of masking, beyond noting that the Y1-LSS mask has been applied to all the catalogues
analysed in the figures from Figure 3.4 onwards.
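As an illustration of the masking step, a few lines of Python with the healpy package suffice (the nside value and the array names are placeholders, not those of the actual Y1-LSS pipeline):

    import numpy as np
    import healpy as hp

    def apply_mask(ra_deg, dec_deg, good_pixels, nside=4096):
        # Pixel index of every object (ra/dec in degrees, equatorial).
        pix = hp.ang2pix(nside, ra_deg, dec_deg, lonlat=True)
        # Keep only objects falling in pixels allowed by the mask.
        return np.isin(pix, good_pixels)

    # keep = apply_mask(cat['ra'], cat['dec'], y1lss_pixels)
    # cat = cat[keep]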
3.2.1 Lightcone
ra = arctan(Y/X)
dec = arcsin(Z/r)                                                   (3.1)
zrsd = z(r) + [(1 + z(r))/c] u·r̂

being r = √(X² + Y² + Z²), u the comoving velocity, r̂ = r/r the radial unit vector,
and z(r) the inverse of

r(z) = c ∫₀ᶻ dz′/H(z′)                                              (3.2)
This implementation is called a lightcone because the cosmological time t(z) is
determined by the radial distance r, in the same way as in observations, where light
travels at a finite speed. But the Universe changes with time and, hence, our
simulations must also change with redshift.
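A minimal Python sketch of Equations 3.1 and 3.2 (function names and tabulation choices are ours) could read:

    import numpy as np
    from scipy.integrate import quad
    from scipy.interpolate import interp1d

    C = 299792.458  # speed of light [km/s]

    def z_of_r(H, z_max=1.5, n=2000):
        # Tabulate Equation 3.2, r(z) = c * int_0^z dz'/H(z'), and invert it.
        z = np.linspace(0.0, z_max, n)
        r = np.array([quad(lambda zp: C / H(zp), 0.0, zz)[0] for zz in z])
        return interp1d(r, z)

    def box_to_sky(X, Y, Z, vel, z_of_r_func):
        # Equation 3.1: observer at the origin of the comoving box.
        r = np.sqrt(X**2 + Y**2 + Z**2)
        ra = np.degrees(np.arctan2(Y, X))     # arctan2 resolves the quadrant
        dec = np.degrees(np.arcsin(Z / r))
        zcos = z_of_r_func(r)
        rhat = np.stack([X, Y, Z], axis=-1) / r[:, None]
        u_par = np.sum(vel * rhat, axis=-1)   # u . r_hat, in km/s
        zrsd = zcos + (1.0 + zcos) / C * u_par
        return ra, dec, zrsd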
In particular, we are interested in redshift-dependent clustering, and hence the
halogen parameters (α and fvel, summarised in Table 2.3) will vary as a function
of redshift as well. For this we use as a reference the mice N-body simulation (see
Table 2.1 and Section 2.1.2), fit α(M) and fvel(M) at the snapshots
z = 0, 0.5, 1.0, 1.5, and interpolate at intermediate redshifts.
The outcome of the fit is shown in Figure 3.2, where we see the mass-dependent
clustering for the snapshots z = 0.5 and z = 1.0. Note that the reference number
density for this simulation is ∼4.5 times higher than the one previously used for
BigMultiDark, and the minimum mass Mmin is here 4 times smaller. This is roughly
the minimum number density that we need to simulate the sample. We found that in
this case a logarithmic binning of masses was more useful, and we show in Figure 3.2
the mass thresholds used during the fitting.
The halogen parameters (including the HMF) were interpolated to z = 0.55,
0.625, 0.675, 0.725, 0.775, 0.825, 0.875, 0.925, 0.975, 1.05, and halogen was run
at those redshifts. We build the lightcone from the superposition of zrsd shells of
those snapshots, setting the shell edges at the intermediate redshifts and saving data
in the range 0.45 < zrsd < 1.2 (restricted to save storage). We repeat the same process
8 times, placing the observer at each of the 8 corners of the box, to generate 8 different
catalogues. This process might not be ideal, and we are working on a future version of
the catalogues in which we avoid the need for 10 snapshots by building the lightcone
directly in one box, with growth factors that depend on the position, D1,2(z(r)), in
the 2LPT Equation 1.5.
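Schematically, the shell superposition can be written as follows (a sketch using the snapshot redshifts quoted above; the exact edge convention is our assumption):

    import numpy as np

    snap_z = np.array([0.55, 0.625, 0.675, 0.725, 0.775,
                       0.825, 0.875, 0.925, 0.975, 1.05])
    # Shell edges at the intermediate redshifts, truncated to the stored range.
    edges = np.concatenate(([0.45], 0.5 * (snap_z[:-1] + snap_z[1:]), [1.2]))

    def shell_mask(zrsd, i):
        # Keep the objects of snapshot i that fall inside its own zrsd shell;
        # concatenating the shells of all snapshots builds the lightcone.
        return (zrsd >= edges[i]) & (zrsd < edges[i + 1])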
Finally, we compare the resulting halogen lightcone with the halo lightcone gener-
ated by mice in Figure 3.3. The mice simulated lightcone is constructed from fine
shells (∆z = 0.005 − 0.025) of the snapshots generated from a full N-body simulation,
using the velocities of the particles to extrapolate their positions to the precise
moment they cross the lightcone [164]. Remarkably, despite the great differences in
methodology, the angular correlation functions of both lightcones show very good
agreement at all the interpolated redshifts.

Figure 3.2: 2-point correlation function of mice vs. halogen halos in the simulation
box at the snapshots z = 0.5 (left) and z = 1.0 (right). We show the different mass
thresholds used during the fit, from Mth = 2.5e12 to 1.6e14.
Figure 3.3: Angular correlation function w(θ) of halos from the mice (crosses) and
halogen (lines) lightcones. The different curves correspond to different redshift bins
of width ∆zrsd = 0.05, centred at the indicated zrsd.
3.2.2 Photo-z

In a first method, we assign to each galaxy a photometric redshift

zph = zrsd + ∆ph(zrsd) · Rgauss(0, 1)                               (3.3)

with

∆ph(z) = s_data^{bin(z)}                                            (3.4)

being Rgauss(0, 1) a Gaussian random number with mean 0 and standard deviation
1, and bin(z) a discrete function giving i between 1 and 8 according to Table 3.1.
For zrsd > 1.0 and zrsd < 0.6 we use, respectively, the value of ∆ph from the last and
first bins.
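A minimal Python sketch of this photo-z model, using the cp widths of Table 3.1 (variable names are ours), could be:

    import numpy as np

    bin_edges = np.array([0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0])
    delta_cp = np.array([0.030, 0.030, 0.029, 0.029,
                         0.030, 0.035, 0.041, 0.049])

    def add_photoz(zrsd, delta, seed=0):
        # Equation 3.3: zph = zrsd + Delta_ph(zrsd) * Rgauss(0, 1),
        # re-using the first/last bin width outside [0.6, 1.0).
        rng = np.random.default_rng(seed)
        i = np.searchsorted(bin_edges, zrsd, side='right') - 1
        i = np.clip(i, 0, len(delta) - 1)
        return zrsd + delta[i] * rng.standard_normal(np.size(zrsd))

    # zph = add_photoz(cat_zrsd, delta_cp)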
However, as shown in Figure 3.4 (cp, dashed lines), this is far from reproducing the
data. The reason is that, since σi is not independent of redshift (although si is flat in a
certain z range), a width in the P(zrsd|zph) distribution is not equivalent to a width in
P(zph|zrsd). This concept may be better understood with an illustration: a galaxy
that has been assigned zph = 0.65 is more likely to come from zrsd = 0.8 than
from zrsd = 0.5, since the error applied at higher redshifts is bigger. This effect skews
and widens the distribution. This is also seen in the widths si_cp measured from the
P(zrsd|i) distribution of the catalogues after applying this method (Table 3.1).

Table 3.1: Photo-z. Redshift intervals for the 8 z-bins and, below, the measured (from
P(zrsd|i)) and applied (in Equation 3.3) widths. First we list the width measured in
the data, si_data; then the input (∆) and output (s) of the two models: cp, for which
we take ∆i_cp = si_data, and opt, for which the ∆i_opt are free parameters set to
minimise Ξ². For ∆i_opt we allow two additional z-bins. The last column shows the
Ξ² (as defined in Equation 3.5) obtained for each model.

bin-i     1          2          3          4          5          6          7          8          –         –         Ξ²
z-range   [0.6,0.65) [0.65,0.7) [0.7,0.75) [0.75,0.8) [0.8,0.85) [0.85,0.9) [0.9,0.95) [0.95,1.0) [1.0,1.1) [1.1,1.2) –
si_data   0.030      0.030      0.029      0.029      0.030      0.035      0.041      0.049      –         –         –
∆i_cp     0.030      0.030      0.029      0.029      0.030      0.035      0.041      0.049      –         –         0.612
si_cp     0.031      0.034      0.039      0.042      0.044      0.044      0.044      0.043      –         –         –
∆i_opt    0.031      0.029      0.029      0.029      0.029      0.029      0.029      0.030      0.040     0.050     –
si_opt    0.030      0.030      0.031      0.032      0.035      0.039      0.040      0.039      –         –         0.098
In order to improve on this, in a second method we vary the values of ∆i_ph and
minimise

Ξ² = Σ_{i=1}^{8} (si_method − si_data)² / (si_data)²                (3.5)

where si_method is the width measured in the P(zrsd|i) distribution after having
applied ∆i_method in Equation 3.3.
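As a cross-check of Table 3.1, Ξ² can be evaluated directly from the tabulated widths (a sketch; the small difference with the quoted 0.612 comes from the rounding of the listed values):

    import numpy as np

    def xi2(s_method, s_data):
        # Equation 3.5: relative quadratic mismatch of the P(zrsd|i) widths.
        s_method, s_data = np.asarray(s_method), np.asarray(s_data)
        return np.sum((s_method - s_data)**2 / s_data**2)

    s_data = [0.030, 0.030, 0.029, 0.029, 0.030, 0.035, 0.041, 0.049]
    s_cp = [0.031, 0.034, 0.039, 0.042, 0.044, 0.044, 0.044, 0.043]
    print(xi2(s_cp, s_data))   # ~0.6, consistent with Table 3.1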
Further, we allow different values of ∆ph in two additional bins, z ∈ [1.0, 1.1) and
z ∈ [1.1, 1.2), as we find this helps minimise Ξ². The best-fit values ∆i_opt and the
outcome si_opt are shown in the last two rows of Table 3.1. Note that ∆opt remains
nearly flat across the whole target redshift range, and that it is the contamination
from higher redshifts that makes si_opt change with redshift.
The P(zrsd|i) for this method has also been plotted in Figure 3.4 (opt, solid lines),
showing an improvement over the previous method. We fix this photo-z scheme for
the rest of the results presented below.
Figure 3.4: zrsd probability distribution in each zph-bin i, for the data and for the two
photo-z schemes described in the text, as applied to the mock catalogues.

So far, all the clustering measurements shown throughout this thesis were obtained
from halo catalogues at a given mass threshold. But observed clustering is typically
measured from galaxy catalogues, with a magnitude-limited sample and its associated
selection effects and, more generally, with redshift-dependent colour and magnitude
cuts.
One could assume that the most massive halo in a simulation corresponds to
the most luminous galaxy in the observations, so that we could do a one-to-one
mapping in rank order. This is certainly very optimistic, and we need to add a
scatter to the Luminosity-Mass (L − M) relation, which will decrease the clustering
of a magnitude-limited sample. This idea is the basis of the Halo Abundance
Matching (HAM) method [76–79].
halogen was designed to deal only with main halos, neglecting subhalos (Section
2.3.2 & Section 2.1.2). This limits the potential of HAM, as we cannot use its
natural extension to subhalos, SHAM, where there is more freedom in the physics
modelling (see e.g. [167]).
Nevertheless, we already argued (Section 2.1.2) for the possibility of adding substructure
to a main halo catalogue with a Halo Occupation Distribution (HOD) scheme. We
know that halos can host more than one galaxy; this is especially true of massive halos,
which represent galaxy clusters and host tens of galaxies. If we attribute to the halos
of a mock catalogue a number of galaxies Ngal that is an increasing function of the
halo mass Mh, the clustering will be enhanced, since massive halos will be
over-represented (as occurs in reality). This is the basis of the HOD methods [67–75].
We have just presented two models that need to be applied to a halo catalogue to
obtain a realistic galaxy catalogue. The details of these methods need to be matched
to observations via parameter fitting. This process can be particularly difficult if
one aims at a general model that serves for any sample, with any magnitude and
colour cuts, at any redshift (e.g. [73]). Additionally, the HOD implementation
will determine the small-scale clustering corresponding to the correlation between
galaxies of the same halo. This is the 1-halo term of the halo model [168],
as opposed to the 2-halo term, which is relevant at large scales and corresponds to
the correlation between galaxies of different halos.
Here, we aim at presenting the first end-to-end galaxy mock catalogue set for the
Y1-LSS selected sample. Hence, we start from the simplest model that can match
the large-scale clustering and number density. This consists of applying a HAM
with a one-parameter L − M scatter when the halo clustering is higher than in the
data, and a one-parameter HOD when the halo clustering is lower than in the data.
As we are only interested in a particular sample, we use M^gal as a proxy for
luminosity, and the HAM scatter is modelled as

log10(M^gal) = log10(Mh) + γ · Rgauss(0, 1)                         (3.6)

with γ a free parameter indicating the dispersion in dex (decimal exponent units).
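A sketch of this scatter, assuming the additive-in-log form of Equation 3.6 as reconstructed above (function names are ours):

    import numpy as np

    def ham_proxy(Mh, gamma, seed=0):
        # Scatter halo masses by gamma dex (Gaussian in log10 M) to build
        # the luminosity proxy M_gal used for the rank-ordered selection.
        rng = np.random.default_rng(seed)
        return 10.0 ** (np.log10(Mh) + gamma * rng.standard_normal(np.size(Mh)))

    def ham_select(Mh, gamma, n_target):
        # Magnitude-limited-like cut: keep the n_target largest proxies.
        M_gal = ham_proxy(np.asarray(Mh), gamma)
        return np.argsort(M_gal)[::-1][:n_target]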
For the HOD, we always set a central galaxy at the centre of the halo, plus Nsat
satellite galaxies following an NFW profile [66], where Nsat is a Poisson draw of the
halo mass divided by the free parameter M1:

Nsat(Mh) = RPoisson(Mh/M1)                                          (3.7)
More complex Nsat(Mh) functions can be found in the literature, including
exponentials, piecewise exponentials and error functions. However, we chose Equation 3.7
for its simplicity in the fitting, finding valid results for the desired purpose. Moreover,
studies using power-law HODs find best-fit values of the exponent very close to
unity [73], which leads back to Equation 3.7.
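The occupation itself is then a one-line Poisson draw per halo; a minimal sketch (with the bin-1 value of Table 3.2 as an example) is:

    import numpy as np

    def n_galaxies(Mh, M1, seed=0):
        # Equation 3.7: one central per halo plus Poisson satellites,
        # to be placed afterwards on an NFW profile around the centre.
        rng = np.random.default_rng(seed)
        return 1 + rng.poisson(np.asarray(Mh) / M1)

    # Example with the bin-1 fit of Table 3.2, log10(M1) = 13.4:
    # n_gal = n_galaxies(halo_masses, 10**13.4)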
The concentration needed for the NFW placement is determined from the mass
following [169]. The velocities of the central galaxies are taken from the host halo,
whereas the velocities of the satellites have an added dispersion following [170, 171],
where we use fvir = 0.9 and ∆vir(z) = 18π² + 82 d(z) − 39 d(z)², from spherical top-hat
collapse theory, being d(z) = 1 − Ω(z) and E(z)² = H(z)²/H0². Note that the virial
theorem together with Equation 1.9 already predicts σ ∝ M^{1/3}.
All the galaxies contained in a halo of mass Mh are assigned the same mass
M^gal = Mh.
This HOD-HAM process is done before constructing the lightcone and applying the
photo-z, but the measurements of the target wi(θ) in the 8 zph-bins, and the associated
χ², are performed after those processes:

χ² = Σ_{i=1}^{8} Σ_{0.1°<θj<1°} [wi_data(θj) − w̄i(θj)]² / ∆w(θj)²   (3.9)
Here, we use the same z-bins previously introduced in Table 3.1. The fit is done
using 8 catalogues, as follows:

• In each ztrue-bin i, apply either the HOD or the HAM scatter, with one
parameter (Mi_1 or γi), depending on whether we need to enhance or reduce the
clustering, respectively. The bin-1 value is also used for the low-z extension of
the lightcone (ztrue < 0.6), and the bin-8 value for the upper part (ztrue > 1.0).
• Apply the lightcone, mask and photo-z. This mixes galaxies drawn from different
physics (γi/Mi_1), but it helps smooth the transition between z-bins.

• Compute the mass threshold M^gal_th(zph) (a proxy for a redshift-dependent
magnitude cut) needed to match the N(z) of the data for each of the 8 catalogues.
Compute their average and apply that average threshold M̄^gal_th(zph) to all the
catalogues.
• Compute the angular correlation function in the 8 zph-bins for the 8 catalogues,
and measure their mean w̄(θ) and standard deviation ∆w(θ) to estimate the χ²
of Equation 3.9 (a sketch of this estimator is given after the list).
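A minimal sketch of this χ² estimator (the array shapes are our convention):

    import numpy as np

    def chi2_wtheta(theta, w_data, w_mocks):
        # Equation 3.9, restricted to 0.1 deg < theta < 1 deg.
        # theta:   (n_theta,) bin centres in degrees
        # w_data:  (8, n_theta) measured w(theta) in the 8 zph-bins
        # w_mocks: (n_mocks, 8, n_theta) same measurement on each mock
        sel = (theta > 0.1) & (theta < 1.0)
        w_mean = w_mocks.mean(axis=0)          # mean over mocks
        dw = w_mocks.std(axis=0, ddof=1)       # Delta w(theta_j)
        return np.sum((w_data - w_mean)[:, sel]**2 / dw[:, sel]**2)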
The resulting fitted parameters are shown in Table 3.2. These are used for the
generation of the 8 halogen-lamps catalogues, whose statistics are shown in Figure 3.5
and Figure 3.6. Both the number density and the angular clustering show excellent
agreement with the data. Moreover, we see that the galaxy mock catalogues
(halogen-lamps) represent a great improvement with respect to the halo catalogues
(halogen). Hence, the implementation of the HOD-HAM scheme appears necessary.
Finally, we remark that the dispersion γ found in the last three bins is large compared
to the typical dispersions found in the literature [172, 173]. This is partially due to the
sample selection and partially due to the modelling. Firstly, in those bins the density
drops quickly (Figure 3.5), and low-density (highly biased) samples typically
present more dispersion. Moreover, the photo-z selection becomes more contaminated
(see the broadening in Figure 3.4, or the si_data values), and what is meant to be a
highly biased sample (especially due to the low density) may be selecting average
galaxies from other redshifts, needing more scatter to compensate. Finally, the halo
catalogues have a mass resolution of M = 2.5 · 10¹¹ h⁻¹ M⊙. Given the large dispersions
that we are applying (over 2-3 orders of magnitude), it is clear that we are lacking the
lower-mass halos that would decrease the bias more efficiently. In fact, the bias barely
changes for γ ≳ 1.5, clearly pointing towards the convenience of improving the
mass resolution of the catalogue.
bin-i        1     2     3     4     5     6     7     8
log10(Mi_1)  13.4  13.6  14.2  14.5  14.0  –     –     –
γi           –     –     –     –     –     2.6   2.6   3.5

Table 3.2: HOD and HAM fitted parameters for the 8 z-bins (Table 3.1). M1 is the
mass scale of the HOD and γ the scatter of the L − M relation in dex.
Figure 3.5: Left axis: number density of galaxies, in units of (h/Mpc)³, as a function
of zph, for the data and the galaxy mock catalogues. For the latter we show the mean
and σ (halogen-lamps mean) and the individual curves (halogen-lamps) over the
8 realisations. Right axis: mass threshold M̄^gal_th used to generate the halogen-lamps
catalogues; the values indicate the decimal logarithm of the mass in h⁻¹ M⊙.
Certainly, as we improve our understanding of the data we will improve the modelling,
and vice versa. For the moment, in this section we have constructed the first
set of catalogues that reproduce the three main properties of the Y1-LSS sample, as
shown in Figure 3.4, Figure 3.5 and Figure 3.6.

3.3 Results and Applications

Once we have a set of galaxy mock catalogues, we can use them for many applications:

• First, to gain insight into the modelling and to compare statistics with theoretical
predictions (Section 3.3.1).
Figure 3.6: Angular correlation function w(θ) in the 8 zph-bins of Table 3.1 (one per
panel, from 0.60 < z < 0.65 to 0.95 < z < 1.00). We compare the clustering of the
Y1-LSS sample data with the halo mock catalogues (halogen) and the galaxy mock
catalogues (halogen-lamps), showing the mean and standard deviation computed
over the 8 mock catalogues. We also show the clustering of each of the 8 galaxy mock
catalogues individually, which can be compared more directly with the data. The
same rough n(z) has been imposed on all catalogues.
• Additionally, to study the optimal methodology to extract the BAO from the data
(Section 3.3.2).

• Eventually, to compute covariance matrices and set the uncertainty on the BAO
(and other) measurements (Section 3.3.3).
All the results presented in this section are provisional, and some of the figures
shown here are based on a previous version of the catalogues (with a simpler
photo-z modelling and only halos). We want to emphasise the need for, and usefulness
of, these catalogues in the process of analysis and optimisation, rather than to present
results, which will not be definitive until the optimisation of the sample is finished,
the method is fixed, and the results are published by the collaboration.
Whereas in Section 3.2.3 we already compared the catalogues with the data during
the calibration, here we start by comparing them with purely theoretical models. This
will help us understand the models and their range of validity.
In Figure 3.7 we show a comparison of the clustering and its error with the theoretical
predictions obtained with the method explained in [174]. The theory implements the
same bias b(z), photo-z P(zrsd|i) and number density N(z) as the mock catalogues.
We find good agreement, both for the mean of w(θ) and for its error, in all zph-bins
although, as expected, the theoretical predictions underestimate the errors. These
errors represent the diagonal part of the covariance matrices; we leave the comparison
of the off-diagonal components for a future study.
Being able to model the data with simulations allows us to better understand the
physics behind them, and to control it in the simulation. For example, in the left panel
of Figure 3.8 we study the exact effect on the clustering of adding a photo-z to our
catalogues, and the difference between the two photo-z models introduced in Section 3.2.2.
Figure 3.7: Comparison of the estimates, from theory and from the mock catalogues,
of the mean angular clustering w(θ) (top) and its 1-σ error ∆w(θ) (bottom), in the
8 zph-bins. 504 halogen halo mock catalogues were used for these estimates. Theory
predictions provided by M. Crocce.
Figure 3.8: Studying the effects of photo-z. Left: clustering of a sample selected in
zrsd compared with samples selected in zph following the two methods of Section 3.2.2
(cp and opt), all with the same number density, for the 0.8 < z < 0.85 bin. Right:
error introduced in w(θ) by the photo-z alone, by photo-z plus HOD, and by photo-z
plus HOD plus cosmic variance (see text), for the 0.8 < zph < 0.85 bin.
As expected, the clustering decreases when adding the photo-z, because the density field
is effectively smoothed along the line of sight and inhomogeneities appear less
pronounced. The cp model, which presents a higher σph (Table 3.1), reduces the
clustering even further.
One of the motivations we put forward for the need for mock catalogues was to
account for the interplay between systematics and cosmic variance. In the right panel of
Figure 3.8, we show a comparison of the error induced in w(θ) by the photo-z, by the
combination of photo-z and HOD, and the total error from the combination of photo-z,
HOD and cosmic variance. These were computed from: a) 8 different realisations of the
photo-z on the same catalogue without HOD; b) 8 realisations of photo-z and HOD
on the same catalogue; and c) 8 different catalogues with the full implementation. The
HOD here refers to the combination of HOD and HAM scatter, as fixed by Table 3.2.
Interestingly, we find that the error introduced by the photo-z seeds an important
fraction (∼0.3 − 0.5) of the total error, whereas the HOD stochasticity introduces a
negligible error.
Another important application of the mocks is to test the methodology. The way
we compress the data, from the list of coordinates of the full galaxy catalogue down to
a χBAO measurement, will affect χBAO itself and, particularly, ∆χBAO. This can be
analysed statistically with a large set of mock catalogues.
In the past sections we took 8 zph-bins and an angular binning of 0.015° for
w(θ) as a starting point. The convenience of changing both bin sizes to optimise the
precision on χBAO is now under study. Preliminary results from theory suggest that
a smaller θ-binning will increase the precision on the BAO, and that the zph-bins can be
widened without information loss while reducing the size of the covariance matrices.
This needs to be confirmed or refuted with the mock catalogues, since the theoretical
predictions at small θ-binning may not be realistic.
In addition, a comparison of the different methods to extract the BAO information
will be carried out in [175], where the different proposed methods analyse both the data
and the mock catalogues. This will include methods that analyse the clustering in
3D space (ξ(s)), the angular clustering in configuration space (w(θ)), and the angular
clustering in Fourier space (Cℓ).
Although most of the methods extract the BAO from the angular clustering, since
most of the information along the line of sight is lost, preliminary results from [176]
show that, by carefully combining the 3D information, one can recover χBAO with a
precision similar to that of the angular clustering, with the advantage of drastically
reducing the size of the dataset. In [176] we study how the BAO information is
distributed in µ (the cosine of the angle with respect to the line of sight), finding a
non-negligible amount at 0.2 < µ < 0.4 that is neglected by the angular clustering. The
relative information in different µ intervals can be seen in Figure 3.9, where we see a
well-pronounced peak at µ < 0.2 that fades at larger µ.

Figure 3.9: 3D correlation function s²ξ(s), integrated in the µ intervals µ < 0.2,
0.2 < µ < 0.4, 0.4 < µ < 0.6 and 0.6 < µ < 0.8, from [176]. Solid lines follow the
theoretical models, whereas the points are computed from 504 halogen mock
catalogues; error bars represent the standard deviation divided by √504.
3.3.3 Uncertainty
Finally, the ultimate goal of the mock catalogues is to compute the covariance matrices
of the correlation functions and the error bars on the BAO scale. Preliminary results
are shown in Figure 3.10, where we compare the correlation function from the data
(BPZ) with the halogen mock catalogues, with a catalogue from an N-body simulation
(Buzzard) and with a theoretical model. The errors on the data and on Buzzard are
computed with a Jack-Knife algorithm. In the future, a similar figure will show the
errors estimated purely from the mock catalogues. The results are very encouraging,
and all the work done in the LSS working group is promising.
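Once the mocks are in place, the covariance estimate itself is straightforward; a minimal sketch follows (the Jack-Knife used for the data in Figure 3.10 is a different, resampling-based estimator):

    import numpy as np

    def covariance_from_mocks(xi_mocks):
        # xi_mocks: (n_mocks, n_bins) correlation function of each mock,
        # e.g. the 504 halogen measurements of Figure 3.10.
        xi = np.asarray(xi_mocks)
        return np.cov(xi, rowvar=False)   # (n_bins, n_bins), 1/(n_mocks-1)

    # Diagonal errors, comparable to the error bars shown in Figure 3.10:
    # sigma = np.sqrt(np.diag(covariance_from_mocks(xi_mocks)))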
3.4 Conclusions

We have presented the first end-to-end set of galaxy mock catalogues for the Y1-LSS
sample of the Dark Energy Survey. They have been designed to match the data in
three statistics:
Figure 3.10: Preliminary 3D correlation function r²ξ(r) of the Y1 red sample from the
data (BPZ), the halogen catalogues, the Buzzard catalogue (N-body simulation) and
a theoretical model (solid line). The halogen points represent the mean and standard
deviation over 504 halo mock catalogues. The errors on both Buzzard and BPZ are
estimated via Jack-Knife. Figure provided by A. J. Ross.
• Lightcone. The parameters of halogen are fitted at several snapshots (z = 0.0,
0.5, 1.0, 1.5) and then interpolated to intermediate redshifts. The lightcone is
constructed by the superposition of z-shells of ∆z = 0.05 over the range of
interest.
Closure

In this thesis, I presented the research carried out during my PhD in the field of the
Large Scale Structure of the Universe. It connects simulations, on one side, with
observations, on the other. In Chapter 1, I studied the suitability of certain halo
finders and merger tree builders in the field of N-body simulations. In Chapter 2,
I developed a technique to generate approximate halo mock catalogs: halogen.
Finally, in Chapter 3, that technique was applied to the analysis of observed data
from a specific galaxy survey: the Dark Energy Survey. I presented the conclusions
of each chapter extensively in Section 1.6, Section 2.7 and Section 3.4, respectively,
but I synthesise here some of the most general conclusions, together with a wider
outlook.
In Chapter 1, I showed that we need to carefully select or design the tools of the
simulation pipeline according to the application. Specifically, for accurate merger
trees we need halo finders able to trace halos as they cross the centre of another
halo, which can be aided by tracking algorithms or phase-space finders. It is also
desirable to have merger tree builders able to correct for halo finder flaws by, for
example, skipping one snapshot.
In Chapter 2, I argued for the need for a new generation of tools for the massive
production of synthetic galaxy catalogs. halogen was shown to be a powerful tool,
able to produce halo catalogs with the correct 1- and 2-point statistics at large scales.
It consists of a single-parameter bias routine applied to a 2LPT density field, with
an analytic mass function and a velocity rescaling. I also presented a comparison of
practically all the approximate methods existing at that time, and discussed how each
of them is best suited to different applications, depending on the trade-off between
accuracy, simplicity, amount of resources and versatility.
Although no final results can be claimed until the selection and analysis of the Y1-LSS
sample conclude and the results are published, things are converging fast and
we expect to have a new BAO detection soon. And this is only the first year of data,
out of the 5 years of DES observations. During that period, a new point will appear in
the dM(z) diagram presented in Figure 2, eventually settling in the unexplored BAO
region of z ≈ 1, perhaps solving some of the current puzzles in Cosmology, or maybe
posing new questions.
We have shown that precision Cosmology with Large Scale Structure is possible, but
not necessarily easy. The field of fast generation of mock catalogs is now booming,
and it will soon need to deal simultaneously with increasing volumes, higher precision
in the measurements, more statistics to reproduce and larger covariance matrices to
estimate. There is a lot of work ahead for the scientific community, and a new
generation of tools is needed.
Data analysis, theory and simulations must interact with each other, opening new
windows to explore and new horizons to reach. The cosmological revolution goes on,
supported by the work of countless cosmologists around the world doing their share
and, little by little, widening our knowledge as a collective mind.
Epı́logo
Although the final results will not be ready until the selection and analysis of the
Y1-LSS sample have concluded and the results have been published by the
collaboration, the situation is converging rapidly and we expect to obtain a new BAO
detection soon. But this is only the analysis of the first-year data, out of the 5 years
planned for DES. During that time, a new point will appear in the dM(z) diagram
that we introduced in Figure 2. With new data and the corresponding analysis, this
point will settle around the unexplored area of z ≈ 1. Perhaps this new point will
help us solve some of the current enigmas of Cosmology or, perhaps, it will pose new
questions.
Data analysis, simulations and theory will have to interact with one another, opening
new windows through which to explore new horizons. The Cosmological Revolution
continues, counting on the support of countless cosmologists across the planet, each
contributing their grain of sand, and little by little widening knowledge as a collective
mind.
Bibliography
[3] C.-H. Chuang, C. Zhao, F. Prada, E. Munari, S. Avila, A. Izard, F.-S. Kitaura,
M. Manera, P. Monaco, S. Murray, A. Knebe, C. G. Scóccola, G. Yepes,
J. Garcia-Bellido, F. A. Marín, V. Müller, R. Skibba, M. Crocce, P. Fosalba,
S. Gottlöber, A. A. Klypin, C. Power, C. Tao, and V. Turchaninov. nIFTy cosmology:
Galaxy/halo mock catalogue comparison project on clustering statistics.
MNRAS, 452:686–700, September 2015.
[5] G. Bertone, D. Hooper, and J. Silk. Particle dark matter: evidence, candidates
and constraints. Phys. Rep., 405:279–390, January 2005.
[6] L. Amendola and S. Tsujikawa. Dark Energy: Theory and Observations.
Cambridge University Press, 2010.
[7] A. Linde. Inflationary Cosmology after Planck 2013. ArXiv e-prints, February
2014.
[14] D. Kraljic and S. Sarkar. How rare is the Bullet Cluster (in a ΛCDM universe)?
Journal of Cosmology and Astroparticle Physics, 4:050, April 2015.
[15] I. Harrison and P. Coles. Testing cosmology with extreme galaxy clusters.
MNRAS, 421:L19–L23, March 2012.
[16] A. Rassat, J.-L. Starck, P. Paykari, F. Sureau, and J. Bobin. Planck CMB
anomalies: astrophysical and cosmological secondary effects and the curse of
masking. Journal of Cosmology and Astroparticle Physics, 8:006, August 2014.
[19] J. Frieman and Dark Energy Survey Collaboration. The Dark Energy Survey:
Overview. In American Astronomical Society Meeting Abstracts 221, volume
221 of American Astronomical Society Meeting Abstracts, page 335.01, January
2013.
[40] Dark Energy Survey Collaboration. The Dark Energy Survey Science Program.
[43] O. Lahav and A. R. Liddle. The Cosmological Parameters 2014. ArXiv e-prints,
January 2014.
[45] R. W. Hockney and J. W. Eastwood. Computer Simulation Using Particles.
A. Hilger, special student edition, 1988.
[48] A. Lewis, A. Challinor, and A. Lasenby. Efficient computation of cosmic
microwave background anisotropies in closed Friedmann-Robertson-Walker models.
ApJ, 538:473–476, August 2000.
[67] Y. P. Jing, H. J. Mo, and G. Börner. Spatial Correlation Function and Pairwise
Velocity Dispersion of Galaxies: Cold Dark Matter Models versus the Las
Campanas Survey. ApJ, 494:1–12, February 1998.
[68] J. A. Peacock and R. E. Smith. Halo occupation numbers and galaxy bias.
MNRAS, 318:1144–1156, November 2000.
[71] H. Guo, I. Zehavi, and Z. Zheng. A New Method to Correct for Fiber Collisions
in Galaxy Two-point Statistics. ApJ, 756:127, September 2012.
[75] R. A. Skibba and R. K. Sheth. A halo model of galaxy colours and clustering
in the Sloan Digital Sky Survey. MNRAS, 392:1080–1091, January 2009.
[84] P. Monaco, F. Fontanot, and G. Taffoni. The MORGANA model for the rise
of galaxies and active nuclei. MNRAS, 375:1189–1219, March 2007.
[88] J. R. Bond, S. Cole, G. Efstathiou, and N. Kaiser. Excursion set mass functions
for hierarchical Gaussian fluctuations. ApJ, 379:440–460, October 1991.
[89] F. Jiang and F. C. van den Bosch. Generating Merger Trees for Dark Matter
Haloes: A Comparison of Methods. ArXiv e-prints, November 2013.
[91] C. Lacey and S. Cole. Merger rates in hierarchical models of galaxy formation.
MNRAS, 262:627–649, June 1993.
[101] S. R. Knollmann and A. Knebe. AHF: Amiga’s Halo Finder. ApJS, 182:608–
624, June 2009.
[102] J. Han, Y. P. Jing, H. Wang, and W. Wang. Resolving subhaloes’ lives with
the Hierarchical Bound-Tracing algorithm. MNRAS, 427:2437–2449, December
2012.
[114] P. Coles and B. Jones. A lognormal model for the cosmological mass distribu-
tion. MNRAS, 248:1–13, January 1991.
[118] F.-S. Kitaura, G. Yepes, and F. Prada. Modelling baryon acoustic oscillations
with perturbation theory and stochastic halo biasing. MNRAS, 439:L21–L25,
March 2014.
[120] M. White, J. L. Tinker, and C. K. McBride. Mock galaxy catalogues using the
quick particle mesh method. MNRAS, 437:2594–2606, January 2014.
[121] C.-H. Chuang, F.-S. Kitaura, F. Prada, C. Zhao, and G. Yepes. EZmocks:
extending the Zel’dovich approximation to generate mock galaxy catalogues
with accurate clustering statistics. MNRAS, 446:2621–2628, January 2015.
[122] Y. Feng, M.-Y. Chu, and U. Seljak. FastPM: a new scheme for fast simulations
of dark matter and halos. ArXiv e-prints, March 2016.
[123] S. R. Knollmann and A. Knebe. AHF: Amiga’s Halo Finder. ApJS, 182:608–
624, June 2009.
[132] J. R. Bond, S. Cole, G. Efstathiou, and N. Kaiser. Excursion set mass functions
for hierarchical Gaussian fluctuations. ApJ, 379:440–460, October 1991.
[135] D. Alonso. CUTE solutions for two-point correlation functions from large
cosmological datasets. ArXiv e-prints 1210.1833, October 2012.
[142] Y. P. Jing. Correcting for the Alias Effect When Measuring the Power Spec-
trum Using a Fast Fourier Transform. ApJ, 620:559–563, February 2005.
[145] F.-S. Kitaura and S. Heß. Cosmological structure formation with augmented
Lagrangian perturbation theory. MNRAS, 435:L78–L82, August 2013.
[146] C. Zhao, F.-S. Kitaura, C.-H. Chuang, F. Prada, G. Yepes, and C. Tao. Halo
mass distribution reconstruction across the cosmic web. MNRAS, 451:4266–
4276, August 2015.
[152] K. Batygin and M. E. Brown. Evidence for a Distant Giant Planet in the Solar
System. AJ, 151:22, February 2016.
[163] M. Crocce et al. Optimisation of the galaxy sample from Y1-DES data for
BAO analysis. In preparation.
[168] A. Cooray and R. Sheth. Halo models of large scale structure. Phys. Rep.,
372:1–129, December 2002.
[171] R. K. Sheth and A. Diaferio. Peculiar velocities of galaxies and clusters. MN-
RAS, 322:901–917, April 2001.
[175] E. Sanchez et al. Extracting BAO from a photometric galaxy survey:
comparison of methods. In preparation.

[176] A. J. Ross et al. Optimizing BAO Measurements for photo-z surveys:
Application to Dark Energy Survey Galaxy Clustering. In preparation.