
Design and optimization of neural networks for multifidelity cosmological emulation

Yanhui Yang (杨焱辉)¹,∗ Simeon Bird¹,† Ming-Feng Ho (何銘峰)¹,²,³, and Mahdi Qezlou⁴

¹ Department of Physics & Astronomy, University of California, Riverside, 900 University Ave., Riverside, CA 92521, USA
² Department of Physics, University of Michigan, 450 Church St, Ann Arbor, MI 48109, USA
³ Leinweber Center for Theoretical Physics, 450 Church St, Ann Arbor, MI 48109, USA
⁴ The University of Texas at Austin, 2515 Speedway Boulevard, Stop C1400, Austin, TX 78712, USA

∗ [email protected]
† [email protected]
(Dated: July 11, 2025)
Accurate and efficient simulation-based emulators are essential for interpreting cosmological survey data down to nonlinear scales. Multifidelity emulation techniques reduce simulation costs by combining high- and low-fidelity data, but traditional regression methods such as Gaussian processes struggle with scalability in sample size and dimensionality. In this work, we present T2N-MusE, a neural network framework characterized by (i) a novel 2-step multifidelity architecture, (ii) a 2-stage Bayesian hyperparameter optimization, (iii) a 2-phase k-fold training strategy, and (iv) a per-z principal component analysis strategy. We apply T2N-MusE to selected data from the Goku simulation suite, covering a 10-dimensional cosmological parameter space, and build emulators for the matter power spectrum over a range of redshifts with different configurations. We find that the emulators outperform our earlier Gaussian process models significantly and demonstrate that each of these techniques is efficient in training neural networks and/or effective in improving generalization accuracy. We observe a reduction in validation error by more than a factor of five compared to previous work. This framework has been used to build the most powerful emulator for the matter power spectrum, GokuNEmu, and will also be used to construct emulators for other statistics in the future.

I. INTRODUCTION

Cosmological surveys such as DESI [1], LSST [2], Euclid [3], the Nancy Grace Roman Space Telescope [4], the China Space Station Telescope (CSST) [5], and the Prime Focus Spectrograph (PFS) on the Subaru Telescope [6] will enable precise measurements of the galaxy power spectrum, as well as the weak lensing shear field. These measurements will be used to constrain cosmological models motivated by unresolved fundamental physics questions.

Interpreting the data and inferring cosmological parameters requires making predictions for the matter field or a summary statistic, such as the matter power spectrum, and using Bayesian methods. A naive inference run may require 10^6–10^7 matter power spectrum evaluations at different cosmological parameters, which would be computationally expensive.

Emulation replaces intensive numerical computation for every likelihood evaluation by the evaluation of a cheap pre-trained surrogate model. For instance, emulators have been widely used to replace the Boltzmann codes in cosmological inference [7–14]. Emulators based on N-body simulations are needed to interpret observations on nonlinear scales, k ≳ 0.1 h/Mpc. There have been several such cosmological emulators, e.g., FrankenEmu [15–17], the emulators of the Aemulus project [18–20], NGenHalofit [21], the Dark Quest emulator [22], BE-HaPPY [23], the baryonification emulator of the BACCO project [24], the emulators built on the Quijote simulations [25], the emulators based on the Mira-Titan Universe suite [26–30], the E-MANTIS emulator [31], EuclidEmulator [32, 33], the CSST Emulator [34] and GokuEmu [35]. These emulators are able to predict summary statistics within their parameter space at orders of magnitude lower computational cost than full simulations.

There are several well-motivated extensions of the standard cosmological model which are constrained by current and future surveys. However, including these extensions in emulators is challenging due to the high dimensionality of the parameter space, which necessitates a large number of computationally expensive samples. Multi-fidelity (MF) techniques have been developed to reduce the computational cost of building emulators, e.g., MFEmulator [36] and MF-Box [37]. Ref. [35] built GokuEmu, an emulator for the matter power spectrum, which expanded the parameter space to 10 dimensions for the first time, taking into account dynamical dark energy, massive neutrinos, the effective number of ultra-relativistic neutrinos and the running of the primordial spectral index. This was achieved using MF-Box, which combines simulations with different box sizes and particle loads, at a computational cost 94% lower than single-fidelity approaches.

Despite the success of MF-Box in reducing the computational cost of producing the training data (simulations), the regression technique used, Gaussian process (GP) regression, still suffers from the curse of dimensionality. The computational complexity of GP regression scales poorly (cubically) with sample size (see Chapter 8 of Ref. [38] or Chapter 9 of Ref. [39]). This in turn leads to lengthy prediction and training times, as well as increased memory usage. GP regression struggles to satisfy our need for next-generation cosmological emulators, which would ideally become yet more complex, including non-standard dark matter models or baryonic physics.


Neural networks (NNs) have been used in emulators. For example, Ref. [40] built an NN emulator for the Lyman-α forest 1D flux power spectrum, Ref. [41] constructed an MF emulator for large-scale 21 cm lightcone images using generative adversarial networks, and Ref. [42] trained models for gravitational waves using NNs. NNs are suitable for larger data sets, given that they typically scale linearly or sublinearly with sample size (see, e.g., Ref. [43]). They are also more efficient in inference time and memory usage. In addition, Ref. [44] showed that NN MF regression can outperform GP regression in terms of accuracy in some cases and suggested that a high-dimensional parameter space would prevent GP regression from being effective.

In this work, we develop the “Triple-2” neural network framework for multifidelity cosmological emulation (T2N-MusE), characterized by a “2-step” MF architecture, a “2-stage” hyperparameter optimization process, and a “2-phase” k-fold training strategy. Compared to Ref. [44], we have made several improvements. We introduce a modified “2-step” MF architecture, which turns out to be more suitable in the context of cosmological emulation than the original “2-step” architecture. The “2-stage” hyperparameter optimization process and “2-phase” training strategy further improve the emulation performance. In addition, we propose a per-redshift data compression strategy to further boost the emulator's accuracy. We test the performance of T2N-MusE on selected data from the Goku simulation suite [35], demonstrating the efficacy of these training strategies.

We organize this paper as follows. Sec. II introduces the cosmological simulation data used in this study (Sec. II A), the MF architectures of the neural networks (Sec. II B), the workflow of training the neural networks (Sec. II C), and the comparative study we design to evaluate the performance of different choices of architectures and strategies for data compression and optimization of NNs (Sec. II D). In Sec. III, we present the results of the comparative study, showing the effects of different approaches on the emulation performance. Finally, we conclude in Sec. IV.

II. METHODS

A. Simulation Data

We briefly recap the Goku simulation suite and the specific data we use in this work. This paper focuses on the machine learning techniques we have developed for building highly optimized emulation models. For more details on the simulation suite, please refer to Ref. [35].

Goku is a suite of N-body simulations that covers 10 cosmological parameters, performed using the MP-Gadget code [45]. A relatively large number of low-fidelity (LF) simulations were sampled in the parameter space using a sliced Latin hypercube design [46], and a small number of high-fidelity (HF) cosmologies were chosen from the LF cosmologies so as to optimize the available HF information. Goku includes two Latin hypercubes that cover different parameter boxes, Goku-W and Goku-N, with wide and narrow ranges of parameters, respectively. For convenience of testing the methods, only Goku-W will be used in this study. We note that the emulator trained on Goku-W exhibits larger generalization errors than that trained on Goku-N [35], underscoring the need for improved modeling over this broader parameter space. Although there are two LF nodes, L1 and L2, we only use L2 in this study (hereafter, we refer to L2 as LF).¹ We summarize the LF and HF simulations in Table I. The redshifts considered are z = 0, 0.2, 0.5, 1, 2 and 3. The matter power spectra measured from these simulations, along with their cosmologies, are the data we use to train the neural networks.

¹ L2 corresponds to the k range where emulation error dominates the total uncertainty, making it more suitable for evaluating improvement from the applied techniques.

TABLE I. Specifications and numbers of simulations in the Goku-W suite.

  Simulation fidelity   Box size (Mpc/h)   Particle load   Number of simulations
  HF                    1000               3000^3          n_H = 21
  LF                    250                750^3           n_L = 564

Specifically, the input of the target model is the 10 cosmological parameters², i.e., the input vector x ∈ R^{d_in}, where d_in = 10. The output is the matter power spectrum at a series of k modes and redshifts, i.e., the output vector

y = [y(z_1, k_1), y(z_1, k_2), ..., y(z_1, k_{n_k}),
     y(z_2, k_1), ..., y(z_2, k_{n_k}),
     ...,
     y(z_{n_z}, k_1), ..., y(z_{n_z}, k_{n_k})],    (1)

where y(z_i, k_j) = lg P(z_i, k_j) is the matter power spectrum in log space at redshift z_i and wavenumber k_j, n_z is the number of z bins, and n_k is the number of k modes. The output vector y ∈ R^{d_out}, where d_out = n_z × n_k. In our case with Goku-W, we have n_z = 6 and n_k = 64, hence d_out = 384.

² In practice, they are normalized to [−0.5, 0.5].

B. Multifidelity Architectures

Ref. [44] proposed a “2-step” architecture for NN MF regression, which consists of two NNs. A first NN is trained on the LF data, T^L = {(x^{L,i}, y^{L,i}) : i = 1, 2, ..., n_L}, to learn the LF function f^L, such that y^L = f^L(x). Then a second NN approximates the correlation between the LF and HF functions, F,

such that y^H = F(x, y^L), based on the input data (X^H, f^L_NN(X^H)) = {(x^{H,i}, f^L_NN(x^{H,i})) : i = 1, 2, ..., n_H} and the available HF output data Y^H = y^H(X^H) = {y^{H,i} : i = 1, 2, ..., n_H}. Note that in our case, the HF cosmologies are a subset of the LF cosmologies, so we can replace f^L_NN(X^H) with y^L(X^H), such that the two NNs can be trained independently and simultaneously. While Ref. [44] restricts the second NN to be a shallow NN with only one hidden layer, we allow multiple hidden layers in the second NN to increase the flexibility of the model.

Figure 1 illustrates the original 2-step architecture with a simple example of 2D input and 3D output. Note that the input of NN_LH (the NN modeling the LF-HF correlation) is a 5D vector, which is a concatenation of the LF output and the initial input vector.

We propose a modified 2-step architecture with the same NN_L but a different NN_LH, illustrated in Fig. 1. Instead of approximating the correlation between the LF and HF functions, the new NN_LH learns the ratio of y^H to y^L, r, with components r_i = y^H_i / y^L_i for i = 1, 2, ..., d_out, as a function of the input vector x, i.e., r = G(x). The training data for NN_LH are T^H = {(x^{H,i}, r^{H,i}) : i = 1, 2, ..., n_H}, where r^{H,i} = y^{H,i} ⊘ f^L_NN(x^{H,i}). As before, we replace f^L_NN(x^{H,i}) with y^L(x^{H,i}). With the trained NN_L and NN_LH, we can predict the HF output as y^H_NN = G_NN(x) ⊙ f^L_NN(x).³ Note that, for the matter power spectrum, the ratio is calculated in original space rather than log space.

³ The symbol ⊙ denotes element-wise multiplication, and ⊘ denotes element-wise division.

The modified 2-step model significantly reduces the dimensionality of the input of NN_LH, which is d_in + d_out in the original architecture and d_in in the modified architecture. This is particularly important for high-dimensional output data, such as the matter power spectrum, where d_out ≫ d_in.

We will test the performance of both architectures in our comparative study (Sec. II D) and show the results in Sec. III A.
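To make the two-step structure concrete, the following minimal PyTorch sketch (our illustration, not the released T2N-MusE code; the layer widths and depths are placeholders) builds NN_L and the modified NN_LH and combines them as y^H_NN = G_NN(x) ⊙ f^L_NN(x), applying the ratio in linear power units as described above.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=128, depth=3):
    """Fully-connected network with SiLU activations (widths/depths are illustrative)."""
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.SiLU()]
        d = width
    layers.append(nn.Linear(d, d_out))
    return nn.Sequential(*layers)

d_in, d_out = 10, 384          # 10 cosmological parameters; d_out = n_z * n_k = 6 * 64
nn_L = mlp(d_in, d_out)        # step 1: x -> y^L (log power spectrum at all z and k)
nn_LH = mlp(d_in, d_out)       # step 2 (modified): x -> r = P^H / P^L; the input is x only.
                               # In the original architecture the input would instead be the
                               # concatenation (x, y^L), of dimension d_in + d_out.

def predict_hf(x):
    """HF prediction: correct the LF spectrum with the learned ratio in linear power units."""
    log_PL = nn_L(x)           # predicted LF log-spectrum
    r = nn_LH(x)               # predicted HF/LF ratio
    return r * 10.0 ** log_PL  # element-wise product, r ⊙ P^L

x = torch.rand(5, d_in) - 0.5  # parameters normalized to [-0.5, 0.5]
print(predict_hf(x).shape)     # torch.Size([5, 384])
```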
C. Neural Network Workflow

We show a schematic of the training workflow for a highly optimized fully-connected neural network (FCNN) in Fig. 2. First, we perform data compression to reduce the dimensionality of the output using principal component analysis (PCA). Then we explore the hyperparameter space of the neural networks through a two-stage Bayesian optimization process. In the first stage, we perform a coarse search over a large space of hyperparameters, and in the second stage, we perform a fine-tuning search over a narrower space. The bounds of the fine-tuning search are defined around the best-performing configurations found in the first stage. Each evaluation of the hyperparameters involves both training and validation of an NN. Importantly, this pipeline is not limited to the multifidelity emulation context; it is broadly applicable to other tasks, including single-fidelity emulation and general regression problems involving high-dimensional outputs.

More details of each component of the workflow are given in their dedicated sections. See Sec. II C 1 for data compression, Sec. II C 2 for neural network training, and Sec. II C 3 for hyperparameter optimization.

1. Data compression

We use PCA to reduce the dimensionality of the output data. Two strategies are explored in this work: global PCA and per-redshift (hereafter, local) PCA. The former was adopted in some existing emulators, e.g., EuclidEmulator2 [33] and the CSST Emulator [34]. We propose the latter as a new approach to compress the output data, allowing a more flexible representation that may be better suited to cases where the redshift evolution of the output is nonlinear or complex.

In the global PCA approach, we perform PCA on all k modes and redshifts together, and then each of the original output components can be expressed as a linear combination of the principal components (PCs), i.e.,

y(z_i, k_j; x) = \mu(z_i, k_j) + \sum_{l=1}^{n_{PCA}} a_l(x) \phi_l(z_i, k_j),    (2)

where \mu(z_i, k_j) is the mean of the output data, a_l(x) is the coefficient of the lth PC (i.e., eigenvector), \phi_l(z_i, k_j) is the lth PC at redshift z_i and wavenumber k_j, and n_PCA is the number of PCs. The compressed output can then be expressed as

y_c = [a_1, a_2, ..., a_{n_PCA}],    (3)

reducing the dimensionality of the output from n_z × n_k [see Eq. (1)] to d^{glob}_out = n_PCA.

In the local PCA, we perform PCA on each redshift separately. For redshift z_i, we have

y(z_i, k_j; x) = \mu^i(k_j) + \sum_{l=1}^{n^i_{PCA}} a^i_l(x) \phi^i_l(k_j),    (4)

where \mu^i, a^i_l and \phi^i_l are the mean, coefficient and PC at redshift z_i, respectively, and n^i_PCA is the number of PCs for z_i. The compressed output vector is then

y_c = [a^{(1)}_1, a^{(1)}_2, ..., a^{(1)}_{n^{(1)}_{PCA}}, a^{(2)}_1, ..., a^{(2)}_{n^{(2)}_{PCA}}, ..., a^{(n_z)}_1, ..., a^{(n_z)}_{n^{(n_z)}_{PCA}}],    (5)

with dimensionality d^{loc}_out = \sum_{i=1}^{n_z} n^i_PCA.

[FIG. 1 (schematic): Step 1: NN_L; Step 2: NN_LH. Panels: (a) Original, (b) Modified. Each network is drawn as Input → Hidden layers → Output for a 2D input (x_1, x_2) and 3D output. In (a), NN_LH takes (x_1, x_2, y^L_1, y^L_2, y^L_3) and outputs (y^H_1, y^H_2, y^H_3); in (b), it takes (x_1, x_2) and outputs (r_1, r_2, r_3), with y^H_i = r_i y^L_i.]

FIG. 1. Examples of the original and modified 2-step MF NN architectures. Both architectures have the same NN_L (step 1: the LF NN) but different NN_LH (step 2: the NN used to correct the LF output). The original NN_LH (a) approximates the correlation between the LF and HF functions, with (x, y^L) as input and y^H as output. The modified NN_LH (b) learns the mapping from x to the ratio of y^H to y^L, r = G(x), and the final HF output is the element-wise product of the LF output with the correction ratio r.

[FIG. 2 (flowchart): Data → Compression (I) → Compressed data → Hyperparameter optimization (II), comprising an initial search and fine-tuning, each involving training (III) and validation (IV) → Model. Techniques: I. PCA; II. Bayesian optimization; III. Fully-connected NN; IV. k-fold cross validation.]

FIG. 2. Overview of the workflow of training a highly optimized NN. The workflow consists of three main steps: data compression, hyperparameter optimization, and training the final model. When optimizing the hyperparameters, a large space is explored in the initial search stage, and then a smaller space is searched in the second stage (fine-tuning). Each evaluation of the hyperparameters involves both training and validation of an NN.

Following Ref. [33], we determine the number of PCs based on the cumulative variance they explain. Specifically, we select the smallest value of n_PCA (or n^i_PCA for local PCA) such that the remaining unexplained variance is < 10^-5. While we do not investigate how emulator performance varies with this threshold in the present work, we note that the optimal choice is likely data-dependent. As such, it is generally advisable to assess emulator accuracy across a range of variance thresholds prior to finalizing the compression scheme.

PCA is applied to the output for each NN prior to training, i.e., to both NN_L and NN_LH in the 2-step architecture (original and modified versions). We implement PCA using the scikit-learn library [47].
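As an illustration of the two compression schemes (a minimal sketch with synthetic arrays, not the T2N-MusE implementation; the shapes and the variance threshold follow the description above), global PCA fits one basis to the full n_z × n_k output, while local PCA fits a separate basis per redshift and concatenates the coefficients that the NNs then learn to predict.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples, n_z, n_k = 564, 6, 64
# y holds lg P(z, k) for each training cosmology, flattened as in Eq. (1).
y = rng.normal(size=(n_samples, n_z * n_k))

threshold = 1e-5  # keep PCs until the unexplained variance drops below this

def n_components(explained_ratio, threshold):
    """Smallest number of PCs whose cumulative explained variance exceeds 1 - threshold."""
    return int(np.searchsorted(np.cumsum(explained_ratio), 1.0 - threshold) + 1)

# Global PCA: one basis for all redshifts and k modes together.
pca_glob = PCA().fit(y)
n_glob = n_components(pca_glob.explained_variance_ratio_, threshold)
coeffs_glob = PCA(n_components=n_glob).fit(y).transform(y)    # shape (n_samples, n_glob)

# Local (per-z) PCA: an independent basis for each redshift, coefficients concatenated.
coeffs_loc = []
for i in range(n_z):
    block = y[:, i * n_k:(i + 1) * n_k]
    pca_i = PCA().fit(block)
    n_i = n_components(pca_i.explained_variance_ratio_, threshold)
    coeffs_loc.append(PCA(n_components=n_i).fit(block).transform(block))
coeffs_loc = np.concatenate(coeffs_loc, axis=1)               # shape (n_samples, sum_i n_i)
```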
2. Neural network training

Here we describe how we train an NN, assuming the architecture of the NN is given (i.e., the number of layers and the layer widths are pre-defined). Training an NN is essentially a process of minimizing the discrepancy between the predicted output and the true output, which is usually done by minimizing a loss function through iteratively updating the weights (W) and biases (b) of the NN. PyTorch [48] is used to implement the NNs in this work.

Suppose we have a training data set T = {(x^i, y^i) : i = 1, 2, ..., N_train}, where N_train is the number of training samples, x^i is the ith input, and y^i is the corresponding output, and a separate validation set T_val = {(x^i_val, y^i_val) : i = 1, 2, ..., N_val}, where N_val is the number of validation samples. A good model should not only fit the training data well but also generalize well to unseen data. To achieve this, a loss function that can prevent a model from being too complex is needed. In this work, we employ a regularized loss function of the form

L(W, b) = L_train(W, b) + \lambda ||W||_2^2,    (6)

where L_train is the training loss, measuring the distance between the predicted output and the training output data, and the second term is the regularization term, which penalizes large weights to prevent overfitting. The regularization parameter \lambda is a hyperparameter that controls the strength of the regularization. We use the mean squared error (MSE) as the training loss, i.e.,

L_train(W, b) = \frac{1}{N_train} \sum_{i=1}^{N_train} ||f_NN(x^i; W, b) - y^i||_2^2,    (7)

where f_NN(x^i; W, b) is the predicted output of the NN with weights W and biases b at x^i. Note that the loss is computed using the PCA coefficients instead of the raw output itself. The loss function is minimized using the AdamW optimizer [49].⁴ The activation function used in the hidden layers is the SiLU function, which is a special case of the Swish function [51] with β = 1:

SiLU(x) = x · σ(x) = x / (1 + e^{-x}),    (8)

where σ(x) refers to the sigmoid function.

⁴ The AdamW optimizer is a variant of the Adam optimizer [50] that decouples weight decay from the optimization process. However, we use explicit L2 regularization instead of weight decay in this work, so the AdamW optimizer behaves like the Adam optimizer.

We define the validation loss, L_val, in a similar way to the training loss, but replacing the training data with the validation data in Eq. (7).

A dynamically decreasing learning rate (LR) schedule is implemented to stabilize the training process. An initial LR is set and decreased if L + L_val does not improve for a certain number of epochs (patience). The schedule parameters, including the initial LR and patience, can be adjusted for different training runs.

As a function of a large number of variables (the weights and biases), L can be very complex and have many local minima. The optimizer may converge to suboptimal solutions (bad local minima). To mitigate this, we perform multiple training runs with different random seeds for initialization, and the best model with the lowest loss is retained.
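A minimal PyTorch training sketch consistent with Eqs. (6)–(8) follows: MSE on the (PCA-compressed) outputs, an explicit L2 penalty on the weights, the AdamW optimizer, a plateau-based learning-rate schedule driven by L + L_val, and several random seeds with the best run retained. The tensors, widths, and schedule settings are placeholders rather than the values used for the emulators.

```python
import torch
import torch.nn as nn

def l2_penalty(model):
    """Sum of squared weights (biases excluded): the ||W||_2^2 term of Eq. (6)."""
    return sum((p ** 2).sum() for name, p in model.named_parameters() if "weight" in name)

def train(model, x_tr, y_tr, x_val, y_val, lam=1e-7, lr=1e-3, patience=20, epochs=500):
    # Explicit L2 regularization is used instead of weight decay, so AdamW acts like Adam.
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.0)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=patience)
    mse = nn.MSELoss()
    for _ in range(epochs):
        model.train()
        opt.zero_grad()
        loss = mse(model(x_tr), y_tr) + lam * l2_penalty(model)   # Eq. (6) with Eq. (7)
        loss.backward()
        opt.step()
        model.eval()
        with torch.no_grad():
            val = mse(model(x_val), y_val)
        sched.step(loss.item() + val.item())   # lower the LR when L + L_val stops improving
    return loss.item() + val.item()

# Placeholder data standing in for PCA coefficients of the power spectra.
torch.manual_seed(0)
x_tr, y_tr = torch.rand(500, 10) - 0.5, torch.randn(500, 40)
x_val, y_val = torch.rand(64, 10) - 0.5, torch.randn(64, 40)

# Multiple random initializations; keep the model with the lowest combined loss.
best = None
for seed in range(5):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(10, 128), nn.SiLU(),
                          nn.Linear(128, 128), nn.SiLU(),
                          nn.Linear(128, 40))
    score = train(model, x_tr, y_tr, x_val, y_val)
    if best is None or score < best[0]:
        best = (score, model)
```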


3. Hyperparameter optimization

Our objective is to identify the optimal set of hyperparameters for the NN that minimizes the combined training and validation losses, thereby balancing underfitting and overfitting. The hyperparameters subject to optimization include the number of hidden layers L, the number of neurons per layer M (assumed uniform across layers), and the regularization parameter λ.

We perform Bayesian optimization implemented with Hyperopt [52] in two stages. In the first stage, we perform a coarse search over a large space of hyperparameters, and in the second stage, we perform a refined search over a smaller region. The initial hyperparameter ranges used in this work⁵ are given in Table II. For the first stage, we use a uniform prior for L and M, and a log-uniform prior for λ. The ranges of the hyperparameters are chosen to be wide enough to cover a large space of hyperparameters. In the second stage, L is fixed to the best value found in the first stage, since a different L would lead to a significantly different NN that is unlikely to result in a better performance.⁶ The prior for M follows U({M_1 − 16 + 2q : q = 0, 1, ..., 16}), where M_1 is the best value found in the first stage. This defines a uniform prior over integers centered at M_1 with a step size of 2. The prior for λ is defined as LU(λ_1/2, 2λ_1), where λ_1 is the best value found in the first stage.

⁵ We chose the prior ranges empirically, picking values so that the optimal hyperparameters were not at the boundaries. The ranges can be data-dependent and may be adjusted for specific problems.

⁶ We empirically confirm that none of the best hyperparameter sets found in the first stage led to a better performance when L was changed in our tests.

TABLE II. Ranges of hyperparameters used in the first stage of the hyperparameter optimization process. The prior for M is uniform over integers from 16 to 512 in steps of 16. U({...}) denotes the discrete uniform distribution, and LU denotes the log-uniform distribution.

  Hyperparameter   Prior
  L                U({1, 2, 3, 4, 5, 6, 7})
  M                U({16, 32, 48, ..., 512})
  λ                LU(10^-9, 5 × 10^-6)

Evaluating a point in the hyperparameter space involves training and validating the NN with the given hyperparameters. Notice that Goku-W does not have a separate validation set of HF data, so we use leave-one-out cross-validation (LOOCV) to evaluate the performance of the emulator. LOOCV is a special case of k-fold cross-validation [53] with k = N_train. In the next section, we detail how a given set of hyperparameters is evaluated with k-fold training and validation, for NN_L and NN_LH, respectively. We have confirmed that the LOOCV performance is consistent with the performance on a separate test set in Appendix A, where we trained an emulator based on the Goku-pre-N simulations [35] and tested it on the available test set.
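The two-stage search could be wired up with Hyperopt roughly as follows. This is a sketch: the dummy objective stands in for the k-fold combined loss of Eq. (9), and the priors follow Table II and the fine-tuning rules described above.

```python
import numpy as np
from hyperopt import fmin, tpe, hp, Trials, space_eval

def combined_loss(params):
    """Stand-in for the k-fold combined loss of Eq. (9); in practice this trains
    and validates an NN with the given hyperparameters."""
    L, M, lam = params["L"], params["M"], params["lam"]
    return (L - 3) ** 2 * 0.01 + (M - 256) ** 2 * 1e-6 + abs(np.log10(lam) + 7) * 0.01

# Stage 1: coarse search over the wide priors of Table II.
space1 = {
    "L": hp.choice("L", list(range(1, 8))),
    "M": hp.choice("M", list(range(16, 513, 16))),
    "lam": hp.loguniform("lam", np.log(1e-9), np.log(5e-6)),
}
best1 = space_eval(space1, fmin(combined_loss, space1, algo=tpe.suggest,
                                max_evals=80, trials=Trials()))

# Stage 2: fine-tuning around the stage-1 optimum; L is kept fixed.
L1, M1, lam1 = best1["L"], best1["M"], best1["lam"]
space2 = {
    "L": L1,
    "M": hp.choice("M2", [M1 - 16 + 2 * q for q in range(17)]),
    "lam": hp.loguniform("lam2", np.log(lam1 / 2), np.log(2 * lam1)),
}
best2 = space_eval(space2, fmin(combined_loss, space2, algo=tpe.suggest,
                                max_evals=40, trials=Trials()))
print(best2)
```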

4. k-fold training and validation

k-fold cross-validation is a technique to estimate the performance of a model by splitting the training data into k subsets (folds). The model is trained on k − 1 folds and validated on the remaining fold. This process is repeated k times, with each fold being used as the validation set once. The final performance is obtained by averaging the performance over all k folds. The quantity we minimize in the hyperparameter optimization process (Sec. II C 3) is the mean of the training and validation losses, i.e., a combined loss as a function of the hyperparameters:

\Phi(L, M, \lambda) = \frac{1}{2k} \sum_{i=1}^{k} [\Phi_{train,i}(L, M, \lambda) + \Phi_{val,i}(L, M, \lambda)],    (9)

where \Phi_{train,i}(L, M, \lambda) = min_{W,b} L_train(L, M, \lambda; W, b) (the minimum training loss from Eq. (7)) and i is the index of the fold. \Phi_{val,i}(L, M, \lambda) is defined in a similar way to \Phi_{train,i}(L, M, \lambda) but with the training loss replaced by the validation loss.

For NN_LH, the data set has n_H samples, and we split the data into k = n_H folds. In each iteration, we use n_H − 1 samples for training and 1 sample for validation. In addition, n^{LH}_seed = 5 random seeds are used to initialize the weights and biases of the NN for each fold training to avoid bad local minima.

Likewise, for NN_L, the data set has n_L samples, and we split the data into k = n_L folds. However, only the n_H HF cosmologies should be tested on for our purpose, i.e., we only need to iterate over the n_H folds that leave the HF cosmologies out in training. We can take advantage of this feature and use a 2-phase training strategy, where a good local minimum is found in the first phase and then used as the common initial model for the second phase of training for each fold. This is much more efficient than regular methods (such as what we do for NN_LH), since there is no need to search for good minima for every fold by trying different random seeds independently. Specifically, in the first phase, the LF cosmology-only data (samples with HF cosmologies are excluded), T^L_{1,train} = {(x^i, y^i) : 1 ≤ i ≤ n_L, x^i ∉ X^H}, are used as the training set, and the LF data with the HF cosmologies, T^L_{1,val} = T^L \ T^L_{1,train}, are used as the validation set. We train the NN with n^L_seed = 15 random seeds for initialization in the first phase. The best model found in the first phase is then used to set the initial state of the NN for the second phase of training. We illustrate the process of the 2-phase training using a simple example in Fig. 3. A bonus of this approach is that the second phase takes only a small number of epochs to converge, and is thus quite efficient in computation.⁷ The second phase is the target k-fold training and validation, i.e., for each fold, the training set is T^{L,j}_{2,train} = {(x^i, y^i) : 1 ≤ i ≤ n_L, i ≠ j} with x^j ∈ X^H, where j is the index of the fold (also the HF cosmology), and the validation set is T^{L,j}_{2,val} = {(x^{(j)}, y^{(j)})} (i.e., the point left out).

⁷ In practice, we also set the initial learning rate in the second phase to be equal to the final learning rate from the first phase to avoid jumping to other local minima.

[FIG. 3 (schematic): Phase 1: searching for a good local minimum (with separate training and validation data); 9 samples, 3 random seeds give Models A, B, and C; the best (Model C) initializes Phase 2: regular k-fold training and validation (with k = 9, k_test = 3), producing the fold models.]

FIG. 3. Illustration of the two-phase k-fold training and cross-validation strategy for NN_L, assuming a total of 9 samples, of which 3 (orange circles) are supposed to be tested against (i.e., the HF cosmologies). In phase 1, the model is trained on the remaining 6 samples (blue circles) using 3 separate runs with different random seeds, and validated on the 3 held-out test samples. In phase 2, we perform regular k-fold training and validation, with the initial model (weights and biases) set to the best model found in phase 1.

For the best-performing set of hyperparameters, we initialize the final model training using the fold model with the median regularized loss. To prevent overfitting during this final training step, we impose a lower bound on the training loss: specifically, the final training loss is not allowed to fall below 80% of the median training loss observed across the folds. This threshold has proven effective in practice, as we have verified that the final model's performance remains consistent with the LOOCV results (see Appendix A). Nevertheless, a more comprehensive evaluation of this thresholding strategy could be pursued in future work.

The 2-phase training strategy ensures that all the fold models fall into the same local minimum, and the validation error should be a better representative of the generalization error for the final model (also in the same local minimum) trained on the full LF data set compared to regular k-fold validation. Note that the HF cosmologies must be excluded from the training set in the first phase, even though that phase only searches for a good local minimum rather than performing the final validation; otherwise, the model used for initialization would have memorized the data we are supposed to test on, and validation in the second phase would be invalid. This is also the reason why we cannot use the 2-phase strategy for NN_LH (no data are available other than the test points) but have to try multiple random seeds for each fold training.
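The 2-phase procedure for NN_L can be sketched as follows (a toy, self-contained illustration with synthetic data and a simplified fit routine, not the production training code): phase 1 searches for a good minimum on the LF-only cosmologies using many seeds, and phase 2 runs the leave-one-out folds over the HF cosmologies warm-started from that common minimum.

```python
import copy
import torch
import torch.nn as nn

def fit(model, x, y, epochs=300, lr=1e-3):
    """Minimal full-batch MSE fit; stands in for the regularized training of Sec. II C 2."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

def make_model():
    return nn.Sequential(nn.Linear(10, 64), nn.SiLU(), nn.Linear(64, 40))

# Toy LF data: the first n_H cosmologies are the ones with HF counterparts.
torch.manual_seed(0)
n_L, n_H = 60, 6
x, y = torch.rand(n_L, 10) - 0.5, torch.randn(n_L, 40)
hf_idx = list(range(n_H))
lf_only = [i for i in range(n_L) if i not in hf_idx]

# Phase 1: several seeds on the LF-only cosmologies, validated on the HF cosmologies.
best_state, best_val = None, float("inf")
for seed in range(15):
    torch.manual_seed(seed)
    model = make_model()
    fit(model, x[lf_only], y[lf_only])
    val = nn.functional.mse_loss(model(x[hf_idx]), y[hf_idx]).item()
    if val < best_val:
        best_val, best_state = val, copy.deepcopy(model.state_dict())

# Phase 2: leave-one-out folds over the HF cosmologies, each warm-started from the
# common phase-1 minimum so that all folds stay in the same basin.
fold_errors = []
for j in hf_idx:
    model = make_model()
    model.load_state_dict(best_state)
    train_idx = [i for i in range(n_L) if i != j]
    fit(model, x[train_idx], y[train_idx], epochs=50)   # converges in few epochs
    fold_errors.append(nn.functional.mse_loss(model(x[[j]]), y[[j]]).item())
print(sum(fold_errors) / len(fold_errors))
```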

D. Comparative Study Design

The techniques evaluated in this work are summarized in Table III. To assess the effectiveness of each technique, we design a comparative study with a series of different approaches for emulator construction. These approaches are distinct combinations of the techniques mentioned above. The configurations for each approach are defined in Table IV.

Mid serves as the reference approach: it uses the modified 2-step architecture and separate PCA for each redshift, but does not include hyperparameter fine-tuning or 2-phase training of NN_L. Base is the most basic approach, with the original 2-step architecture, global PCA, and no additional optimization strategies. The most advanced approach, Optimal, incorporates all enhanced techniques. The remaining approaches differ from Mid by altering only one component, allowing us to isolate the contribution of each technique. For example, Arch-0 uses the original 2-step architecture but keeps the other techniques the same as Mid. HO-2 uses 2-stage hyperparameter optimization without changing other components.

By comparing HO-2 and Mid, we will see the effect of hyperparameter fine-tuning. However, this is not a strictly fair comparison, since the two would have significant differences in compute time. To make a fair comparison, we define HO-3, which uses the same 1-stage hyperparameter optimization as Mid but with a larger number of trials, n_trial = 120, ensuring that the total compute time is similar to HO-2. Similarly, while comparing NNL-1 and Mid will show the effect of 2-phase training of NN_L, we also define NNL-0+, which uses the same 1-phase training as Mid (which does not try multiple seeds) but with a larger number of random seeds, n_seed = 3, leading to a compute time similar to NNL-1. Although we do not present a detailed quantitative comparison of compute times across all approaches, we note that training each emulator requires less than 24 hours on a single Grace-Hopper node of the Vista supercomputer.⁸ This cost is negligible relative to the computational expense of running the simulations themselves in the context of simulation-based emulation.

⁸ https://tacc.utexas.edu/systems/vista/

The results of the comparative study are shown in Sec. III, where we compare the performance of the emulators built with different approaches and also discuss the impact of each technique at the level of the component NNs (NN_L and NN_LH).

III. RESULTS

We present the results of the comparative study in this section. The models trained with different approaches are evaluated using LOOCV, with the validation error defined as the relative mean absolute error (rMAE) of the predicted power spectrum compared to the true power spectrum, denoted Φ_rMAE. For clarity, each model is identified by the name of the approach used in its construction (e.g., the model trained with the Base approach is referred to as Base).
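For reference, the rMAE statistic can be computed as in the short sketch below, where P_pred and P_true are placeholder arrays of emulated and simulated power spectra on a common grid of cosmologies, redshifts and k modes.

```python
import numpy as np

def rmae(P_pred, P_true, axis=0):
    """Relative mean absolute error of the predicted power spectrum,
    averaged over cosmologies (axis 0) for each (z, k) bin."""
    return np.mean(np.abs(P_pred - P_true) / P_true, axis=axis)

rng = np.random.default_rng(1)
P_true = rng.uniform(1e2, 1e4, size=(21, 6, 64))            # (cosmologies, z bins, k modes)
P_pred = P_true * (1 + 0.01 * rng.normal(size=P_true.shape))
err = rmae(P_pred, P_true)          # shape (6, 64): mean error per z and k
print(err.mean(), err.max())        # overall mean and a worst-case summary
```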

TABLE III. Techniques considered in this work. The numbers 0, 1, and 0+ (if applicable) refer to the choices of the strategies, e.g., choice 0 represents the original 2-step model for the MF NN architecture. n_trial is the number of trials in the coarse search stage of the hyperparameter optimization process, and n^tune_trial is the number of trials in the fine-tuning stage. "1-stage" means no fine-tuning. For the training of NN_L, "1-phase" refers to regular k-fold training and validation (n_seed = 1 by default).

  Choice   MF NN architecture   PCA              Hyperparameter optimization                        Training of NN_L
  0        Original 2-step      Global (all-z)   1-stage with n_trial = 80                          1-phase
  1        Modified 2-step      Local (per-z)    2-stage with n_trial = 80 and n^tune_trial = 40    2-phase
  0+                                                                                                1-phase with n_seed = 3

TABLE IV. Approaches tested in this work. MFA, PCA, HO, and NN_L are short for the column names in Table III. The numbers 0, 1 and 0+ refer to the techniques described in Table III for each column.

  Approach   MFA   PCA   HO   NN_L
  Base       0     0     0    0
  Arch-0     0     1     0    0
  PCA-0      1     0     0    0
  Mid        1     1     0    0
  NNL-1      1     1     0    1
  NNL-0+     1     1     0    0+
  Optimal    1     1     1    1

[FIG. 4 (plots): Φ_rMAE (%) versus k (h Mpc^-1) for panels Base (1.73%), Mid (1.03%), and Optimal (0.62%), with curves for z = 0, 0.2, 0.5, 1, 2 and 3 and a shaded 1% band.]

FIG. 4. LOO errors of the emulators built with approaches Base, Mid, and Optimal. Redshifts are color coded. The solid lines are the error averaged over cosmologies, and the corresponding shaded regions indicate the range of individual cosmologies. The gray-shaded area marks the region where the error is less than 1%. Each model is titled with the name of the approach and its overall validation error.

[FIG. 5 (summary plot): overall mean and worst-case Φ_rMAE (%) for Base, Arch-0, PCA-0, Mid, NNL-0+, NNL-1, and Optimal.]

FIG. 5. Summary comparison of the LOO errors of the emulators built with different approaches. Black crosses indicate the mean validation error of each approach, while gray crosses show the worst-case errors (the maximal error over all test points).

Figure 4 shows the validation errors as functions of k and z for Base, Mid, and Optimal. We found that even the basic model, Base, achieves a validation error significantly lower than GokuEmu's 3% error (see Fig. 13 of Ref. [35]). This suggests that NNs may be better suited than GPs for emulation tasks involving large training sets and high-dimensional parameter spaces. The improvement may be accounted for by more efficient training that allows more intensive hyperparameter optimization, though PCA could also have contributed to the performance improvement. Compared to Base, Mid achieves a significant improvement in accuracy, with an overall validation error of 1.03% (compared to 1.73% for Base), attributed to the modified 2-step architecture and the local PCA strategy.⁹ The improvement is observed across all redshifts and wavenumbers, though the worst-case error is still much higher than the average. The validation error of Optimal is less than 1% for all redshifts and almost all wavenumbers, with an overall mean of 0.62%, which is a further improvement over Mid resulting from the changes in hyperparameter optimization and training of NN_L. Not only is the overall validation error reduced, but the worst-case error is also considerably lower than that of Mid (a reduction by a factor of 5 will be seen in Fig. 5).

⁹ Similar compute times were used to train these two models.

A summary comparison of the LOO errors of the emulators built with different approaches is shown in Fig. 5, with both the overall mean error and the worst-case error shown. The error of Mid is lower than that of Arch-0 and PCA-0, indicating that both the modified 2-step architecture and the local PCA strategy are effective in improving the performance of the emulator, with the modified 2-step architecture providing a larger improvement than the local PCA data compression strategy. While the aforementioned techniques improve the overall mean error, the worst-case error remains high. A substantial reduction in the worst-case error is seen when implementing the 2-phase training strategy, NNL-1, which allows a large number of local minima to be explored efficiently. NNL-0+ also shows an improvement over Mid, but the improvement is not as significant as that of NNL-1, despite consuming a similar amount of compute time (essentially because NNL-0+ tried fewer random seeds than NNL-1). Optimal achieves slightly lower errors than NNL-1, attributed to hyperparameter fine-tuning.¹⁰

¹⁰ We have checked that simply increasing the number of trials (i.e., n_trial > 80) did not improve the performance, suggesting that it is the hyperparameter fine-tuning which is responsible for the (small) improvement in performance.

3.0 3.0
Arch-0 PCA-0
2.5 (1.03%) 2.5 (1.15%)
Mid Mid
2.0 (0.33%) 2.0 (0.97%)

Φ LrMAE (%)
rMAE (%)

z = 0.0 z = 0.0
1.5 1.5 z = 1.0
z = 1.0
Φ LH

1.0 z = 3.0 1.0 z = 3.0

0.5 0.5

0.0 0.0
1 10 1 100 101
10 100 101 3.0
−1
k (h Mpc ) k (h Mpc −1 ) PCA-0
2.5 (0.38%)
Mid
FIG. 6. Comparison of the LF-to-HF correction NNs of the 2.0 (0.33%)

rMAE (%)
original (Arch-0) and the modified (Mid) 2-step architectures.
Blue lines are LOO errors of Arch-0’s N NLH , while orange 1.5

Φ LH
lines are Mid’s. The solid, dashed, and dotted lines correspond
1.0
to z = 0, 1, and 3, respectively. The overall mean errors
averaged over 6 redshifts are shown in the legends. 0.5
0.0
10 1 100 101
each technique in more detail, by comparing the perfor-
−1
mance of the component NNs (N NL and N NLH ) trained k (h Mpc )
with different approaches. Figures 6, 7 and 8 show the
rMAE of the component NNs, defined as the LOO error FIG. 7. Comparison of the two data compression strategies:
of the predicted power spectrum compared to the true global PCA (PCA-0, in blue) and separate PCA for each red-
power spectrum. This ensures that N NL and N NLH are shift (Mid, in orange). The top and bottom panels show the
evaluated separately and independently. Specifically, in LOO errors for N NL and N NLH , respectively. The solid,
Figure 6 and the lower panel of Figure 7, the component dashed, and dotted lines correspond to z = 0, 1, and 3, re-
shown is N NLH . Thus the input is the test cosmology spectively. The overall mean errors averaged over 6 redshifts
and the true LF power spectrum, instead of the LF power are shown in the legends.
spectrum predicted by N NL . In the upper panel of Fig-
ure 7 and Figure 8, the component shown is N NL . In this
case, both the predicted power spectrum and the true
power spectrum that it is tested against are LF power the information of the data is not fully exploited. The
spectra. improvement is likely due to the aforementioned signifi-
cantly reduced complexity of the NN (Sec. II B) relative
to the original architecture [44].
A. Architecture: 2-Step vs. Modified 2-Step

Figure 6 compares the LF-to-HF correction NNs of the


original and modified 2-step architectures. The valida-
tion error of Mid is shown to be significantly lower than B. Data Compression: Global vs. Local (PCA)
that of Arch-0 across redshifts and scales, with the aver-
age error reduced by a factor of ∼ 3. In addition, Mid’s er-
ror decreases with increasing redshift (especially at small From Fig. 7, we observe that the local PCA strategy
scales), which is consistent with our expectation that it (Mid) outperforms the global PCA strategy (PCA-0) for
is easier to learn the LF-to-HF correction at higher red- both N NL and N NLH , which is likely because the global
shifts, where the spectrum is more linear and less affected PCA is not as flexible as the local PCA in capturing
by nonlinear effects. In contrast, Arch-0 has a moder- redshift-dependent features of the spectrum. In partic-
ately larger error at z = 3 than at z = 0 and 1. This sug- ular, the improvement is more pronounced at z = 0 in
gests that the original architecture struggles to learn the both NNs, where the spectrum is more nonlinear.
correlation between the LF and HF power spectra and
We also note that ΦLrMAE is larger than ΦLH
rMAE in both
cases, indicating the uncertainty of the interpolation of
the LF power spectrum in the parameter space domi-
is the hyperparameter fine-tuning which is responsible for the nates the overall error of the emulator, consistent with
(small) improvement in performance. the findings of Ref. [35].
10

2.5 In this work, we build emulators based on the non-


Mid linear matter power spectra from the Goku simulations
2.0 (0.97%) suite [35] using different combinations of the various tech-
NNL-1 niques for a comparative study. The results show that
(0.55%)
Φ LrMAE (%)

1.5 all the techniques we proposed are effective in improving


NNL-0+
(0.74%) the performance of the emulator, although the effect of
1.0 z = 0.0 hyperparameter fine-tuning is modest. The novel 2-step
z = 1.0 MF architecture reduces the complexity of the LF-to-HF
0.5 z = 3.0 correction NN, decreasing the error by a factor of ∼ 3.
The per-z PCA strategy allows NNs to learn the redshift-
0.0 dependent features of the statistics of interest more ac-
10 1 100 101 curately, with accuracy improved by more than 10% in
k (h Mpc −1 ) both the LF NN and the LF-to-HF correction NN com-
pared to the global PCA strategy. The 2-stage hyperpa-
rameter optimization strategy moderately improves the
FIG. 8. Comparison of the training strategies for N NL . Mid
(blue) and NNL-0+ (green) use regular training, while NNL-1 performance of the emulator by fine-tuning the hyperpa-
(orange) uses the 2-phase training strategy. NNL-0+ tried more rameters in a smaller space after a coarse search. The
random seeds than Mid to match the compute time of NNL-1. 2-phase training strategy for the LF NN efficiently finds
Redshifts z = 0, 1, and 3 are coded with solid, dashed, and a common local minimum for k-fold (or LOO) training
dotted lines, respectively. The overall LOO errors averaged and validation and substantially improves the worst-case
over 6 redshifts are shown in the legends. error.
T2N-MusE realizes highly efficient training of NNs on
large data sets with high-dimensional parameter spaces
C. Training of the LF NN: Regular vs. 2-Phase that traditional GP-based methods struggle with. This
demonstrates the effectiveness of T2N-MuSE not only as a
Fig. 8 compares the LF NNs trained with different strategies. Compared to Mid, NNL-1 reduces the overall error significantly, from 0.97% to 0.55%, improving performance across all redshifts and scales. When we simply increased the number of random seeds over Mid (NNL-0+), the worst-case error was about midway between Mid and NNL-1, despite a similar compute time. Regular training with more random seeds, e.g., 15 distinct seeds, might allow a performance similar to NNL-1, but it would take much longer to train the NN, and the final model initialized by one of the fold models might not generalize as well as the model trained with the 2-phase strategy: the fold models might have fallen into different local minima, and the chosen model is not guaranteed to be the one with the best generalization performance.
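The sketch below illustrates one reading of this 2-phase schedule in PyTorch: phase 1 trains a single network on the full LF set to locate a shared basin, and phase 2 fine-tunes per-fold copies (for k-fold or LOO validation) and the final model from those shared weights. The architecture, epoch counts, and fold assignment are placeholders, not the settings used for NNL-1.

    import copy
    import torch
    from torch import nn

    def make_mlp(n_in, n_out, width=64):
        # Small MLP stand-in; the real width and depth are tuned hyperparameters.
        return nn.Sequential(nn.Linear(n_in, width), nn.SiLU(),
                             nn.Linear(width, width), nn.SiLU(),
                             nn.Linear(width, n_out))

    def train(model, x, y, epochs, lr=1e-3):
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)
            loss.backward()
            opt.step()
        return model

    # Toy LF training data: 10 cosmological parameters -> 10 PCA coefficients.
    x, y = torch.randn(297, 10), torch.randn(297, 10)

    # Phase 1: one model trained on the full LF set provides a common starting point.
    base = train(make_mlp(10, 10), x, y, epochs=200)

    # Phase 2: fine-tune per-fold copies from the shared phase-1 weights, so the
    # validation models and the final model sit in the same local minimum.
    k = 5
    fold = torch.arange(297) % k
    fold_losses = []
    for i in range(k):
        m = train(copy.deepcopy(base), x[fold != i], y[fold != i], epochs=50)
        with torch.no_grad():
            fold_losses.append(
                nn.functional.mse_loss(m(x[fold == i]), y[fold == i]).item())

    final_model = train(copy.deepcopy(base), x, y, epochs=50)

Compared with restarting every fold from a fresh random seed (as in Mid and NNL-0+), most of the compute is spent once in phase 1, and the fold models remain comparable to the final model, which is consistent with the improved worst-case error reported above.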

IV. CONCLUSION

We have developed T2N-MusE, a multifidelity neural network framework for cosmological emulation, which is capable of building highly optimized regression models to predict summary statistics. This framework is characterized by a novel 2-step architecture, per-z PCA for data compression, 2-stage hyperparameter optimization, and a 2-phase training strategy for the low-fidelity regression model. This NN approach improves on our earlier GP approach by a factor of more than 5 on the same data.¹¹

¹¹ The training of GokuEmu used both the L1 and L2 nodes, while we only use L2 in this work. So a factor of 5 is a very conservative estimate.

In this work, we build emulators based on the nonlinear matter power spectra from the Goku simulation suite [35] using different combinations of the various techniques for a comparative study. The results show that all the techniques we proposed are effective in improving the performance of the emulator, although the effect of hyperparameter fine-tuning is modest. The novel 2-step MF architecture reduces the complexity of the LF-to-HF correction NN, decreasing the error by a factor of ∼3. The per-z PCA strategy allows NNs to learn the redshift-dependent features of the statistics of interest more accurately, with accuracy improved by more than 10% in both the LF NN and the LF-to-HF correction NN compared to the global PCA strategy. The 2-stage hyperparameter optimization strategy moderately improves the performance of the emulator by fine-tuning the hyperparameters in a smaller space after a coarse search. The 2-phase training strategy for the LF NN efficiently finds a common local minimum for k-fold (or LOO) training and validation and substantially improves the worst-case error.
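As a rough illustration of how such a 2-step composition can be wired together, the PyTorch sketch below predicts the HF spectrum as an LF prediction plus a low-complexity correction. Whether NN_LH conditions on the cosmological parameters, on the LF prediction, or on both, and how the correction is parameterized, are assumptions of this sketch rather than the exact T2N-MusE design.

    import torch
    from torch import nn

    class TwoStepEmulator(nn.Module):
        # Step 1 (NN_L): cosmology -> log P_LF(k), trained on the many LF runs.
        # Step 2 (NN_LH): cosmology -> log[P_HF(k) / P_LF(k)], a smoother and
        # lower-complexity correction trained on the few LF/HF pairs.
        def __init__(self, n_params=10, n_k=128, width=64):
            super().__init__()
            self.nn_l = nn.Sequential(nn.Linear(n_params, width), nn.SiLU(),
                                      nn.Linear(width, n_k))
            self.nn_lh = nn.Sequential(nn.Linear(n_params, width), nn.SiLU(),
                                       nn.Linear(width, n_k))

        def forward(self, theta):
            log_p_lf = self.nn_l(theta)
            log_ratio = self.nn_lh(theta)
            return log_p_lf + log_ratio  # predicted log P_HF(k)

    emu = TwoStepEmulator()
    print(emu(torch.randn(4, 10)).shape)  # torch.Size([4, 128])

Because the LF-to-HF correction is much smoother than the spectrum itself, the second network can be kept small, which is one way to read the reduction in complexity and error attributed to the 2-step architecture above.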
T2N-MusE realizes highly efficient training of NNs on large data sets with high-dimensional parameter spaces that traditional GP-based methods struggle with. This demonstrates the effectiveness of T2N-MusE not only as a high-accuracy optimization scheme in its own right, but also as a general tool for upgrading existing emulators to higher performance or expanding their parameter space, all at significantly reduced computational costs. We have rebuilt a production emulator for the matter power spectrum with T2N-MusE based on Goku, named GokuNEmu. GokuNEmu is the highest-performing emulator in existence in terms of error, dimensionality, parameter coverage, and inference speed, and is presented in Ref. [54]. We will also apply this framework to build emulators for other summary statistics, such as the Lyman-α forest flux power spectrum [55], in future work. The code of T2N-MusE is publicly available at https://github.com/astro-YYH/T2N-MusE for the community to use and extend.

ACKNOWLEDGMENTS

YY and SB acknowledge funding from NASA ATP 80NSSC22K1897. MFH is supported by the Leinweber Foundation and DOE grant DE-SC0019193. Computing resources were provided by Frontera LRAC AST21005. The authors acknowledge the Frontera and Vista computing projects at the Texas Advanced Computing Center (TACC, http://www.tacc.utexas.edu) for providing HPC and storage resources that have contributed to the research results reported within this paper. Frontera and Vista are made possible by National Science Foundation award OAC-1818253.
[1] DESI Collaboration, A. Aghamousa, and J. Aguilar et al., The DESI Experiment Part I: Science, Targeting, and Survey Design, arXiv e-prints, arXiv:1611.00036 (2016), arXiv:1611.00036 [astro-ph.IM].
[2] P. A. Abell et al., LSST Science Book, Version 2.0 (arXiv, 2009), arXiv:0912.0201 [astro-ph.IM].
[3] R. Laureijs et al., Euclid Definition Study Report, arXiv e-prints, arXiv:1110.3193 (2011), arXiv:1110.3193 [astro-ph.CO].
[4] R. Akeson et al., The Wide Field Infrared Survey Telescope: 100 Hubbles for the 2020s, arXiv e-prints, arXiv:1902.05569 (2019), arXiv:1902.05569 [astro-ph.IM].
[5] Y. Gong, X. Liu, Y. Cao, X. Chen, Z. Fan, R. Li, X.-D. Li, Z. Li, X. Zhang, and H. Zhan, Cosmology from the Chinese Space Station Optical Survey (CSS-OS), Astrophys. J. 883, 203 (2019), arXiv:1901.04634 [astro-ph.CO].
[6] M. Takada, R. S. Ellis, M. Chiba, J. E. Greene, H. Aihara, N. Arimoto, K. Bundy, J. Cohen, O. Doré, G. Graves, J. E. Gunn, T. Heckman, C. M. Hirata, P. Ho, J.-P. Kneib, O. Le Fèvre, L. Lin, S. More, H. Murayama, T. Nagao, M. Ouchi, M. Seiffert, J. D. Silverman, L. Sodré, D. N. Spergel, M. A. Strauss, H. Sugai, Y. Suto, H. Takami, and R. Wyse, Extragalactic science, cosmology, and Galactic archaeology with the Subaru Prime Focus Spectrograph, Publications of the Astronomical Society of Japan 66, R1 (2014), arXiv:1206.0737 [astro-ph.CO].
[7] T. Auld, M. Bridges, M. P. Hobson, and S. F. Gull, Fast cosmological parameter estimation using neural networks, MNRAS 376, L11 (2007), arXiv:astro-ph/0608174 [astro-ph].
[8] T. Auld, M. Bridges, and M. P. Hobson, COSMONET: fast cosmological parameter estimation in non-flat models using neural networks, MNRAS 387, 1575 (2008), arXiv:astro-ph/0703445 [astro-ph].
[9] G. Aricò, R. E. Angulo, and M. Zennaro, Accelerating Large-Scale-Structure data analyses by emulating Boltzmann solvers and Lagrangian Perturbation Theory, arXiv e-prints, arXiv:2104.14568 (2021), arXiv:2104.14568 [astro-ph.CO].
[10] A. Spurio Mancini, D. Piras, J. Alsing, B. Joachimi, and M. P. Hobson, COSMOPOWER: emulating cosmological power spectra for accelerated Bayesian inference from next-generation surveys, MNRAS 511, 1771 (2022), arXiv:2106.03846 [astro-ph.CO].
[11] A. Nygaard, E. B. Holm, S. Hannestad, and T. Tram, CONNECT: a neural network based framework for emulating cosmological observables and cosmological parameter inference, Journal of Cosmology and Astroparticle Physics 2023, 025 (2023), arXiv:2205.15726 [astro-ph.IM].
[12] S. Günther, J. Lesgourgues, G. Samaras, N. Schöneberg, F. Stadtmann, C. Fidler, and J. Torrado, CosmicNet II: emulating extended cosmologies with efficient and accurate neural networks, Journal of Cosmology and Astroparticle Physics 2022, 035 (2022), arXiv:2207.05707 [astro-ph.CO].
[13] M. Bonici, F. Bianchini, and J. Ruiz-Zapatero, Capse.jl: efficient and auto-differentiable CMB power spectra emulation, The Open Journal of Astrophysics 7, 10 (2024), arXiv:2307.14339 [astro-ph.CO].
[14] M. Bonici, G. D'Amico, J. Bel, and C. Carbone, Effort: a fast and differentiable emulator for the Effective Field Theory of the Large Scale Structure of the Universe, arXiv e-prints, arXiv:2501.04639 (2025), arXiv:2501.04639 [astro-ph.CO].
[15] K. Heitmann, D. Higdon, M. White, S. Habib, B. J. Williams, E. Lawrence, and C. Wagner, The Coyote Universe. II. Cosmological models and precision emulation of the nonlinear matter power spectrum, The Astrophysical Journal 705, 156 (2009).
[16] K. Heitmann, M. White, C. Wagner, S. Habib, and D. Higdon, The Coyote Universe. I. Precision determination of the nonlinear matter power spectrum, The Astrophysical Journal 715, 104 (2010).
[17] K. Heitmann, E. Lawrence, J. Kwan, S. Habib, and D. Higdon, The Coyote Universe extended: precision emulation of the matter power spectrum, The Astrophysical Journal 780, 111 (2013).
[18] J. DeRose, R. H. Wechsler, J. L. Tinker, M. R. Becker, Y.-Y. Mao, T. McClintock, S. McLaughlin, E. Rozo, and Z. Zhai, The Aemulus Project. I. Numerical Simulations for Precision Cosmology, The Astrophysical Journal 875, 69 (2019).
[19] T. McClintock, E. Rozo, M. R. Becker, J. DeRose, Y.-Y. Mao, S. McLaughlin, J. L. Tinker, R. H. Wechsler, and Z. Zhai, The Aemulus Project. II. Emulating the Halo Mass Function, The Astrophysical Journal 872, 53 (2019).
[20] Z. Zhai, J. L. Tinker, M. R. Becker, J. DeRose, Y.-Y. Mao, T. McClintock, S. McLaughlin, E. Rozo, and R. H. Wechsler, The Aemulus Project. III. Emulation of the Galaxy Correlation Function, The Astrophysical Journal 874, 95 (2019).
[21] R. E. Smith and R. E. Angulo, Precision modelling of the matter power spectrum in a Planck-like Universe, Monthly Notices of the Royal Astronomical Society 486, 1448 (2019).
[22] T. Nishimichi, M. Takada, R. Takahashi, K. Osato, M. Shirasaki, T. Oogi, H. Miyatake, M. Oguri, R. Murata, Y. Kobayashi, and N. Yoshida, Dark Quest. I. Fast and Accurate Emulation of Halo Clustering Statistics and Its Application to Galaxy Clustering, The Astrophysical Journal 884, 29 (2019).
[23] D. Valcin, F. Villaescusa-Navarro, L. Verde, and A. Raccanelli, BE-HaPPY: bias emulator for halo power spectrum including massive neutrinos, Journal of Cosmology and Astroparticle Physics 2019 (12), 057.
[24] G. Aricò, R. E. Angulo, S. Contreras, L. Ondaro-Mallea, M. Pellejero-Ibañez, and M. Zennaro, The BACCO simulation project: a baryonification emulator with neural networks, MNRAS 506, 4070 (2021), arXiv:2011.15018 [astro-ph.CO].
[25] F. Villaescusa-Navarro, C. Hahn, E. Massara, A. Banerjee, A. M. Delgado, D. K. Ramanah, T. Charnock, E. Giusarma, Y. Li, E. Allys, A. Brochard, C. Uhlemann, C.-T. Chiang, S. He, A. Pisani, A. Obuljen, Y. Feng, E. Castorina, G. Contardo, C. D. Kreisch, A. Nicola, J. Alsing, R. Scoccimarro, L. Verde, M. Viel, S. Ho, S. Mallat, B. Wandelt, and D. N. Spergel, The Quijote Simulations, The Astrophysical Journal Supplement Series 250, 2 (2020), arXiv:1909.05273 [astro-ph.CO].
[26] K. Heitmann, D. Bingham, E. Lawrence, S. Bergner, S. Habib, D. Higdon, A. Pope, R. Biswas, H. Finkel, N. Frontiere, and S. Bhattacharya, The Mira–Titan Universe: precision predictions for dark energy surveys, The Astrophysical Journal 820, 108 (2016).
[27] E. Lawrence, K. Heitmann, J. Kwan, A. Upadhye, D. Bingham, S. Habib, D. Higdon, A. Pope, H. Finkel, and N. Frontiere, The Mira-Titan Universe. II. Matter Power Spectrum Emulation, The Astrophysical Journal 847, 50 (2017).
[28] S. Bocquet, K. Heitmann, S. Habib, E. Lawrence, T. Uram, N. Frontiere, A. Pope, and H. Finkel, The Mira-Titan Universe. III. Emulation of the Halo Mass Function, Astrophys. J. 901, 5 (2020), arXiv:2003.12116 [astro-ph.CO].
[29] K. R. Moran, K. Heitmann, E. Lawrence, S. Habib, D. Bingham, A. Upadhye, J. Kwan, D. Higdon, and R. Payne, The Mira-Titan Universe - IV. High-precision power spectrum emulation, MNRAS 520, 3443 (2023), arXiv:2207.12345 [astro-ph.CO].
[30] J. Kwan, S. Saito, A. Leauthaud, K. Heitmann, S. Habib, N. Frontiere, H. Guo, S. Huang, A. Pope, and S. Rodriguéz-Torres, Galaxy Clustering in the Mira-Titan Universe. I. Emulators for the Redshift Space Galaxy Correlation Function and Galaxy-Galaxy Lensing, Astrophys. J. 952, 80 (2023), arXiv:2302.12379 [astro-ph.CO].
[31] I. Sáez-Casares, Y. Rasera, T. R. G. Richardson, and P. S. Corasaniti, The e-MANTIS emulator: Fast and accurate predictions of the halo mass function in f(R)CDM and wCDM cosmologies, Astronomy & Astrophysics 691, A323 (2024), arXiv:2410.05226 [astro-ph.CO].
[32] Euclid Collaboration, M. Knabenhans, and J. Stadel et al., Euclid preparation: II. The EuclidEmulator – a tool to compute the cosmology dependence of the nonlinear matter power spectrum, Monthly Notices of the Royal Astronomical Society 484, 5509 (2019), https://academic.oup.com/mnras/article-pdf/484/4/5509/27790453/stz197.pdf.
[33] Euclid Collaboration, M. Knabenhans, and J. Stadel et al., Euclid preparation: IX. EuclidEmulator2 – power spectrum emulation with massive neutrinos and self-consistent dark energy perturbations, Monthly Notices of the Royal Astronomical Society 505, 2840 (2021).
[34] Z. Chen, Y. Yu, J. Han, and Y. P. Jing, CSST Cosmological Emulator I: Matter Power Spectrum Emulation with one percent accuracy, arXiv e-prints, arXiv:2502.11160 (2025), arXiv:2502.11160 [astro-ph.CO].
[35] Y. Yang, S. Bird, and M.-F. Ho, Ten-parameter simulation suite for cosmological emulation beyond ΛCDM, Phys. Rev. D 111, 083529 (2025), arXiv:2501.06296 [astro-ph.CO].
[36] M.-F. Ho, S. Bird, and C. R. Shelton, Multifidelity emulation for the matter power spectrum using Gaussian processes, MNRAS 509, 2551 (2022), arXiv:2105.01081 [astro-ph.CO].
[37] M.-F. Ho, S. Bird, M. A. Fernandez, and C. R. Shelton, MF-Box: multifidelity and multiscale emulation for the matter power spectrum, MNRAS 526, 2903 (2023), arXiv:2306.03144 [astro-ph.CO].
[38] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006).
[39] R. Garnett, Bayesian Optimization (Cambridge University Press, 2023).
[40] L. Cabayol-Garcia, J. Chaves-Montero, A. Font-Ribera, and C. Pedersen, A neural network emulator for the Lyman-α forest 1D flux power spectrum, MNRAS 525, 3499 (2023), arXiv:2305.19064 [astro-ph.CO].
[41] K. Diao and Y. Mao, Multi-fidelity emulator for large-scale 21 cm lightcone images: a few-shot transfer learning approach with generative adversarial network, arXiv e-prints, arXiv:2502.04246 (2025), arXiv:2502.04246 [astro-ph.IM].
[42] F. Zhang, Y. Luo, B. Li, R. Cao, W. Peng, J. Meyers, and P. R. Shapiro, SageNet: Fast Neural Network Emulation of the Stiff-amplified Gravitational Waves from Inflation, arXiv e-prints, arXiv:2504.04054 (2025), arXiv:2504.04054 [astro-ph.CO].
[43] J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M. M. A. Patwary, Y. Yang, and Y. Zhou, Deep Learning Scaling is Predictable, Empirically, arXiv e-prints, arXiv:1712.00409 (2017), arXiv:1712.00409 [cs.LG].
[44] M. Guo, A. Manzoni, M. Amendt, P. Conti, and J. S. Hesthaven, Multi-fidelity regression using artificial neural networks: Efficient approximation of parameter-dependent output quantities, Computer Methods in Applied Mechanics and Engineering 389, 114378 (2022), arXiv:2102.13403 [math.NA].
[45] Y. Feng, S. Bird, L. Anderson, A. Font-Ribera, and C. Pedersen, MP-Gadget/MP-Gadget: A tag for getting a DOI (2018).
[46] P. Z. G. Qian, Sliced Latin Hypercube Designs, Journal of the American Statistical Association 107, 393 (2012), https://doi.org/10.1080/01621459.2011.644132.
[47] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12, 2825 (2011).
[48] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library, arXiv e-prints, arXiv:1912.01703 (2019), arXiv:1912.01703 [cs.LG].
[49] I. Loshchilov and F. Hutter, Decoupled Weight Decay Regularization, arXiv e-prints, arXiv:1711.05101 (2017), arXiv:1711.05101 [cs.LG].
[50] D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, arXiv e-prints, arXiv:1412.6980 (2014), arXiv:1412.6980 [cs.LG].
[51] P. Ramachandran, B. Zoph, and Q. V. Le, Searching for Activation Functions, arXiv e-prints, arXiv:1710.05941 (2017), arXiv:1710.05941 [cs.NE].
[52] J. Bergstra, D. Yamins, and D. Cox, Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures, in Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 28, edited by S. Dasgupta and D. McAllester (PMLR, Atlanta, Georgia, USA, 2013) pp. 115–123.
[53] R. Kohavi, A study of cross-validation and Bootstrap for accuracy estimation and model selection, in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (Morgan Kaufmann, 1995) pp. 1137–1143.
[54] Y. Yang, S. Bird, M.-F. Ho, and M. Qezlou, Ten-dimensional neural network emulator for the nonlinear matter power spectrum, arXiv e-prints (2025), submitted concurrently to arXiv.
[55] S. Bird, M. Fernandez, M.-F. Ho, M. Qezlou, R. Monadi, Y. Ni, N. Chen, R. Croft, and T. Di Matteo, PRIYA: a new suite of Lyman-α forest simulations for cosmology, Journal of Cosmology and Astroparticle Physics 2023, 037 (2023), arXiv:2306.05471 [astro-ph.CO].

Appendix A: LOOCV vs. Separate Test Set

We train an emulator based on the preliminary simulation set, Goku-pre-N, using the Optimal approach and test it on the available test set. Goku-pre-N contains 297 pairs of LF simulations and 27 HF simulations in the training set and 12 HF simulations in the test set. Following the main text, we do not use L1 simulations in this study. The HF simulations evolve 300³ particles in a box of size 100 Mpc/h. For more details about the Goku-pre-N simulations, see Ref. [35].

The LOO error and test error of the emulator are shown in the top and bottom panels of Fig. 9, respectively. They are consistent with each other, with the test error being slightly lower than the LOO error. This indicates that the LOO cross-validation is a good representative of the generalization error of the final emulator trained on the full training set.
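Schematically, the two error estimates compared in this appendix can be computed as follows; Ridge regression is only a stand-in for the full T2N-MusE pipeline, and the error definition is our reading of Φ_rMAE as the mean absolute fractional deviation of the predicted power spectrum.

    import numpy as np
    from sklearn.linear_model import Ridge  # placeholder for the full emulator
    from sklearn.model_selection import LeaveOneOut

    rng = np.random.default_rng(1)
    theta = rng.uniform(size=(27, 10))      # HF training cosmologies (toy values)
    pk = np.exp(rng.normal(size=(27, 64)))  # P(k) on a fixed k grid (toy values)

    loo_err = []
    for tr, te in LeaveOneOut().split(theta):
        model = Ridge().fit(theta[tr], np.log(pk[tr]))
        pred = np.exp(model.predict(theta[te]))
        loo_err.append(np.mean(np.abs(pred / pk[te] - 1.0)))
    print(f"LOO estimate of Phi_rMAE: {100 * np.mean(loo_err):.2f}%")

    # The test error replaces the held-out split with the 12 separate test
    # cosmologies and a single model trained on all 27 training cosmologies.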
[Figure 9: Φ_rMAE (%) versus k (h Mpc^-1) in two panels, LOOCV (2.24%) on top and Test (2.05%) on the bottom, with curves for z = 0.0, 0.2, 0.5, 1.0, 2.0, and 3.0.]

FIG. 9. Comparison of LOO error (top) and test error (bottom) for the emulator trained on the Goku-pre-N simulations. The solid lines are the mean errors, and the shaded regions indicate the range of individual cosmologies. The redshifts are color coded, and the overall mean errors are shown in the titles of the panels.