Design and Optimization of Neural Networks For Multifidelity Cosmological Emulation

Yanhui Yang (杨焱辉)1,∗ Simeon Bird1,† Ming-Feng Ho (何銘峰)1,2,3, and Mahdi Qezlou4

1 Department of Physics & Astronomy, University of California, Riverside, 900 University Ave., Riverside, CA 92521, USA
2 Department of Physics, University of Michigan, 450 Church St, Ann Arbor, MI 48109, USA
3 Leinweber Center for Theoretical Physics, 450 Church St, Ann Arbor, MI 48109, USA
4 The University of Texas at Austin, 2515 Speedway Boulevard, Stop C1400, Austin, TX 78712, USA

∗ [email protected]
† [email protected]
(Dated: July 11, 2025)
Accurate and efficient simulation-based emulators are essential for interpreting cosmological survey data down to nonlinear scales. Multifidelity emulation techniques reduce simulation costs by combining high- and low-fidelity data, but traditional regression methods such as Gaussian processes struggle with scalability in sample size and dimensionality. In this work, we present T2N-MusE, a neural network framework characterized by (i) a novel 2-step multifidelity architecture, (ii) a 2-stage Bayesian hyperparameter optimization, (iii) a 2-phase k-fold training strategy, and (iv) a per-z principal component analysis strategy. We apply T2N-MusE to selected data from the Goku simulation suite, covering a 10-dimensional cosmological parameter space, and build emulators for the matter power spectrum over a range of redshifts with different configurations. We find that the emulators significantly outperform our earlier Gaussian process models, and we demonstrate that each of these techniques is efficient in training neural networks and/or effective in improving generalization accuracy. We observe a reduction in validation error by more than a factor of five compared to previous work. This framework has been used to build the most powerful emulator for the matter power spectrum, GokuNEmu, and will also be used to construct emulators for other statistics in the future.
…Euclid [3], the Nancy Grace Roman Space Telescope [4], the China Space Station Telescope (CSST) [5], and the Prime Focus Spectrograph (PFS) on the Subaru Telescope [6] will enable precise measurements of the galaxy power spectrum, as well as the weak lensing shear field. These measurements will be used to constrain cosmological models motivated by unresolved fundamental physics questions.

Interpreting the data and inferring cosmological parameters requires making predictions for the matter field or a summary statistic, such as the matter power spectrum, and using Bayesian methods. A naive inference run may require 10^6–10^7 matter power spectrum evaluations at different cosmological parameters, which would be computationally expensive.

Emulation replaces intensive numerical computation for every likelihood evaluation by the evaluation of a cheap pre-trained surrogate model. For instance, emulators have been widely used to replace Boltzmann codes in cosmological inference [7–14]. Emulators based on N-body simulations are needed to interpret observations on nonlinear scales, k ≳ 0.1 h/Mpc. There have been several such cosmological emulators, e.g., FrankenEmu [15–17], the emulators of the Aemulus project [18–20], NGenHalofit [21], the Dark Quest emulator [22], BE-HaPPY [23], the baryonification emulator of the BACCO project [24], and the emulators built on the Quijote simulations […]. Such emulators are able to predict summary statistics within their parameter space with orders of magnitude lower computational costs than full simulations.

There are several well-motivated extensions of the standard cosmological model which are constrained by current and future surveys. However, including these extensions in emulators is challenging due to the high dimensionality of the parameter space, which necessitates a large number of computationally expensive samples. Multi-fidelity (MF) techniques have been developed to reduce the computational cost of building emulators, e.g., MFEmulator [36] and MF-Box [37]. Ref. [35] built GokuEmu, an emulator for the matter power spectrum, which expanded the parameter space to 10 dimensions for the first time, taking into account dynamical dark energy, massive neutrinos, the effective number of ultra-relativistic neutrinos, and the running of the primordial spectral index. This was achieved by using MF-Box, which combines simulations with different box sizes and particle loads, at a computational cost 94% less than single-fidelity approaches.

Despite the success of MF-Box in reducing the computational cost of producing the training data (simulations), the regression technique used, Gaussian process (GP) regression, still suffers from the curse of dimensionality. The computational complexity of GP regression scales poorly (cubically) with sample size (see Chapter 8 of Ref. [38] or Chapter 9 of Ref. [39]). This in turn leads to lengthy prediction and training times, as well as increased memory usage. GP regression struggles to satisfy our need for next-generation cosmological emulators, which would ideally become yet more complex, including […]
[…] such that y_H = F(x, y_L), based on the input data (X_H, f^L_NN(X_H)) = {(x_{H,i}, f^L_NN(x_{H,i})) : i = 1, 2, . . . , n_H} and the available HF output data Y_H = y_H(X_H) = {y_{H,i} : i = 1, 2, . . . , n_H}. Note that in our case, the HF cosmologies are a subset of the LF cosmologies, so we can replace f^L_NN(X_H) with y_L(X_H), such that the two NNs can be trained independently and simultaneously. While Ref. [44] restricts the second NN to be a shallow NN with only one hidden layer, we allow multiple hidden layers in the second NN to increase the flexibility of the model.

Figure 1 illustrates the original 2-step architecture with a simple example of 2D input and 3D output. Note that the input of NN_LH (the NN modeling the LF-HF correlation) is a 5D vector, a concatenation of the LF output and the initial input vector.

We propose a modified 2-step architecture with the same NN_L but a different NN_LH, illustrated in Fig. 1. Instead of approximating the correlation between the LF and HF functions, the new NN_LH learns the ratio of y_H to y_L, r, with components r_i = y^H_i / y^L_i for i = 1, 2, . . . , d_out, as a function of the input vector x, i.e., r = G(x). The training data for NN_LH are T_H = {(x_{H,i}, r_{H,i}) : i = 1, 2, . . . , n_H}, where r_{H,i} = y_{H,i} ⊘ f^L_NN(x_{H,i}). As before, we replace f^L_NN(x_{H,i}) with y_L(x_{H,i}). With the trained NN_L and NN_LH, we can predict the HF output as y^H_NN = G_NN(x) ⊙ f^L_NN(x).3 Note that, for the matter power spectrum, the ratio is calculated in the original space rather than in log space.

The modified 2-step model significantly reduces the dimensionality of the input of NN_LH, which is d_in + d_out in the original architecture and d_in in the modified architecture. […]

[…] configurations found in the first stage. Each evaluation of the hyperparameters involves both training and validation of an NN. Importantly, this pipeline is not limited to the multifidelity emulation context; it is broadly applicable to other tasks, including single-fidelity emulation and general regression problems involving high-dimensional outputs.

More details of each component of the workflow are given in their dedicated sections. See Sec. II C 1 for data compression, Sec. II C 2 for neural network training, and Sec. II C 3 for hyperparameter optimization.

1. Data compression

We use PCA to reduce the dimensionality of the output data. Two strategies are explored in this work: global PCA and per-redshift (hereafter, local) PCA. The former was adopted in some existing emulators, e.g., EuclidEmulator2 [33] and the CSST Emulator [34]. We propose the latter as a new approach to compress the output data, allowing a more flexible representation of the output that may be better suited to cases where the redshift evolution of the output is nonlinear or complex.

In the global PCA approach, we perform PCA on all k modes and redshifts together, and then each of the original output components can be expressed as a linear combination of the principal components (PCs), i.e.,

y(z_i, k_j; x) = μ(z_i, k_j) + Σ_{l=1}^{n_PCA} a_l(x) φ_l(z_i, k_j),   (2)
FIG. 1. Examples of the original and modified 2-step MF NN architectures. Both architectures have the same NN_L (step 1: the LF NN) but different NN_LH (step 2: the NN used to correct the LF output). The original NN_LH (a) approximates the correlation between the LF and HF functions, with (x, y_L) as input and y_H as output. The modified NN_LH (b) learns the mapping from x to the ratio of y_H to y_L, r = G(x), and the final HF output is the element-wise product of the LF output with the correction ratio r.
Following Ref. [33], we determine the number of PCs based on the cumulative variance they explain. Specifically, we select the smallest value of n_PCA (or n^i_PCA for local PCA) such that the remaining unexplained variance is < 10^-5. While we do not investigate how emulator performance varies with this threshold in the present work, we note that the optimal choice is likely data-dependent. As such, it is generally advisable to assess emulator accuracy […]

[…]

L(W, b) = L_train(W, b) + λ ||W||_2^2,   (6)

where L_train is the training loss, measuring the distance between the predicted output and the training output data, and the second term is the regularization term, which penalizes large weights to prevent overfitting. The regularization parameter λ is a hyperparameter that controls the strength of the regularization.
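As an illustration of the local compression described above, the sketch below (NumPy/scikit-learn; array shapes and variable names are assumptions) fits an independent PCA basis at each redshift and keeps the smallest number of components whose unexplained variance falls below the 10^-5 threshold. The closing comment indicates how the global variant would differ.

import numpy as np
from sklearn.decomposition import PCA

def fit_local_pca(spectra, threshold=1e-5):
    """Fit an independent PCA at each redshift (local PCA).

    spectra: array of shape (n_z, n_samples, n_k), e.g. power spectra on a
    fixed k grid at each redshift. Each PCA is truncated to the smallest
    n_PCA whose remaining unexplained variance is below `threshold`.
    """
    models = []
    for block in spectra:                # block: (n_samples, n_k) at fixed z
        pca = PCA().fit(block)
        cum = np.cumsum(pca.explained_variance_ratio_)
        n_pca = min(int(np.searchsorted(cum, 1.0 - threshold)) + 1, len(cum))
        models.append(PCA(n_components=n_pca).fit(block))
    return models

# Global PCA would instead fit a single basis to all redshifts at once:
#   n_z, n_samples, n_k = spectra.shape
#   PCA().fit(spectra.transpose(1, 0, 2).reshape(n_samples, n_z * n_k))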
FIG. 3. Illustration of the two-phase k-fold training and cross-validation strategy for NN_L, assuming a total of 9 samples, of which 3 (orange circles) are supposed to be tested against (i.e., the HF cosmologies). In phase 1, the model is trained on the remaining 6 samples (blue circles) using 3 separate runs with different random seeds, and validated on the 3 held-out test samples. In phase 2, we perform regular k-fold training and validation, with the initial model (weights and biases) set to the best model found in phase 1.

We use the mean combined loss as a function of the hyperparameters:

Φ(L, M, λ) = (1 / 2k) Σ_{i=1}^{k} [Φ_{train,i}(L, M, λ) + Φ_{val,i}(L, M, λ)],   (9)

where Φ_{train,i}(L, M, λ) = min_{W,b} L_train(L, M, λ; W, b) (the minimum training loss from Eq. (7)) and i is the index of the fold. Φ_{val,i}(L, M, λ) is defined in the same way as Φ_{train,i}(L, M, λ) but with the training loss replaced by the validation loss.

For NN_LH, the data set has n_H samples, and we split the data into k = n_H folds. In each iteration, we use n_H − 1 samples for training and 1 sample for validation. In addition, n^LH_seed = 5 random seeds are used to initialize the weights and biases of the NN for each fold training, to avoid bad local minima.

Likewise, for NN_L, the data set has n_L samples, and we split the data into k = n_L folds. However, only the n_H HF cosmologies should be tested on for our purpose, i.e., we only need to iterate over the n_H folds that leave the HF cosmologies out in training. We can take advantage of this feature and use a 2-phase training strategy, where a good local minimum is found in the first phase and then used as the common initial model for the second phase of training for each fold.7 This is much more efficient than regular methods (such as what we do for NN_LH), since there is no need to search for good minima for every fold by trying different random seeds independently. Specifically, in the first phase, the LF-cosmology-only data (samples at HF cosmologies are excluded), T^L_{1,train} = {(x_i, y_i) : 1 ≤ i ≤ n_L, x_i ∉ X_H}, are used as the training set, and the LF data at the HF cosmologies (T^L \ T^L_{1,train}) are held out for validation.

For the best-performing set of hyperparameters, we initialize the final model training using the fold model with the median regularized loss. To prevent overfitting during this final training step, we impose a lower bound on the training loss: specifically, the final training loss is not allowed to fall below 80% of the median training loss observed across the folds. This threshold has proven effective in practice, as we have verified that the final model's performance remains consistent with the LOOCV results (see Appendix A). Nevertheless, a more comprehensive evaluation of this thresholding strategy could be pursued in future work.

The 2-phase training strategy ensures that all the fold models fall into the same local minimum, so the validation error should be a better representative of the generalization error for the final model (also in the same local minimum) trained on the full LF data set than regular k-fold validation would be. Note that the HF cosmologies must be excluded from the training set in the first phase, even though that phase only serves to search for local minima rather than to perform final validation.

7 In practice, we also set the initial learning rate in the second phase equal to the final learning rate from the first phase, to avoid jumping to other local minima.
Otherwise, the model used for initialization would have memorized the data we are supposed to test on, and validation in the second phase would be invalid. This is also the reason why we cannot use the 2-phase strategy for NN_LH (there are no data available other than the test points); instead, we have to try multiple random seeds for each fold training.

D. Comparative Study Design

The techniques evaluated in this work are summarized in Table III. To assess the effectiveness of each technique, we design a comparative study with a series of different approaches for emulator construction. These approaches are distinct combinations of the techniques mentioned above. The configurations for each approach are defined in Table IV.

Mid serves as the reference approach: it uses the modified 2-step architecture and separate PCA for each redshift, but does not include hyperparameter fine-tuning or 2-phase training of NN_L. Base is the most basic approach, with the original 2-step architecture, global PCA, and no additional optimization strategies. The most advanced approach, Optimal, incorporates all enhanced techniques. The remaining approaches differ from Mid by altering only one component, allowing us to isolate the contribution of each technique. For example, Arch-0 uses the original 2-step architecture but keeps the other techniques the same as Mid. HO-2 uses 2-stage hyperparameter optimization without changing other components.

By comparing HO-2 and Mid, we will see the effect of hyperparameter fine-tuning. However, this is not a strictly fair comparison, since the two differ significantly in compute time. To make a fair comparison, we define HO-3, which uses the same 1-stage hyperparameter optimization as Mid but with a larger number of trials, n_trial = 120, ensuring that the total compute time is similar to that of HO-2. Similarly, while comparing NNL-1 and Mid will show the effect of 2-phase training of NN_L, we also define NNL-0+, which uses the same 1-phase training as Mid (which does not try multiple seeds) but with a larger number of random seeds, n_seed = 3, leading to a compute time similar to that of NNL-1. Although we do not present a detailed quantitative comparison of compute times across all approaches, we note that training each emulator requires less than 24 hours on a single Grace-Hopper node of the Vista supercomputer.8 This cost is negligible relative to the computational expense of running the simulations themselves in the context of simulation-based emulation.

The results of the comparative study are shown in Sec. III, where we compare the performance of the emulators built with different approaches and also discuss the impact of each technique at the level of the component NNs (NN_L and NN_LH).

III. RESULTS

We present the results of the comparative study in this section. The models trained with different approaches are evaluated using LOOCV, with the validation error defined as the relative mean absolute error (rMAE) of the predicted power spectrum compared to the true power spectrum, denoted Φ_rMAE. For clarity, each model is identified by the name of the approach used in its construction (e.g., the model trained with the Base approach is referred to as Base).

Figure 4 shows the validation errors as functions of k and z for Base, Mid, and Optimal. We find that even the basic model, Base, achieves a validation error significantly lower than GokuEmu's 3% error (see Fig. 13 of Ref. [35]). This suggests that NNs may be better suited than GPs for emulation tasks involving large training sets and high-dimensional parameter spaces. The improvement may be accounted for by more efficient training, which allows more intensive hyperparameter optimization, though PCA could also have contributed to the performance improvement. Compared to Base, Mid achieves a significant improvement in accuracy, with an overall validation error of 1.03% (compared to 1.73% for Base), attributed to the modified 2-step architecture and the local PCA strategy.9 The improvement is observed across all redshifts and wavenumbers, though the worst-case error is still much higher than the average. The validation error of Optimal is less than 1% for all redshifts and almost all wavenumbers, with an overall mean of 0.62%, a further improvement over Mid resulting from the changes in hyperparameter optimization and in the training of NN_L. Not only is the overall validation error reduced, but the worst-case error is also considerably lower than that of Mid (a reduction by a factor of 5, as will be seen in Fig. 5).

A summary comparison of the LOO errors of the emulators built with different approaches is shown in Fig. 5, with both the overall mean error and the worst-case error displayed. The error of Mid is lower than that of Arch-0 and PCA-0, indicating that both the modified 2-step architecture and the local PCA strategy are effective in improving the performance of the emulator, with the modified architecture providing the larger improvement. While the aforementioned techniques improve the overall mean error, the worst-case error remains high. A substantial reduction in the worst-case error is seen when implementing the 2-phase training strategy, NNL-1, which allows a large number of local minima to be explored efficiently. NNL-0+ also shows an improvement over Mid, but the improvement […]

8 https://tacc.utexas.edu/systems/vista/
9 Similar compute times were used to train these two models.
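For reference, the validation metric Φ_rMAE used throughout this section can be computed as below; this is a minimal sketch, with array conventions assumed (the overall error additionally averages over redshifts).

import numpy as np

def rmae(p_pred, p_true):
    """Relative mean absolute error, Phi_rMAE = mean(|P_pred / P_true - 1|).

    p_pred, p_true: arrays of shape (n_cosmo, n_k); the mean is taken over
    cosmologies and wavenumbers.
    """
    return float(np.mean(np.abs(p_pred / p_true - 1.0)))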
TABLE III. Techniques considered in this work. The numbers 0, 1, and 0+ (if applicable) refer to the choices of the strategies; e.g., choice 0 represents the original 2-step model for the MF NN architecture. n_trial is the number of trials in the coarse search stage of the hyperparameter optimization process, and n_trial^tune is the number of trials in the fine-tuning stage. "1-stage" means no fine-tuning. For the training of NN_L, "1-phase" refers to regular k-fold training and validation (n_seed = 1 by default).

Choice | MF NN architecture | PCA            | Hyperparameter optimization                     | Training of NN_L
0      | Original 2-step    | Global (all-z) | 1-stage with n_trial = 80                       | 1-phase
1      | Modified 2-step    | Local (per-z)  | 2-stage with n_trial = 80 and n_trial^tune = 40 | 2-phase
0+     | —                  | —              | —                                               | 1-phase with n_seed = 3
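As a sketch of how the 2-stage optimization summarized in Table III can be realized (using Optuna as an illustrative Bayesian optimizer; the search-space bounds and the train_and_validate helper, which returns the mean combined loss of Eq. (9), are assumptions):

import optuna

def make_objective(bounds):
    def objective(trial):
        L = trial.suggest_int("n_layers", *bounds["n_layers"])
        M = trial.suggest_int("width", *bounds["width"])
        lam = trial.suggest_float("lam", *bounds["lam"], log=True)
        # train_and_validate is a hypothetical helper returning Phi(L, M, lam).
        return train_and_validate(L, M, lam)
    return objective

# Stage 1: coarse Bayesian search over wide ranges (n_trial = 80).
coarse = {"n_layers": (1, 6), "width": (32, 1024), "lam": (1e-8, 1e-2)}
study = optuna.create_study(direction="minimize")
study.optimize(make_objective(coarse), n_trials=80)

# Stage 2: fine-tuning in a narrowed box around the stage-1 optimum
# (n_trial^tune = 40).
b = study.best_params
fine = {"n_layers": (max(1, b["n_layers"] - 1), b["n_layers"] + 1),
        "width": (max(1, b["width"] // 2), b["width"] * 2),
        "lam": (b["lam"] / 10.0, b["lam"] * 10.0)}
tuned = optuna.create_study(direction="minimize")
tuned.optimize(make_objective(fine), n_trials=40)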
FIG. 4. LOO errors of the emulators built with approaches Base, Mid, and Optimal. Redshifts are color coded. The solid lines
are the error averaged over cosmologies, and the corresponding shaded regions indicate the range of individual cosmologies.
The gray-shaded area marks the region where the error is less than 1%. Each model is titled with the name of the approach
and its overall validation error.
TABLE IV. Configurations of the approaches in the comparative study, using the choice labels defined in Table III.

Approach | MF NN architecture | PCA | Hyperparameter optimization | Training of NN_L
Base     | 0 | 0 | 0 | 0
Arch-0   | 0 | 1 | 0 | 0
PCA-0    | 1 | 0 | 0 | 0
Mid      | 1 | 1 | 0 | 0
NNL-1    | 1 | 1 | 0 | 1
NNL-0+   | 1 | 1 | 0 | 0+
Optimal  | 1 | 1 | 1 | 1
[FIG. 5: summary of the LOO errors (overall mean and worst case) of the emulators built with approaches Base, Arch-0, PCA-0, Mid, NNL-1, NNL-0+, and Optimal.]
FIG. 6. Comparison of the LF-to-HF correction NNs of the original (Arch-0) and the modified (Mid) 2-step architectures. Blue lines are LOO errors of Arch-0's NN_LH, while orange lines are Mid's. The solid, dashed, and dotted lines correspond to z = 0, 1, and 3, respectively. The overall mean errors averaged over 6 redshifts are shown in the legends (Arch-0: 1.03%; Mid: 0.33%).

FIG. 7. Comparison of the two data compression strategies: global PCA (PCA-0, in blue) and separate PCA for each redshift (Mid, in orange). The top and bottom panels show the LOO errors for NN_L and NN_LH, respectively. The solid, dashed, and dotted lines correspond to z = 0, 1, and 3, respectively. The overall mean errors averaged over 6 redshifts are shown in the legends (NN_L: PCA-0 1.15%, Mid 0.97%; NN_LH: PCA-0 0.38%, Mid 0.33%).

[…] each technique in more detail, by comparing the performance of the component NNs (NN_L and NN_LH) trained with different approaches. Figures 6, 7 and 8 show the rMAE of the component NNs, defined as the LOO error of the predicted power spectrum compared to the true power spectrum. This ensures that NN_L and NN_LH are evaluated separately and independently. Specifically, in Figure 6 and the lower panel of Figure 7, the component shown is NN_LH; thus the input is the test cosmology and the true LF power spectrum, instead of the LF power spectrum predicted by NN_L. In the upper panel of Figure 7 and in Figure 8, the component shown is NN_L. In this case, both the predicted power spectrum and the true power spectrum that it is tested against are LF power spectra.

A. Architecture: 2-Step vs. Modified 2-Step

[…] the information of the data is not fully exploited. The improvement is likely due to the aforementioned significantly reduced complexity of the NN (Sec. II B) relative to the original architecture [44].
IV. CONCLUSION
YY and SB acknowledge funding from NASA ATP
80NSSC22K1897. MFH is supported by the Leinweber
We have developed T2N-MusE, a multifidelity neural net- Foundation and DOE grant DE-SC0019193. Computing
work framework for cosmological emulation, which is ca- resources were provided by Frontera LRAC AST21005.
pable of building highly optimized regression models to The authors acknowledge the Frontera and Vista com-
predict summary statistics. This framework is character- puting projects at the Texas Advanced Computing Cen-
ized by a novel 2-step architecture, per-z PCA for data ter (TACC, http://www.tacc.utexas.edu) for provid-
compression, 2-stage hyperparameter optimization, and ing HPC and storage resources that have contributed to
a 2-phase training strategy for the low-fidelity regression the research results reported within this paper. Frontera
model. This NN approach improves on our earlier GP and Vista are made possible by National Science Foun-
approach by a factor of more than 5 on the same data.11 dation award OAC-1818253.
11 The training of GokuEmu used both the L1 and L2 nodes, while estimate.
we only use L2 in this work. So a factor of 5 is a very conservative
11
[1] DESI Collaboration, A. Aghamousa, J. Aguilar, et al., The DESI Experiment Part I: Science, Targeting, and Survey Design, arXiv e-prints, arXiv:1611.00036 (2016), arXiv:1611.00036 [astro-ph.IM].
[2] P. A. Abell et al., LSST Science Book, Version 2.0 (arXiv, 2009), arXiv:0912.0201 [astro-ph.IM].
[3] R. Laureijs et al., Euclid Definition Study Report, arXiv e-prints, arXiv:1110.3193 (2011), arXiv:1110.3193 [astro-ph.CO].
[4] R. Akeson et al., The Wide Field Infrared Survey Telescope: 100 Hubbles for the 2020s, arXiv e-prints, arXiv:1902.05569 (2019), arXiv:1902.05569 [astro-ph.IM].
[5] Y. Gong, X. Liu, Y. Cao, X. Chen, Z. Fan, R. Li, X.-D. Li, Z. Li, X. Zhang, and H. Zhan, Cosmology from the Chinese Space Station Optical Survey (CSS-OS), Astrophys. J. 883, 203 (2019), arXiv:1901.04634 [astro-ph.CO].
[6] M. Takada, R. S. Ellis, M. Chiba, J. E. Greene, H. Aihara, N. Arimoto, K. Bundy, J. Cohen, O. Doré, G. Graves, J. E. Gunn, T. Heckman, C. M. Hirata, P. Ho, J.-P. Kneib, O. Le Fèvre, L. Lin, S. More, H. Murayama, T. Nagao, M. Ouchi, M. Seiffert, J. D. Silverman, L. Sodré, D. N. Spergel, M. A. Strauss, H. Sugai, Y. Suto, H. Takami, and R. Wyse, Extragalactic science, cosmology, and Galactic archaeology with the Subaru Prime Focus Spectrograph, Publications of the Astronomical Society of Japan 66, R1 (2014), arXiv:1206.0737 [astro-ph.CO].
[7] T. Auld, M. Bridges, M. P. Hobson, and S. F. Gull, Fast cosmological parameter estimation using neural networks, MNRAS 376, L11 (2007), arXiv:astro-ph/0608174 [astro-ph].
[8] T. Auld, M. Bridges, and M. P. Hobson, COSMONET: fast cosmological parameter estimation in non-flat models using neural networks, MNRAS 387, 1575 (2008), arXiv:astro-ph/0703445 [astro-ph].
[9] G. Aricò, R. E. Angulo, and M. Zennaro, Accelerating Large-Scale-Structure data analyses by emulating Boltzmann solvers and Lagrangian Perturbation Theory, arXiv e-prints, arXiv:2104.14568 (2021), arXiv:2104.14568 [astro-ph.CO].
[10] A. Spurio Mancini, D. Piras, J. Alsing, B. Joachimi, and M. P. Hobson, COSMOPOWER: emulating cosmological power spectra for accelerated Bayesian inference from next-generation surveys, MNRAS 511, 1771 (2022), arXiv:2106.03846 [astro-ph.CO].
[11] A. Nygaard, E. B. Holm, S. Hannestad, and T. Tram, CONNECT: a neural network based framework for emulating cosmological observables and cosmological parameter inference, Journal of Cosmology and Astroparticle Physics 2023, 025 (2023), arXiv:2205.15726 [astro-ph.IM].
[12] S. Günther, J. Lesgourgues, G. Samaras, N. Schöneberg, F. Stadtmann, C. Fidler, and J. Torrado, CosmicNet II: emulating extended cosmologies with efficient and accurate neural networks, Journal of Cosmology and Astroparticle Physics 2022, 035 (2022), arXiv:2207.05707 [astro-ph.CO].
[13] M. Bonici, F. Bianchini, and J. Ruiz-Zapatero, Capse.jl: efficient and auto-differentiable CMB power spectra emulation, The Open Journal of Astrophysics 7, 10 (2024), arXiv:2307.14339 [astro-ph.CO].
[14] M. Bonici, G. D'Amico, J. Bel, and C. Carbone, Effort: a fast and differentiable emulator for the Effective Field Theory of the Large Scale Structure of the Universe, arXiv e-prints, arXiv:2501.04639 (2025), arXiv:2501.04639 [astro-ph.CO].
[15] K. Heitmann, D. Higdon, M. White, S. Habib, B. J. Williams, E. Lawrence, and C. Wagner, The Coyote Universe. II. Cosmological models and precision emulation of the nonlinear matter power spectrum, The Astrophysical Journal 705, 156 (2009).
[16] K. Heitmann, M. White, C. Wagner, S. Habib, and D. Higdon, The Coyote Universe. I. Precision determination of the nonlinear matter power spectrum, The Astrophysical Journal 715, 104 (2010).
[17] K. Heitmann, E. Lawrence, J. Kwan, S. Habib, and D. Higdon, The Coyote Universe Extended: precision emulation of the matter power spectrum, The Astrophysical Journal 780, 111 (2013).
[18] J. DeRose, R. H. Wechsler, J. L. Tinker, M. R. Becker, Y.-Y. Mao, T. McClintock, S. McLaughlin, E. Rozo, and Z. Zhai, The Aemulus Project. I. Numerical Simulations for Precision Cosmology, The Astrophysical Journal 875, 69 (2019).
[19] T. McClintock, E. Rozo, M. R. Becker, J. DeRose, Y.-Y. Mao, S. McLaughlin, J. L. Tinker, R. H. Wechsler, and Z. Zhai, The Aemulus Project. II. Emulating the Halo Mass Function, The Astrophysical Journal 872, 53 (2019).
[20] Z. Zhai, J. L. Tinker, M. R. Becker, J. DeRose, Y.-Y. Mao, T. McClintock, S. McLaughlin, E. Rozo, and R. H. Wechsler, The Aemulus Project. III. Emulation of the Galaxy Correlation Function, The Astrophysical Journal 874, 95 (2019).
[21] R. E. Smith and R. E. Angulo, Precision modelling of the matter power spectrum in a Planck-like Universe, Monthly Notices of the Royal Astronomical Society 486, 1448 (2019).
[22] T. Nishimichi, M. Takada, R. Takahashi, K. Osato, M. Shirasaki, T. Oogi, H. Miyatake, M. Oguri, R. Murata, Y. Kobayashi, and N. Yoshida, Dark Quest. I. Fast and Accurate Emulation of Halo Clustering Statistics and Its Application to Galaxy Clustering, The Astrophysical Journal 884, 29 (2019).
[23] D. Valcin, F. Villaescusa-Navarro, L. Verde, and A. Raccanelli, BE-HaPPY: bias emulator for halo power spectrum including massive neutrinos, Journal of Cosmology and Astroparticle Physics 2019 (12), 057.
[24] G. Aricò, R. E. Angulo, S. Contreras, L. Ondaro-Mallea, M. Pellejero-Ibañez, and M. Zennaro, The BACCO simulation project: a baryonification emulator with neural networks, MNRAS 506, 4070 (2021), arXiv:2011.15018 [astro-ph.CO].
[25] F. Villaescusa-Navarro, C. Hahn, E. Massara, A. Banerjee, A. M. Delgado, D. K. Ramanah, T. Charnock, E. Giusarma, Y. Li, E. Allys, A. Brochard, C. Uhlemann, C.-T. Chiang, S. He, A. Pisani, A. Obuljen, Y. Feng, […]
…lanta, Georgia, USA, 2013) pp. 115–123.
[53] R. Kohavi, A study of cross-validation and Bootstrap for accuracy estimation and model selection, in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (Morgan Kaufmann, 1995) pp. 1137–1143.
[54] Y. Yang, S. Bird, M.-F. Ho, and M. Qezlou, Ten-dimensional neural network emulator for the nonlinear matter power spectrum, arXiv e-prints (2025), submitted concurrently to arXiv.
[55] S. Bird, M. Fernandez, M.-F. Ho, M. Qezlou, R. Monadi, Y. Ni, N. Chen, R. Croft, and T. Di Matteo, PRIYA: a new suite of Lyman-α forest simulations for cosmology, Journal of Cosmology and Astroparticle Physics 2023, 037 (2023), arXiv:2306.05471 [astro-ph.CO].

Appendix A: LOOCV vs. Separate Test Set

We train an emulator based on the preliminary simulation set, Goku-pre-N, using the Optimal approach and test it on the available test set. Goku-pre-N contains 297 pairs of LF simulations and 27 HF simulations in the training set, and 12 HF simulations in the test set. Following the main text, we do not use L1 simulations in this study. The HF simulations evolve 300^3 particles in a box of size 100 Mpc/h. For more details about the Goku-pre-N simulations, see Ref. [35].

The LOO error and test error of the emulator are shown in the top and bottom panels of Fig. 9, respectively. They are consistent with each other, with the test error being slightly lower than the LOO error. This indicates that the LOO cross-validation is a good representative of the generalization error of the final emulator trained on the full training set.
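For completeness, a minimal sketch of this comparison follows (build_emulator is a hypothetical constructor training an emulator with the Optimal approach; samples are assumed to expose .params and .power attributes, and rmae is the helper sketched in Sec. III).

import numpy as np

def loo_and_test_errors(train_set, test_set, build_emulator):
    """Compare LOOCV error with the error on a separate test set."""
    # Leave-one-out over the training cosmologies.
    loo = []
    for i in range(len(train_set)):
        emu = build_emulator([s for j, s in enumerate(train_set) if j != i])
        loo.append(rmae(emu.predict(train_set[i].params), train_set[i].power))
    # A single emulator trained on everything, evaluated on the test set.
    emu_full = build_emulator(train_set)
    test = [rmae(emu_full.predict(s.params), s.power) for s in test_set]
    return float(np.mean(loo)), float(np.mean(test))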
[FIG. 9: LOOCV (top; overall rMAE 2.24%) and test (bottom; overall rMAE 2.05%) errors of the Goku-pre-N emulator as functions of k, for z = 0.0, 0.2, 0.5, 1.0, 2.0, and 3.0.]