Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization

Hongshu Guo (South China University of Technology, Guangzhou, Guangdong, China), Wenjie Qiu (South China University of Technology, Guangzhou, Guangdong, China), Zeyuan Ma (South China University of Technology, Guangzhou, Guangdong, China), Xinglin Zhang (South China University of Technology, Guangzhou, Guangdong, China), Jun Zhang (Nankai University, China; Hanyang University, South Korea), and Yue-Jiao Gong (South China University of Technology, Guangzhou, Guangdong, China). *Corresponding author: [email protected]

(2018)

Abstract.

Recent research in Cooperative Coevolution (CC) has achieved promising progress in solving large-scale global optimization problems. However, existing CC paradigms have a primary limitation in that they require deep expertise for selecting or designing effective variable decomposition strategies. Inspired by advancements in Meta-Black-Box Optimization, this paper introduces LCC, a pioneering learning-based cooperative coevolution framework that dynamically schedules decomposition strategies during optimization processes. The decomposition strategy selector is parameterized through a neural network, which processes a meticulously crafted set of optimization status features to determine the optimal strategy for each optimization step. The network is trained via the Proximal Policy Optimization method in a reinforcement learning manner across a collection of representative problems, aiming to maximize the expected optimization performance. Extensive experimental results demonstrate that LCC not only offers certain advantages over state-of-the-art baselines in terms of optimization effectiveness and resource consumption, but also exhibits promising transferability towards unseen problems.

CMA-ES, cooperative co-evolution, reinforcement learning, large scale global optimization, meta-black-box optimization

CCS Concepts: Mathematics of computing → Bio-inspired optimization; Computing methodologies → Reinforcement learning

1. Introduction

Black-box optimization (BBO) is a class of optimization problems whose objective function is either unknown or too intricate to be mathematically formulated (Ma et al., 2024b). Consequently, BBO requires interaction-based information acquisition without access to underlying mathematical expressions or gradients. Within the context of BBO, Large-Scale Global Optimization (LSGO), which involves thousands to tens of thousands of variables, has numerous real-world applications (Elsken et al., 2019; Dranka et al., 2021; Bhattacharya et al., 2016) that drive resource savings, cost control, and efficiency enhancement (Guidotti et al., 2018; Omidvar et al., 2021a; Liu et al., 2024; Zhang et al., 2024). Many works have proposed LSGO variants of algorithms originally applied to lower-dimensional BBO problems, such as Sep-CMAES (Ros and Hansen, 2008), LM-MA-ES (Loshchilov et al., 2018), and others (Akimoto and Hansen, 2016; Loshchilov, 2017; He et al., 2020; Li and Zhang, 2017), to tackle such problems. In addition, Persistent Evolution Strategies (PES) (Vicol et al., 2021), presented in an outstanding paper at ICML 2021, combines ideas from gradient-based optimization with evolutionary strategies to improve optimization efficiency and accuracy. However, the “curse of dimensionality” remains a significant challenge for such problems: as the number of variables increases, the complexity of optimization grows exponentially, necessitating extensive iterations for exploration (Hammer, 1962).

Figure 1. The core idea of LCC-CMAES.

To address LSGO, inspired by the divide-and-conquer philosophy, a framework named Cooperative Co-evolution (CC) first divides the variables into several subgroups, then optimizes these subgroups (considered as lower-dimensional BBO problems) using Evolutionary Algorithms (EAs), and finally integrates them into a comprehensive global optimization solution (Potter and De Jong, 1994; Jia et al., 2020; Chen et al., 2019; Yang et al., 2008). In the CC framework, an important issue is placing non-separable variables within the same subgroup so that the problem dimensions are divided accurately; the rule that does this is the so-called decomposition strategy (Van den Bergh and Engelbrecht, 2004). Researchers initially tried random decomposition and some decomposition strategies utilizing statistical data, but did not obtain satisfactory results (Potter and De Jong, 1994; Van den Bergh and Engelbrecht, 2004). Later, they attempted to dynamically select strategies by calculating the probability of each using a table of historical statistical information, designed with expert-level knowledge, which yielded some positive effects (e.g., CC-CMAES (Liu and Tang, 2013)). Furthermore, researchers designed a series of decomposition strategies based on expert-level knowledge to identify variable interactions more accurately, but this precise decomposition incurred substantial additional function evaluation (FE) costs (Sun et al., 2017; Omidvar et al., 2017; Tian et al., 2024b). In summary, a primary limitation of the current CC framework is its Expert-Level Knowledge Dependency: these decomposition strategies are based on hand-crafted rules, rely heavily on expert-level optimization knowledge, and might not generalize to unseen problems. Therefore, methods that do not require expert-level knowledge for decomposition could be a more suitable solution for tackling challenging real-world problems.

To alleviate the burdensome task of manual fine-tuning with expert-level knowledge, recent research has proposed the concept of Meta-Black-Box Optimization (MetaBBO) (Ma et al., 2024b, c; Li et al., 2024a; Mo et al., 2025; Li et al., 2025, 2024b). This paradigm has showcased the power of leveraging deep reinforcement learning (DRL) in a data-driven fashion at the meta-level to reduce the reliance on expert-level knowledge in configuring low-level black-box optimizers. Numerous studies have shown that MetaBBO enables black-box optimizers to achieve more effective optimization performance through enhanced parameter configuration (Xue et al., 2022; Ma et al., 2024a; Sun et al., 2021), algorithm/operator selection (Guo et al., 2024; Liao et al., 2023), and update rule generation (Lange et al., 2023; Chen et al., 2024a, b; Yi et al., 2022). Inspired by MetaBBO, we introduce Learning-Based Cooperative Coevolution (LCC), a pioneering framework that dynamically schedules decomposition strategies during optimization without requiring expertise. The main contributions of this work are summarized as follows:

• LCC is designed to create an intelligent decision-making agent that autonomously selects effective decomposition strategies tailored to various problem environments and optimization states. We have formulated this process as a Markov Decision Process (MDP) and utilized DRL to construct the agent. This approach replaces traditional, expert-designed selection modes, marking a significant advance in automating and optimizing the decomposition strategy within the CC frameworks for large-scale BBO.
• Taking the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as the underlying optimizer, we develop the LCC-CMAES algorithm. Figure 1 shows the core idea of LCC-CMAES. We have designed a set of straightforward yet representative statistical features to capture essential grouping information and reflect the optimization state. Based on the state, LCC selects an appropriate decomposition strategy from a strategy pool of random decomposition (RD), Min-Variance decomposition (MiVD), and Max-Variance decomposition (MaVD), enhancing the efficacy of CMA-ES.
• We conducted detailed comparisons with various leading LSGO algorithms in more challenging settings to illustrate the limitations in practical problems. The experimental results demonstrate that LCC-CMAES excels not only in terms of resource consumption but also in optimization results compared to other algorithms. Additionally, LCC-CMAES exhibits transferability, showing outstanding performance on other unseen problem sets after training.

The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 provides the preliminary knowledge necessary for understanding CC-CMAES and MDP. Section 4 describes the overall architecture of LCC, as well as the specific design of its MDP and network. Section 5 presents the experimental results and provides a detailed analysis. Finally, Section 6 concludes the paper and outlines future work.

2. Related Works

As mentioned earlier, our LCC is inspired by MetaBBO and operates within the CC framework. Therefore, in this section, we will review MetaBBO and several important problem decomposition strategies under the CC framework.

2.1. MetaBBO

To alleviate the burdensome task of manual fine-tuning, recent research has proposed the concept of MetaBBO (Ma et al., 2024b; Yang et al., [n. d.]; Ma et al., 2024c; Chen et al., 2025; Shao et al., 2025; Faldor et al., 2025). MetaBBO aims to refine black-box optimizers by identifying optimal configurations or parameters through an automatic decision process without requiring expertise, thereby boosting overall performance across various problem instances within a given problem domain. MetaBBO-RL is one approach to MetaBBO (Sharma et al., 2019; Tan and Li, 2021; Guo et al., 2025a; Ma et al., 2025b, a), which models optimizer fine-tuning as an MDP and learns an RL agent to make decisions automatically without expertise. The meta-objective of MetaBBO-RL is to learn a policy (RL agent) $\Pi^{*}$ that maximizes the expectation of the accumulated meta-performance improvement $r_{t}$ (also called the reward) over the problem set distribution $\xi$, $\mathbb{E}_{\upsilon\sim\xi,\Pi^{*}}\left[\sum_{t=0}^{T}r_{t}\right]$, where $T$ denotes the total number of decision steps and $\upsilon$ is a problem from the problem set $\Upsilon$. Specifically, in the aspect of operator selection, MetaBBO-RL automates the tuning process, significantly reducing the time and expertise needed to customize algorithms for specific unseen problems, while also potentially enhancing overall optimization performance (Xu and Pi, 2020; Wu and Wang, 2022; Yin et al., 2021; Guo et al., 2025b). This has been confirmed in numerous studies: RL-DAS (Guo et al., 2024), based on MetaBBO-RL, selects operators for Differential Evolution algorithms, leveraging their complementary strengths to enhance optimization performance and demonstrating favorable generalization across different problem classes; RLDMDE (Yang et al., 2024) employs RL so that each subpopulation can adaptively select a mutation strategy based on the current environmental state (population diversity), thereby boosting the self-adaptation of subpopulations; similarly, RLEMMO (Lian et al., 2024), the first generalizable MetaBBO-RL framework for solving multimodal optimization problems (MMOP), selects operators for search strategies, directly addresses unseen problems, and achieves competitive optimization performance in both quality and diversity against several strong MMOP solvers.

2.2. CC and the Problem Decomposition Strategies

Inspired by the “divide and conquer” philosophy, CC is a framework that solves LSGO through a decomposition-based approach (Omidvar et al., 2021b, a). It first divides the variables into several subgroups, then optimizes these subgroups using EAs, and finally integrates them into a global optimization solution.

CCGA (Potter and De Jong, 1994) is the first strategy to use CC for problem decomposition, splitting a $D$-dimensional problem into $D$ one-dimensional problems, where $D$ is the dimensionality of the problem. However, both practical tests (Potter and De Jong, 1994) and theoretical analyses (Van den Bergh and Engelbrecht, 2004) have suggested that completely decomposing into one-dimensional problems poses a risk of introducing spurious minima. To mitigate this issue, strategies such as $k$-$s$ dimensional decomposition and bipartite decomposition have been proposed (Van den Bergh and Engelbrecht, 2004; Shi et al., 2005), but these algorithms do not take into account the structure of the problem or interactions between variables, potentially placing interacting variables in different components, which adversely affects optimization performance (Omidvar et al., 2021a). To achieve more precise decomposition, researchers have started from the definition of separability, defining various types of separability such as additive separability (Li et al., 2013), multiplicative separability (Li et al., 2022), and composite separability (Tian et al., 2024b), and have developed a range of variable interaction identification algorithms, such as DG2 (Omidvar et al., 2017), RDG (Sun et al., 2017), ERDG (Yang et al., 2020), MDG (Ma et al., 2022), GDG (Mei et al., 2016), and CSG (Tian et al., 2024b). In addition, researchers have further studied how to accurately decompose overlapping variables, with methods such as DOV (Meselhi et al., 2022), OCC (Komarnicki et al., 2024), and OEDG (Tian et al., 2024a). However, the cost of improving accuracy in this way includes a large number of expert-designed separability methods and additional FEs. Strategies based on probabilistic and statistical methods do not have these issues. They perform multiple rounds of grouping optimization before forming the final optimization result to capture problem structure and variable interactions (Yang et al., 2008). Relying on expertise, many algorithms were proposed (Tiwari et al., 2001; Roy and Tiwari, 2002; Tiwari and Roy, 2002; Soboĺ, 1993), such as the Delta method (Omidvar et al., 2010), based on the theory that the improvement intervals for inseparable variables are relatively smaller than those for separable variables (Salomon, 1996); the Fitness Difference Minimization (FDM) method, exemplified by DIMA (Sayed et al., 2012); and CC-CMAES (Liu and Tang, 2013), based on covariance matrices and an expert-designed selection mode. Besides, contribution-based decomposition methods (Yang et al., 2023), such as CCFR (Yang et al., 2016), DCC (Zhang et al., 2019), and CBCCO (Jia et al., 2020), represent another novel strategy. Although these two strategies do not incur additional FEs, they rely on expertise, so in different scenarios they may fail to meet the requirements for reasonable decomposition (Omidvar et al., 2021a; Qiu et al., 2025). Therefore, methods that do not require expert-level knowledge for decomposition might be a more suitable solution for more challenging real-world problems.

3. Preliminaries

3.1. CC-CMAES

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen, 2016) is a representative EA that operates by repeatedly sampling offspring according to a distribution and updating the distribution with the performance of the sampled offspring until a stopping criterion is met (e.g., reaching the total number of generations $TG$ ).

$$x_{(g+1)}^{(k)}\sim N\left(\omega_{(g)},\sigma_{(g)}^{2}\cdot C_{(g)}\right),\quad k=1,2,\cdots,\lambda \tag{1}$$

Equation (1) shows the sampling process in a population $P$ with offspring size $\lambda$ at generation $g$. $\omega_{(g)}\in\mathbb{R}^{D}$, $C_{(g)}\in\mathbb{R}^{D\times D}$, and $\sigma_{(g)}\in\mathbb{R}$ are the Gaussian mean, covariance matrix, and global step size, respectively, at generation $g$. CC-CMAES (Liu and Tang, 2013) uses the CC framework with CMA-ES, featuring three decomposition strategies: Min-Variance Decomposition (MiVD), Random Decomposition (RD), and Max-Variance Decomposition (MaVD), ranging from exploitative to exploratory. It dynamically selects one strategy to optimize subgroups with CMA-ES for a fixed number of generations until termination criteria are met. MiVD, RD, and MaVD decompose the space based on the rank of the diagonal of the covariance matrix. MiVD sequentially selects $D/m$ variables following the rank order to minimize the diversity among their variances. In contrast, MaVD selects one variable, then skips $D/m$ variables to select the next variable each time, which maximizes diversity. RD randomly selects $D/m$ variables within each subspace. The subspace covariance matrix $C_{sub_{i}}\in\mathbb{R}^{(D/m)\times(D/m)}$ and mean $\omega_{sub_{i}}\in\mathbb{R}^{D/m}$ are extracted from the global covariance matrix $C$ and mean $\omega$ as $C_{sub_{i}}=C[subdims_{i},subdims_{i}],\omega_{sub_{i}}=\omega[subdims_{i}]$, where $subdims_{i}\in[1,D]^{D/m}$ represents the dimension index set of subgroup $i\in\left[1,\dots,m\right]$. Updating $C$ and $\omega$ from $C_{sub_{i}}$ and $\omega_{sub_{i}}$ is the inverse process: the subgroup values are written back to the entries indexed by $subdims_{i}$.
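The three decomposition strategies and the extraction of subgroup statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CC-CMAES implementation: the function names (`decompose`, `extract_sub`, `write_back`) are hypothetical, the ranking is assumed to be ascending by variance, $D$ is assumed to be divisible by $m$, and MaVD is assumed to use a stride of $m$ over the ranking so that each subgroup receives $D/m$ variables.

```python
import random


def decompose(variances, m, strategy, rng=None):
    """Split variable indices into m subgroups using the diagonal of C.

    variances: diagonal of the global covariance matrix (length D).
    strategy:  "MiVD", "MaVD" or "RD".
    Assumption: D is divisible by m.
    """
    D = len(variances)
    if D % m != 0:
        raise ValueError("this sketch assumes D is divisible by m")
    size = D // m
    # Rank the variables by variance (ascending order is an assumption).
    ranked = sorted(range(D), key=lambda i: variances[i])
    if strategy == "MiVD":
        # Consecutive blocks of the ranking: similar variances share a group.
        return [ranked[g * size:(g + 1) * size] for g in range(m)]
    if strategy == "MaVD":
        # Strided picks (assumed stride m): each group spans the variance range.
        return [ranked[g::m] for g in range(m)]
    if strategy == "RD":
        rng = rng or random.Random()
        shuffled = ranked[:]
        rng.shuffle(shuffled)
        return [shuffled[g * size:(g + 1) * size] for g in range(m)]
    raise ValueError(f"unknown strategy: {strategy}")


def extract_sub(C, omega, subdims):
    """C_sub = C[subdims, subdims] and omega_sub = omega[subdims]."""
    C_sub = [[C[r][c] for c in subdims] for r in subdims]
    omega_sub = [omega[i] for i in subdims]
    return C_sub, omega_sub


def write_back(C, omega, subdims, C_sub, omega_sub):
    """Inverse of extract_sub: copy subgroup values into the global C and omega."""
    for a, r in enumerate(subdims):
        omega[r] = omega_sub[a]
        for b, c in enumerate(subdims):
            C[r][c] = C_sub[a][b]
```

For variances `[0.5, 0.1, 0.9, 0.3, 0.7, 0.2]` and `m = 2`, the ranking is `[1, 5, 3, 0, 4, 2]`, so MiVD groups the three smallest variances together while MaVD alternates along the ranking.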

3.2. Markov Decision Process

A Markov Decision Process (MDP) is commonly characterized as $\mathcal{M}:=\langle\mathcal{S},\mathcal{A},\mathcal{T},R\rangle$. At each time step $t$, given the current environment state $s_{t}\in\mathcal{S}$, an action $a_{t}\in\mathcal{A}$ is performed according to a policy $\pi:\mathcal{S}\rightarrow\mathcal{A}$ drawn from the policy space $\Pi$. The environment then transitions to the next state $s_{t+1}$ according to the transition dynamics $\mathcal{T}\left(s_{t+1}\mid s_{t},a_{t}\right)$. The reward function $R:\mathcal{S}\times\mathcal{A}\rightarrow\mathbb{R}$ indicates the feedback $r_{t}$ from the environment. Given a finite horizon (suppose $T$ steps) of interactions with the environment, a sampled trajectory is defined as $\tau:=\left(s_{0},a_{0},s_{1},\cdots,s_{T}\right)$. An MDP is then solved by finding an optimal policy $\pi^{*}$ that maximizes the expected accumulated rewards over all possible trajectories:

$$\pi^{*}=\underset{\pi\in\Pi}{\arg\max}\,\mathbb{E}_{\tau\sim\pi(\tau)}\left[\sum_{t=0}^{T}\gamma^{t-1}r_{t}\right] \tag{2}$$

where $\pi(\tau)$ denotes the sampling probability of $\tau$ and $\gamma$ is a pre-defined discount factor. In the context of DRL (Mnih et al., 2015), the policy $\pi$ is parameterized with a neural network $\pi_{\theta}$, which makes gradient-based learning methods (e.g., PPO (Schulman et al., 2017)) available for searching for the optimal policy.
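As a small worked example of the accumulated reward inside Equation (2), the sketch below evaluates $\sum_{t=0}^{T}\gamma^{t-1}r_{t}$ for one reward sequence. The function name `discounted_return` and its `offset` parameter are hypothetical; `offset=-1` follows the exponent as written above, while `offset=0` gives the other common convention $\gamma^{t}$.

```python
def discounted_return(rewards, gamma, offset=-1):
    """Sum of gamma**(t + offset) * r_t over t = 0..T for one trajectory."""
    return sum(gamma ** (t + offset) * r for t, r in enumerate(rewards))


rewards = [1.0, 1.0, 1.0]
as_written = discounted_return(rewards, gamma=0.5)             # 2 + 1 + 0.5
shifted = discounted_return(rewards, gamma=0.5, offset=0)      # 1 + 0.5 + 0.25
```

The two conventions differ only by the constant factor $1/\gamma$, so they rank policies identically.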

4. Methodology

4.1. LCC Overview

LCC primarily consists of three main components: a problem set $\Upsilon$ with $N$ problems, a CC decomposition strategy pool $\Lambda$, and an underlying EA optimizer (e.g., CMA-ES). $\Lambda$ includes a variety of strategies chosen from existing decomposition strategies. The detailed architecture, illustrated in Figure 2, can be conceptualized as an MDP. In an MDP, as introduced in Section 3.2, multiple elements are fundamental, such as the state $s_{t}\in\mathcal{S}$, the action $a_{t}\in\mathcal{A}$, and the reward $R:\mathcal{S}\times\mathcal{A}\rightarrow\mathbb{R}$. The DRL agent seeks the optimal policy $\pi^{*}$ that selects an appropriate decomposition strategy from $\Lambda$ to maximize the expected accumulated reward over all the problems $\upsilon_{k}\in\Upsilon$, i.e., $\pi^{*}=\underset{\pi\in\Pi}{\arg\max}\frac{1}{N}\sum_{k=1}^{N}\sum_{t=0}^{T}\gamma^{t-1}R\left(s_{t},a_{t}|\upsilon_{k}\right)$. First, a problem $\upsilon$ is selected from the problem set $\Upsilon$. For this problem, we analyze Global Optimization (GO) information, Subgroup Decomposition (SD) information, and Action History (AH) information using Exploratory Landscape Analysis (ELA) (Mersmann et al., 2011) and Fitness Landscape Analysis (FLA) (Pitzer and Affenzeller, 2012). This analysis is used to design the state so that it contains sufficient information to select an appropriate decomposition strategy. Based on the state $s_{t}$, LCC selects a decomposition strategy from the CC decomposition strategy pool $\Lambda$ to decompose the problem and subsequently optimizes each subgroup using the underlying EA optimizer. A corresponding reward is designed to reflect improvements in the MDP $\mathcal{M}$. Finally, the optimized subgroups are combined to form a new global population, completing an epoch. With CMA-ES as the underlying optimizer, we instantiate LCC and name the result the LCC-CMAES algorithm. Using LCC-CMAES as a concrete example, we describe the specific design of the MDP and the network.

4.2. MDP Formulation

4.2.1. State

The state space of LCC-CMAES encompasses the Global Optimization (GO) information $s_{\text{GO}}\in\mathbb{R}^{12}$, Subgroup Decomposition (SD) information $s_{\text{SD}}\in\mathbb{R}^{4\times m}$, and Action History (AH) information $s_{\text{AH}}\in\mathbb{R}^{2\times L}$. Details are shown in Table 1.

Table 1. State features.

Feature	Feature Index	Calculation Formula	Explanation
$s_{\text{GO}}$	1	$\text{Max}(\frac{\omega_{t}}{radius})$	$\text{Max}(\cdot)$ and $\text{Min}(\cdot)$ extract the maximum and minimum elements of a vector; $\text{Mean}(\cdot)$ extracts the mean of the vector elements. $\omega_{t}$ is the global mean at step $t$. $radius$ is the search radius of the problem, which is half of the difference between the upper and lower bounds. Reflects the state of population optimization and the status of CMA-ES.
	2	$\text{Mean}(\frac{\omega_{t}}{radius})$
	3	$\text{Min}(\frac{\omega_{t}}{radius})$
	4	$\text{Max}(\text{Corrcoef}(C_{t}))$	$\text{Corrcoef}(\cdot)$ transforms a covariance matrix into a correlation coefficient matrix. $C_{t}$ is the global covariance matrix at step $t$. Reflects the correlations between variables and the status of CMA-ES.
	5	$\text{Mean}(\text{Corrcoef}(C_{t}))$
	6	$\text{Min}(\text{Corrcoef}(C_{t}))$
	7	$\frac{\sigma_{t}}{radius}$	$\sigma_{t}$ is the global step size at step $t$. Reflects the current global exploration and exploitation conditions.
	8	$\text{Max}(\frac{gbest_{t}}{radius})$	$gbest_{t}$ is the global best point at step $t$. Reflects the current position of the optimization state within the problem domain.
	9	$\text{Mean}(\frac{gbest_{t}}{radius})$
	10	$\text{Min}(\frac{gbest_{t}}{radius})$
	11	$\frac{f^{*}_{t}}{f^{*}_{t-1}}$	$f^{*}_{t}$ is the global best fitness at step $t$. Reflects the incremental optimization effect of each step.
	12	$\frac{FEs}{MaxFEs}$	$FEs$ is the number of remaining function evaluations, and $MaxFEs$ is the maximum number of function evaluations. Keeps the agent informed about computational budget consumption.
$s_{\text{SD}}$	13-22	$\text{Mean}(\text{Corrcoef}(C_{sub_{i}}))$	$C_{sub_{i}}$ is the covariance matrix of subgroup $i$. Reflects the correlation between variables within the subgroup.
	23-32	$\text{Mean}(\frac{\Delta subpop_{i}}{\lambda\times radius})$	$\Delta subpop_{i}$ is the sum of the vector set consisting of the difference between the last generation and the first generation of each element of the subgroup. Inspired by the Delta method mentioned in Section 2.2, it aims to reveal interactions between variables.
	33-42	$\text{Mean}(\frac{\text{Var}(subpop_{i})}{radius^{2}})$	$\text{Var}(\cdot)$ is calculated for the elements at each position within the vector set, and $subpop_{i}$ is a vector set of all the generations of the subgroup. Reflects the volatility of the population across different dimensions within that subgroup and the exploration and exploitation of each dimension.
	43-52	$\frac{d_{max_{i}}}{diameter}$	$d_{max_{i}}$ is the maximum distance in subgroup $i$’s population. $diameter$ is the diameter of the search space. Describes the convergence state of the population within subgroup $i$.
$s_{\text{AH}}$	53-55	$\frac{\sum(\Delta r_{j})}{num_{j}}$	$\sum(\cdot)$ is the sum of all values $\cdot$ for action $j$, where $j$ is the index of the action, $j=1,2,3$. $\Delta r_{j}$ refers to the difference between the reward obtained at step $t$ for action $j$ and the reward obtained at step $t-1$. $num_{j}$ is the number of times action $j$ has been selected. Reflects the contribution of strategy $j$ to optimization.
	56-58	$\frac{\sum(\Delta gbest^{(j)})}{2\times radius\times num_{j}}$	$\Delta gbest^{(j)}$ refers to the Euclidean norm of the difference between $gbest_{t}$ obtained at step $t$ for action $j$ and $gbest_{t-1}$ obtained at step $t-1$. Reflects the effectiveness of strategy $j$ in optimization.

$s_{\text{GO}}$ reflects the CMA-ES state and the global optimization state, revealing the complexity and difficulty of the optimization problem as well as the relationships between dimensions. For $s_{\text{SD}}$, we have designed four types of features based on probabilistic and statistical methods within the CC framework to reflect the variable grouping status within a subgroup, which provides detailed insights into the dynamics of variable relationships and the optimization progress in the subgroups. For $s_{\text{AH}}$, given that LCC includes a CC decomposition strategy pool $\Lambda$, we derive features that provide the RL agent with additional contextual knowledge about the optimization capabilities of the candidate strategies.

Finally, the complete state in the MDP of LCC-CMAES is the integration of $s_{\text{GO}}$ , $s_{\text{SD}}$ and $s_{\text{AH}}$ .

$$\text{state}:=\left\{s_{\text{GO}}\in\mathbb{R}^{12},s_{\text{SD}}\in\mathbb{R}^{4\times m},s_{\text{AH}}\in\mathbb{R}^{2\times L}\right\} \tag{3}$$

Here, $m$ represents the number of subgroups (where $m$ is 10 in LCC-CMAES), and $L$ denotes the number of CC decomposition strategies in the pool (where $L$ is 3 in LCC-CMAES).
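The index ranges in Table 1 follow directly from these sizes. The sketch below recomputes them and also shows how the scaled Max/Mean/Min features (indices 1-3 and 8-10) could be formed; the function names `state_layout` and `go_position_features` are hypothetical helpers introduced only for illustration.

```python
def state_layout(m, L, n_go=12, sd_kinds=4, ah_kinds=2):
    """Return 1-based inclusive index ranges of s_GO, s_SD and s_AH in the state."""
    go_end = n_go
    sd_end = go_end + sd_kinds * m
    ah_end = sd_end + ah_kinds * L
    return {"GO": (1, go_end), "SD": (go_end + 1, sd_end), "AH": (sd_end + 1, ah_end)}


def go_position_features(vec, radius):
    """Max, mean and min of a vector scaled by the search radius."""
    scaled = [v / radius for v in vec]
    return max(scaled), sum(scaled) / len(scaled), min(scaled)


layout = state_layout(m=10, L=3)   # settings used by LCC-CMAES
total_dim = layout["AH"][1]        # 12 + 4 * 10 + 2 * 3
```

With $m=10$ and $L=3$ the three blocks occupy indices 1-12, 13-52, and 53-58, so the full state has 58 entries.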

4.2.2. Action

We designed a strategy pool $\Lambda$ in advance, containing various decomposition strategies for selection. LCC selects a CC decomposition strategy from $\Lambda$ based on the state to achieve dynamic decomposition. To balance exploration and exploitation, LCC-CMAES utilizes three types of decomposition strategies (Liu and Tang, 2013): MiVD, RD, and MaVD, as introduced in Section 3.1. The action is represented as an integer $a\in[1,L]$, which indicates the index of the chosen strategy within the pool of $L$ candidate strategies. Next, based on the selected strategy, the problem is divided into lower-dimensional subproblems, each of which is then optimized using CMA-ES. The optimization results of the subproblems are subsequently combined into a global optimization result.

4.2.3. Reward

To guide the agent towards achieving a lower cost, the reward function should consider the absolute reduction in cost at each time step $t$ :

$$r_{t}=\frac{f_{t-1}^{*}-f_{t}^{*}}{f_{0}^{*}-f^{*}} \tag{4}$$

where $f_{t-1}^{*}$ and $f_{t}^{*}$ are the global best fitness values at steps $t-1$ and $t$, respectively, $f^{*}$ is the optimal fitness of the problem, and $f_{0}^{*}$ is the global best fitness in the initial population; the difference $f_{0}^{*}-f^{*}$ serves as a normalization factor. The reward measures the performance improvement achieved by the optimization in step $t$.
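A short worked example makes the normalization concrete. The sketch below applies Equation (4) to a made-up sequence of best fitness values (the numbers and the function name `step_reward` are illustrative assumptions, not experimental data).

```python
def step_reward(f_prev, f_curr, f_init, f_opt):
    """Equation (4): improvement at one step, normalized by the initial gap."""
    gap = f_init - f_opt
    if gap <= 0:
        raise ValueError("the initial best fitness must exceed the optimum")
    return (f_prev - f_curr) / gap


best_fitness = [100.0, 60.0, 45.0, 45.0]   # f*_0 .. f*_3 (illustrative values)
f_opt = 0.0                                # known optimum of the problem
rewards = [
    step_reward(best_fitness[t - 1], best_fitness[t], best_fitness[0], f_opt)
    for t in range(1, len(best_fitness))
]
total = sum(rewards)
```

Because the numerators telescope, the undiscounted sum of rewards equals $(f_{0}^{*}-f_{T}^{*})/(f_{0}^{*}-f^{*})$, here $0.55$; it reaches 1 only when the optimum is found, and a step without improvement yields a reward of 0.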

4.3. Network Design

As shown in Figure 3, the network consists of three modules: Feature Processing, Actor, and Critic. $s_{\text{GO}}$ , $s_{\text{SD}}$ , and $s_{\text{AH}}$ are first fused to form a state representation vector $DV$ . Based on this representation, the Actor outputs the probability distribution of candidate strategies, while the Critic estimates the return value.

The Actor decides the probability of selecting each strategy from the CC decomposition strategy pool. As mentioned in Section 4.2.1, LCC-CMAES has $m=10$ subgroups and $L=3$ actions. We first concatenate $s_{\text{GO}}\in\mathbb{R}^{12}$, $s_{\text{SD}}\in\mathbb{R}^{4\times m}$, and $s_{\text{AH}}\in\mathbb{R}^{2\times L}$ to generate the Decision Vector $DV\in\mathbb{R}^{58}$ as $DV=s_{\text{GO}}\oplus s_{\text{SD}}\oplus s_{\text{AH}}$. Then we feed $DV$ into a three-layer Multi-Layer Perceptron (MLP) network with the structure ($58\times 64\times 64\times L$), with a ReLU (Nair and Hinton, 2010) activation after each hidden layer. Following the Softmax operation, the Actor outputs a probability distribution over the strategy pool $\Lambda$, which is then used to sample the strategy.

The Critic also takes $DV$ as input and uses the same MLP structure as Actor, where the output dimension is set to be $1$ for critic value prediction. However, their MLP parameters are not shared, and the training is conducted independently.
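The forward pass of the two networks can be illustrated with a dependency-free sketch. It is not the authors' implementation: the weight initialization, the placeholder Decision Vector, and all function names (`init_mlp`, `forward`, `softmax`) are assumptions, and a practical version would use a deep learning framework with trainable parameters.

```python
import math
import random


def init_mlp(sizes, rng):
    """Random weights for layer sizes such as [58, 64, 64, 3]; biases start at zero."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)   # assumed initialization range
        W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Apply the MLP; ReLU follows every layer except the last one."""
    for idx, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
m, L = 10, 3
dv = [rng.random() for _ in range(12 + 4 * m + 2 * L)]    # placeholder Decision Vector
actor = init_mlp([58, 64, 64, L], rng)
critic = init_mlp([58, 64, 64, 1], rng)                   # separate, unshared parameters
probs = softmax(forward(actor, dv))                       # distribution over strategies
value = forward(critic, dv)[0]                            # scalar critic value prediction
action = rng.choices(range(1, L + 1), weights=probs)[0]   # sampled strategy index in [1, L]
```

Sampling from `probs` rather than taking the arg-max keeps the policy stochastic, which is what PPO expects during training.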

4.4. Workflow

LCC-CMAES’s workflow begins with selecting a problem $\upsilon$ from the problem set $\Upsilon$ and initializing the global dimension $D$, global covariance matrix $C_{0}=I$, global population $P_{0}$, global step size $\sigma_{0}=radius$, and global mean vector $\omega_{0}$. The run on $\upsilon$ then starts and terminates when $MaxFEs$ is exhausted or the global best fitness $f^{*}_{t}$ is lower than the termination error. After initialization, an MDP starts. At step $t$, the state $s_{t}$ is calculated following Table 1. Based on $s_{t}$, the Actor policy $\pi_{\theta}$ with parameters $\theta$ takes the Decision Vector as input and outputs the probability distribution over candidate strategies $\pi(a_{t}|s_{t})$, while the critic network $v_{\phi}$ with parameters $\phi$ predicts the expected return (accumulated rewards) of $s_{t}$. Once the strategy is determined, the problem is decomposed into subgroups, marking the end of the CC problem decomposition layer and the transition into the subgroup optimization layer. Each subgroup is optimized using CMA-ES until $SubMaxFEs$ is reached, with $\sigma_{t}$ updated by the offspring in each subgroup. Once all subgroup optimizations are completed, $\sigma_{t}$ is updated to $\sigma_{t+1}$, and $C_{t}$, $\omega_{t}$, and $P_{t}$ are updated to $C_{t+1}$, $\omega_{t+1}$, and $P_{t+1}$ using the $C_{sub_{i}}$ and $\omega_{sub_{i}}$ obtained from each subgroup, as described in Section 3.1. At this point, the state $s_{t+1}$ is calculated following Table 1, and the reward $r_{t}$ is observed. The trajectories of states $s_{t}$, actions $a_{t}$, and rewards $r_{t}$ are recorded and then used by the PPO method to train the policy net $\pi_{\theta}$ and the critic net $v_{\phi}$ for $K$ iterations after the completion of optimization.

PPO is trained in an actor-critic manner. It uses an objective with clipped probability ratios, which forms a first-order estimate (i.e., lower bound) of the policy’s performance. Its objective function at the $k$-th learning iteration ($k\in[1,K]$) is defined as $L_{\pi}(\theta^{(k)}):=\mathbb{E}\left[\min\left(\eta(\theta^{(k)})\hat{A},\operatorname{clip}(\eta(\theta^{(k)}),1-\epsilon,1+\epsilon)\hat{A}\right)\right]$, where $\eta(\theta^{(k)}):=\frac{\pi_{\theta^{(k)}}(a_{t}|s_{t})}{\pi_{\theta^{(0)}}(a_{t}|s_{t})}$ is the ratio of the probabilities under the current policy and the old policy before the $K$-step learning process, which performs the importance sampling. $\hat{A}$ is the estimated advantage, calculated as the difference between the target return $G$ and the estimated return $\hat{G}$. Using $L_{\pi}(\theta)$, the gradients are back-propagated through the network to update the parameters. The critic network $v_{\phi}$ takes the Decision Vector as input and outputs a critic value prediction to estimate the return value $\hat{G}$. The loss function of the critic network $v_{\phi}$ is $L_{v}(\phi):=\operatorname{MSE}(G,\hat{G})$.
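The clipped objective and the critic loss can be evaluated on a handful of samples as follows. This is a minimal numeric sketch: the function names are hypothetical, and the clipping range $\epsilon=0.2$ is an assumed value, since the text above does not specify it.

```python
def clipped_surrogate(new_probs, old_probs, advantages, eps=0.2):
    """Mean over samples of min(eta * A, clip(eta, 1 - eps, 1 + eps) * A).

    eps = 0.2 is an assumed default, not a value taken from the paper.
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        eta = p_new / p_old                             # importance sampling ratio
        clipped = min(max(eta, 1.0 - eps), 1.0 + eps)   # clip(eta, 1 - eps, 1 + eps)
        total += min(eta * adv, clipped * adv)
    return total / len(advantages)


def critic_loss(targets, predictions):
    """Mean squared error between target returns G and predicted returns G_hat."""
    return sum((g - g_hat) ** 2 for g, g_hat in zip(targets, predictions)) / len(targets)


# The ratio 0.6 / 0.4 = 1.5 exceeds 1 + eps, so a positive advantage is clipped.
gain_capped = clipped_surrogate([0.6], [0.4], [1.0])
# With a negative advantage the unclipped (more pessimistic) term is kept.
loss_kept = clipped_surrogate([0.6], [0.4], [-1.0])
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the ratio beyond $1+\epsilon$ for positive advantages (or below $1-\epsilon$ for negative ones), which keeps each of the $K$ updates close to the old policy.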

5. Experiments

5.1. Experimental Setup

5.1.1. Comparison Algorithms for LCC-CMAES

For a comprehensive comparison, we selected CC-CMAES (Liu and Tang, 2013), CSG (Tian et al., 2024b), ERDG (Yang et al., 2020), MDG (Chen et al., 2022), and FII (Ge et al., 2015) as the comparison algorithms under the CC framework. We then selected LSGO baselines without CC: CMA-ES (Hansen, 2016), Sep-CMAES (Ros and Hansen, 2008), LM-CMA (Loshchilov, 2017), and LM-MA-ES (Loshchilov et al., 2018). In addition, the MetaBBO method MetaES (Lange et al., 2023) and the local search method L-BFGS (Byrd et al., 1995) were also chosen. Among the comparison algorithms under the CC framework, CC-CMAES, which has the same decomposition strategy pool, is introduced in Section 3.1. CSG, ERDG, MDG, and FII are algorithms that incur additional FE costs for decomposition: CSG is currently the most powerful multi-stage variable identification algorithm; ERDG is a more efficient variant of RDG3 (Sun et al., 2019) (the 2018 CEC LSGO champion); MDG is an algorithm that addresses overlapping problems; FII is a rapid identification algorithm that reduces the FEs consumed by decomposition. These algorithms under the CC framework all use CMA-ES as the underlying optimizer. Among the other types of comparison algorithms, MetaES discovers evolutionary strategies via MetaBBO and serves as the global optimization comparison algorithm within MetaBBO; L-BFGS is an optimization algorithm from outside the evolutionary algorithms community that is widely used in practical applications, especially for LSGO.

5.1.2. Benchmark and Hyperparameter Settings

The CEC 2013 LSGO benchmark (Li et al., 2013) comprises a total of $N=15$ problems, which are divided into five types of functions: Fully-separable Functions (F1-F3), Partially Additively Separable Functions with a separable subcomponent (F4-F7), Partially Additively Separable Functions with no separable subcomponents (F8-F11), Overlapping Functions (F12-F14), and a Non-separable Function (F15). We partitioned the benchmark suite into training and test problems, as shown in Table 2: an asterisk “*” marks the problems used for training, while the rest were used for testing. Every category except the Non-separable Function contains training problems. Among the training problems, F1, F4, and F8 are variants of the Elliptic Function, and F5 and F9 are variants of the Rastrigin Function. This partition allows us to test the generalizability of LCC-CMAES on unseen functions within the seen types, with the Non-separable Function (F15) serving as both an unseen function and an unseen type.

The hyperparameter settings in this paper are as follows: the total number of generations ($TG$) is 50, the offspring size ($\lambda$) is 20, the subgroup maximum function evaluations ($SubMaxFEs$) is 1E3, the number of subgroups ($m$) is 10, the learning rate ($lr$) is 6E-4, the number of epochs ($Epoch$) is 90, the number of mini PPO iterations ($K$) is 3, the problem dimension ($D$) is 1E3 (905 for F13 and F14 in CEC 2013 LSGO), and the number of action selections ($ns$) is 20. To reflect practical conditions and to accommodate the extra decomposition costs required by various decomposition strategies, we set $MaxFEs=TG\times\lambda\times m\times ns=50\times 20\times 10\times 20=$ 2E5. The settings of the comparison algorithms remain the same as those in their original papers.
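The budget arithmetic above can be checked in a few lines. This is only a sketch: the variable names and the two intermediate quantities are an assumed reading of how the factors combine, and only the final product is stated in the text.

```python
TG = 50        # total number of generations
LAMBDA = 20    # offspring size
M = 10         # number of subgroups
NS = 20        # number of action selections

# Assumed reading: one CMA-ES run on a subgroup spends TG * LAMBDA evaluations,
# which coincides with the reported SubMaxFEs of 1E3.
sub_max_fes = TG * LAMBDA

# Assumed reading: each selected action optimizes all M subgroups once.
fes_per_action = sub_max_fes * M

# Total budget, as given by MaxFEs = TG * lambda * m * ns.
max_fes = fes_per_action * NS
print(sub_max_fes, fes_per_action, max_fes)
```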

5.2. Comparison Analysis

Table 2. Comparing LCC-CMAES with comparison algorithms on CEC 2013 LSGO.

problem	Algorithms under CC Framework						Algorithms without CC
problem	LCC-CMAES	CC-CMAES	CSG	ERDG	MDG	FII	CMA-ES	Sep-CMAES	LM-CMA	LM-MA-ES	MetaES	L-BFGS
*1	2.045E+07	3.244E+08(+)	4.182E+08(+)	4.227E+08(+)	4.182E+08(+)	4.227E+08(+)	4.223E+08(+)	8.591E+06(-)	2.313E+07(+)	4.691E+07(+)	2.420E+11(+)	9.085E+09(+)
	$\pm$ 2.781E+06	$\pm$ 1.112E+08	$\pm$ 3.882E+07	$\pm$ 3.825E+07	$\pm$ 3.884E+07	$\pm$ 2.231E+07	$\pm$ 3.827E+07	$\pm$ 1.777E+06	$\pm$ 4.576E+06	$\pm$ 2.973E+06	$\pm$ 2.351E+10	$\pm$ 1.396E+08
2	4.419E+03	2.159E+03(-)	2.636E+03(-)	2.555E+03(-)	2.636E+03(-)	2.592E+03(-)	5.108E+03(+)	5.414E+03(+)	1.949E+04(+)	6.886E+03(+)	4.789E+04(+)	4.050E+04(+)
	$\pm$ 2.022E+02	$\pm$ 4.125E+02	$\pm$ 1.393E+02	$\pm$ 1.294E+02	$\pm$ 1.397E+02	$\pm$ 9.076E+01	$\pm$ 2.32E+02	$\pm$ 4.213E+01	$\pm$ 1.483E+03	$\pm$ 2.674E+02	$\pm$ 5.721E+03	$\pm$ 3.281E+03
3	2.007E+01	2.036E+01(+)	2.162E+01(+)	\(+)	2.161E+01(+)	2.161E+01(+)	2.161E+01(+)	2.11E+01(+)	2.052E+01(+)	2.172E+01(+)	2.171E+01(+)	2.165E+01(+)
	$\pm$ 4.182E-02	$\pm$ 2.289E-02	$\pm$ 4.061E-03	\	$\pm$ 7.218E-03	$\pm$ 2.374E-01	$\pm$ 4.197E-02	$\pm$ 2.394E-02	$\pm$ 3.692E-02	$\pm$ 3.486E-02	$\pm$ 7.976E-01	$\pm$ 7.686E-02
*4	6.932E+10	1.902E+11(+)	3.502E+10(-)	4.561E+10(-)	3.270E+10(-)	2.603E+12(+)	2.681E+12(+)	1.632E+11(+)	2.492E+10(-)	1.220E+11(+)	2.440E+12(+)	5.084E+12(+)
	$\pm$ 9.792E+09	$\pm$ 8.122E+10	$\pm$ 3.645E+09	$\pm$ 3.218E+09	$\pm$ 6.397E+09	$\pm$ 1.218E+12	$\pm$ 7.639E+11	$\pm$ 2.938E+10	$\pm$ 1.009E+09	$\pm$ 8.678E+10	$\pm$ 5.409E+11	$\pm$ 5.728E+11
*5	5.386E+06	9.455E+06(+)	1.079E+06(-)	2.053E+06(-)	1.146E+06(-)	1.172E+06(-)	4.205E+06( $\approx$ )	2.274E+06(-)	9.370E+06(+)	1.534E+06(-)	4.976E+07(+)	4.890E+07(+)
	$\pm$ 1.982E+06	$\pm$ 2.514E+06	$\pm$ 1.485E+05	$\pm$ 2.363E+05	$\pm$ 1.938E+05	$\pm$ 1.845E+05	$\pm$ 2.954E+05	$\pm$ 4.255E+05	$\pm$ 1.295E+06	$\pm$ 1.126E+05	$\pm$ 5.214E+06	$\pm$ 2.386E+06
6	1.048E+06	1.056E+06( $\approx$ )	1.066E+06(+)	\(+)	1.066E+06(+)	1.065E+06(+)	1.062E+06(+)	1.078E+06(+)	1.038E+05( $\approx$ )	1.063E+06(+)	1.000E+06(-)	1.071E+06(+)
	$\pm$ 2.147E+03	$\pm$ 4.632E+03	$\pm$ 7.156E+02	\	$\pm$ 5.673E+02	$\pm$ 8.675E+02	$\pm$ 1.287E+03	$\pm$ 2.745E+03	$\pm$ 1.703E+03	$\pm$ 6.845E+03	$\pm$ 8.232E+03	$\pm$ 3.219E+04
7	7.306E+08	3.268E+09(+)	2.140E+07(-)	5.059E+07(-)	7.295E+06(-)	4.442E+09(+)	6.739E+08( $\approx$ )	1.756E+09(+)	3.080E+08(-)	2.871E+07(-)	3.240E+14(+)	1.210E+15(+)
	$\pm$ 2.502E+08	$\pm$ 2.404E+09	$\pm$ 8.465E+06	$\pm$ 8.356E+06	$\pm$ 4.113E+06	$\pm$ 1.532E+09	$\pm$ 2.035E+08	$\pm$ 7.201E+08	$\pm$ 2.736E+07	$\pm$ 4.212E+06	$\pm$ 7.248E+13	$\pm$ 5.317E+10
*8	2.299E+15	1.547E+16(+)	3.488E+15( $\approx$ )	1.543E+16(+)	1.513E+15( $\approx$ )	2.885E+15( $\approx$ )	6.162E+16(+)	3.184E+15(+)	1.061E+13(-)	2.189E+15( $\approx$ )	1.940E+16(+)	1.460E+17(+)
	$\pm$ 2.047E+15	$\pm$ 7.053E+15	$\pm$ 2.872E+15	$\pm$ 6.205E+15	$\pm$ 1.194E+15	$\pm$ 1.413E+15	$\pm$ 1.492E+16	$\pm$ 7.612E+14	$\pm$ 4.268E+12	$\pm$ 1.062E+15	$\pm$ 1.382E+16	$\pm$ 2.014E+15
*9	5.818E+08	8.084E+08(+)	6.199E+08(+)	1.443E+09(+)	5.104E+08( $\approx$ )	6.007E+08( $\approx$ )	6.812E+08(+)	3.890E+08(-)	6.380E+08(+)	3.020E+08(-)	3.270E+09(+)	3.730E+09(+)
	$\pm$ 1.195E+08	$\pm$ 1.636E+08	$\pm$ 2.793E+07	$\pm$ 1.572E+08	$\pm$ 2.175E+08	$\pm$ 4.263E+07	$\pm$ 1.751E+07	$\pm$ 3.527E+07	$\pm$ 2.165E+06	$\pm$ 3.214E+07	$\pm$ 6.832E+07	$\pm$ 5.833E+08
10	9.423E+07	9.375E+07( $\approx$ )	9.464E+07( $\approx$ )	9.576E+07(+)	9.538E+07(+)	9.523E+07(+)	9.464E+07( $\approx$ )	9.447E+07( $\approx$ )	9.062E+07(-)	9.829E+07(+)	9.803E+07(+)	9.696E+07(+)
	$\pm$ 5.623E+05	$\pm$ 6.224E+05	$\pm$ 2.725E+05	$\pm$ 1.653E+05	$\pm$ 1.712E+05	$\pm$ 1.332E+05	$\pm$ 1.426E+05	$\pm$ 1.012E+05	$\pm$ 1.191E+05	$\pm$ 3.321E+05	$\pm$ 3.115E+05	$\pm$ 2.133E+05
11	8.243E+09	2.297E+11(+)	2.998E+17(+)	3.609E+17(+)	4.770E+17(+)	6.327E+17(+)	2.364E+10(+)	1.850E+10(+)	5.620E+08(-)	1.720E+09(-)	6.630E+22(+)	6.470E+16(+)
	$\pm$ 6.352E+09	$\pm$ 6.313E+10	$\pm$ 1.245E+15	$\pm$ 1.431E+16	$\pm$ 1.586E+15	$\pm$ 8.848E+14	$\pm$ 4.967E+09	$\pm$ 2.786E+08	$\pm$ 3.921E+07	$\pm$ 2.179E+08	$\pm$ 8.795E+21	$\pm$ 2.545E+15
*12	2.135E+03	1.766E+05(+)	2.497E+05(+)	4.140E+06(+)	1.321E+03(-)	1.079E+03(-)	1.103E+03(-)	1.068E+03(-)	2.157E+03( $\approx$ )	1.079E+03(-)	6.860E+12(+)	1.040E+08(+)
	$\pm$ 4.072E+02	$\pm$ 1.484E+04	$\pm$ 6.373E+04	$\pm$ 1.464E+06	$\pm$ 4.063E+02	$\pm$ 5.374E+01	$\pm$ 8.365E+01	$\pm$ 1.476E+02	$\pm$ 5.921E+01	$\pm$ 1.343E+02	$\pm$ 3.215E+12	$\pm$ 2.354E+07
*13	1.298E+10	2.730E+10(+)	2.332E+11(+)	8.073E+15(+)	1.747E+10( $\approx$ )	3.274E+12(+)	7.857E+09(-)	1.698E+10( $\approx$ )	5.530E+09(-)	8.330E+08(-)	2.100E+21(+)	9.440E+16(+)
	$\pm$ 2.643E+09	$\pm$ 8.185E+09	$\pm$ 4.099E+10	$\pm$ 1.785E+16	$\pm$ 5.493E+09	$\pm$ 2.203E+12	$\pm$ 1.964E+09	$\pm$ 1.034E+10	$\pm$ 1.353E+09	$\pm$ 2.573E+08	$\pm$ 7.595E+20	$\pm$ 5.221E+15
14	1.323E+11	3.761E+11(+)	6.072E+11(+)	3.599E+13(+)	5.212E+21(+)	2.953E+13(+)	5.600E+08(-)	2.081E+10(-)	1.09E+10(-)	1.110E+10(-)	8.990E+08(-)	5.350E+22(+)
	$\pm$ 3.895E+10	$\pm$ 1.856E+11	$\pm$ 1.253E+11	$\pm$ 1.415E+13	$\pm$ 7.284E+21	$\pm$ 2.512E+13	$\pm$ 1.545E+10	$\pm$ 1.461E+10	$\pm$ 5.999E+09	$\pm$ 1.496E+08	$\pm$ 7.818E+21	$\pm$ 2.527E+17
15	3.090E+07	3.306E+07( $\approx$ )	1.508E+08(+)	\(+)	9.125E+07(+)	9.612E+07(+)	9.565E+07(+)	4.67E+08(+)	4.752E+07(+)	3.760E+07(+)	8.230E+15(+)	2.350E+15(+)
	$\pm$ 5.235E+06	$\pm$ 4.994E+06	$\pm$ 1.593E+07	\	$\pm$ 7.689E+06	$\pm$ 1.317E+07	$\pm$ 1.628E+07	$\pm$ 7.041E+06	$\pm$ 2.128E+06	$\pm$ 2.507E+06	$\pm$ 6.947E+14	$\pm$ 5.319E+14
	NA	11/3/1	8/4/3	11/0/4	6/4/5	10/2/3	9/3/3	8/2/5	6/2/7	7/1/7	13/0/2	15/0/0

5.2.1. Comparison With Other Algorithms

Table 2 presents the mean optimization results from 25 independent runs for each algorithm. The symbols “+”, “-”, and “ $\approx$ ” denote the outcomes of the Wilcoxon rank-sum test at the 0.05 significance level, indicating whether LCC-CMAES performed significantly better than (+), significantly worse than (-), or not significantly different from ( $\approx$ ) the competing method. The last row summarizes the test results for each algorithm in the form +/ $\approx$ /-, i.e., the number of problems on which LCC-CMAES significantly outperformed the competitor, showed no significant difference, or performed significantly worse.
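The marking scheme can be sketched as follows. This is a simplified, dependency-free illustration: the rank-sum p-value uses the normal approximation without a tie correction, the direction of a significant difference is decided by comparing mean costs (an assumption, since the text does not state how direction is determined), and all function names and data are hypothetical.

```python
import math

def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0           # tied values share their average rank
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])                                 # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    std_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / std_w
    return math.erfc(abs(z) / math.sqrt(2.0))

def mark(lcc_costs, rival_costs, alpha=0.05):
    """'+' if LCC-CMAES is significantly better (lower cost), '-' if worse, '≈' otherwise."""
    if rank_sum_p_value(lcc_costs, rival_costs) >= alpha:
        return "≈"
    lcc_mean = sum(lcc_costs) / len(lcc_costs)
    rival_mean = sum(rival_costs) / len(rival_costs)
    return "+" if lcc_mean < rival_mean else "-"

def tally(marks):
    """Summarize per-problem marks as '+/≈/-' counts, as in the last table row."""
    return "/".join(str(marks.count(s)) for s in ("+", "≈", "-"))

# Hypothetical final costs of 10 runs on three problems.
low = [float(v) for v in range(1, 11)]
high = [float(v) for v in range(11, 21)]
marks = [mark(low, high), mark(high, low), mark(low, low)]
summary = tally(marks)
```

In practice a library routine such as `scipy.stats.ranksums` would normally replace the hand-written test; the pure-Python version is used here only to keep the sketch self-contained.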

Based on the results from Table 2, we can analyze the following outcomes:

• Superior Performance Within CC Frameworks: LCC-CMAES demonstrates significant advantages over existing advanced algorithms within the CC framework. It shows pronounced superiority on the more challenging grouping problems (such as the Overlapping Functions F12-F14), highlighting its capability to handle more complex real-world problems.
• Generalization Capability: LCC-CMAES exhibits a degree of generalizability, thanks to the well-designed state that provides it with ample information. It generalizes to unseen problems of a seen type: for example, it achieves commendable results on F11 after being trained on F8-F9. Moreover, LCC-CMAES also generalizes to a completely unseen problem type (F15). This reveals LCC’s ability to solve more complex real-world problems.
• Improvement Over CC-CMAES: LCC-CMAES also shows significant improvements over CC-CMAES, which employs the same decomposition strategy pool. This improvement is attributed to our design of the reward and state, which encourages more rational decomposition strategies and leads to superior outcomes.
• Comparison With Baselines Without CC: LCC-CMAES surpasses CMA-ES, LM-MA-ES, MetaES, and L-BFGS, and shows competitive performance with LM-CMA, which validates the effectiveness of LCC-CMAES.

5.2.2. Comparison on Extended Optimization Horizon

To investigate the performance of the algorithms under extended optimization horizons, we present the performance curves of the baselines on the 15 problems of CEC 2013 LSGO with $MaxFEs$ = 3E6 in Figure 4. On most problems, LCC-CMAES performs better than CC-CMAES, demonstrating that the benefit of learned action selection holds broadly. In the early and middle stages of decomposition, LCC-CMAES outperforms the other algorithms under the CC framework. However, after decomposition is complete, this advantage gradually diminishes and LCC-CMAES may even be surpassed. On most problems, the algorithms without CC, except for L-BFGS and MetaES, which show poorer results, exhibit more prominent optimization performance. Algorithms under the CC framework perform better on partially separable problems (F4-F11), but struggle to show significant optimization effects (their curves decline only stepwise) on overlapping (F13-F14) and fully non-separable (F15) problems. These problems are difficult to decompose, so all dimensions are often treated as a whole during optimization. This also results in substantial resource consumption, as CMA-ES struggles with LSGO (e.g., the time cost of ERDG on F13 under 3E6 FEs reaches 60,000 seconds, whereas LCC-CMAES requires only 1,200 seconds). LCC-CMAES is not subject to such limitations, as it continues to perform decomposition even on overlapping and fully non-separable problems, and hence significantly surpasses the other CC methods on these problems.

Table 3. Comparison of resource consumption.

Algorithm	LCC-CMAES	CC-CMAES	CSG	ERDG	MDG	FII	CMA-ES
FEs	0	0	4.86E+04	1.28E+05	4.10E+03	4.52E+03	0
Time Cost (s)	85.12	83.88	275.05	173.75	332.32	264.81	450.4

5.2.3. Comparison of Resource Consumption

In the analysis above, LCC-CMAES demonstrated its capability to handle more complex problems with a small budget. Besides the additional FEs, the time cost for decomposition and optimization is another bottleneck that constrains the algorithm from being applied to more complex and higher-dimensional problems. Therefore, we conducted a more thorough investigation into the additional FEs and the time cost for decomposition and optimization of each algorithm, aiming for a more detailed presentation of resource consumption.

Table 3 shows the additional FEs for decomposition and the optimization time cost (in seconds), averaged over problems and runs. To prevent the additional FEs from being dominated by a few problems that are difficult to decompose, we report the median of the additional FEs. A notable advantage of LCC-CMAES is that it does not require additional FEs for decomposition, and due to its simple actor and critic network design, its time cost is relatively low. This provides a feasible approach for solving more complex and higher-dimensional real-world problems. Other algorithms under the CC framework often suffer from excessive costs due to the difficulty in identifying separable types; even when problems are identifiable, the presence of overlapping variables can leave many subgroups too large, causing substantial time costs for the underlying optimizer. We also note that algorithms without CC usually require more optimization time than CC methods. For instance, CC methods based on CMA-ES are faster than CMA-ES without CC. This is attributable to the lower dimensionality of the decomposed subspaces, which validates the necessity of problem decomposition and CC.

5.3. The Transferability Study

To test the transferability of LCC-CMAES more rigorously, we evaluated it, with the same settings, on an entirely new set of problems that it had never encountered. The majority of separable functions in CEC 2013 LSGO are additively separable, with the Ackley function being the only non-additively separable function among the basic functions. To address this limitation, the BNS benchmark (Chen et al., 2022) introduces four non-additively separable base functions, including two multiplicatively separable base functions and two composite separable base functions. Based on these base functions, BNS designs 12 test problems with varying degrees of separability. Compared to CEC 2013 LSGO, the problems in BNS are closer to the complex problems encountered in real-world scenarios. All algorithm settings are consistent with those used in the comparison on CEC 2013 LSGO.

Table 4. Comparing LCC-CMAES with comparison algorithms on BNS.

problem	LCC-CMAES	CC-CMAES	CSG	ERDG	MDG	FII	CMA-ES	Sep-CMAES	LM-MA-ES	LM-CMA
1	4.752E-08	1.901E-07(+)	1.741E-06(+)	3.305E+06(+)	4.463E-03(+)	7.022E-04(+)	4.924E-11(-)	0.000E+00(-)	0.000E+00(-)	0.000E+00(-)
	$\pm$ 3.125E-08	$\pm$ 1.523E-07	$\pm$ 9.037E-07	$\pm$ 1.905E+06	$\pm$ 2.578E-03	$\pm$ 8.325E-04	$\pm$ 6.043E-11	$\pm$ 0.000E+00	$\pm$ 0.000E+00	$\pm$ 0.000E+00
2	7.044E+00	6.947E+00( $\approx$ )	1.035E+01(+)	1.198E+11(+)	1.185E+01(+)	1.432E+01(+)	1.038E+01(+)	2.375E+02(+)	8.306E+03(+)	6.403E+01(+)
	$\pm$ 4.794E-01	$\pm$ 2.413E+00	$\pm$ 1.532E+00	$\pm$ 5.394E+09	$\pm$ 3.865E+00	$\pm$ 4.413E+00	$\pm$ 4.152E-01	$\pm$ 6.350E+01	$\pm$ 4.066E+03	$\pm$ 1.583E+01
3	7.292E+05	5.375E+05( $\approx$ )	7.173E+05( $\approx$ )	8.092E+06 (+)	2.569E+06(+)	7.136E+05( $\approx$ )	8.565E+05(+)	8.361E+02(-)	3.472E+06(+)	2.881E+06(+)
	$\pm$ 3.484E+05	$\pm$ 2.567E+05	$\pm$ 3.925E+05	$\pm$ 4.024E+04	$\pm$ 2.493E+06	$\pm$ 3.935E+05	$\pm$ 1.889E+05	$\pm$ 4.656E+01	$\pm$ 2.455E+05	$\pm$ 1.323E+05
4	1.423E+09	4.124E+09(+)	2.545E+10(+)	6.812E+11(+)	1.389E+11( $\approx$ )	8.347E+10(+)	1.852E+10(+)	3.865E+09(+)	7.713E+09(+)	9.296E+09(+)
	$\pm$ 3.39E+08	$\pm$ 1.484E+08	$\pm$ 8.115E+08	$\pm$ 5.046E+09	$\pm$ 7.765E+10	$\pm$ 4.438E+09	$\pm$ 6.297E+08	$\pm$ 2.779E+08	$\pm$ 2.510E+08	$\pm$ 5.068E+08
5	0.000E+00	4.402E-08(+)	4.414E-02(+)	3.682E+06(+)	5.003E-02(+)	2.624E-02(+)	4.225E-11(+)	0.000E+00( $\approx$ )	0.000E+00( $\approx$ )	0.000E+00( $\approx$ )
	$\pm$ 0.000E+00	$\pm$ 4.876E-08	$\pm$ 5.875E-03	$\pm$ 2.384E+06	$\pm$ 1.665E-03	$\pm$ 5.914E-03	$\pm$ 6.183E-11	$\pm$ 0.000E+00	$\pm$ 0.000E+00	$\pm$ 0.000E+00
6	2.105E+01	4.084E+01(+)	2.278E+03(+)	1.189E+11 (+)	5.694E+04(+)	2.366E+04(+)	8.482E+00(-)	2.763E+02(+)	4.035E+01(+)	3.913E+02(+)
	$\pm$ 3.663E+00	$\pm$ 9.414E+00	$\pm$ 1.385E+03	$\pm$ 4.286E+10	$\pm$ 9.847E+03	$\pm$ 7.106E+03	$\pm$ 1.185E+00	$\pm$ 2.884E+01	$\pm$ 5.894E+00	$\pm$ 3.653E+01
7	3.841E+06	3.846E+06( $\approx$ )	5.554E+06(+)	8.095E+06(+)	6.334E+06(+)	6.376E+06(+)	9.947E+05(-)	2.673E+06(-)	2.829E+06(-)	3.845E+06( $\approx$ )
	$\pm$ 3.052E+05	$\pm$ 9.843E+04	$\pm$ 5.645E+04	$\pm$ 5.256E+03	$\pm$ 4.891E+04	$\pm$ 4.795E+04	$\pm$ 1.596E+05	$\pm$ 2.435E+05	$\pm$ 1.932E+05	$\pm$ 6.532E+05
8	1.982E+10	2.423E+10(+)	1.844E+11(+)	6.705E+11(+)	1.085E+11(+)	1.137E+11(+)	1.932E+10( $\approx$ )	2.674E+10(+)	8.473E+09(-)	4.77E+09(-)
	$\pm$ 1.124E+09	$\pm$ 1.523E+09	$\pm$ 1.804E+10	$\pm$ 5.125E+10	$\pm$ 5.472E+09	$\pm$ 5.171E+09	$\pm$ 4.945E+08	$\pm$ 1.142E+09	$\pm$ 5.432E+08	$\pm$ 5.223E+08
9	4.883E-08	2.094E-07(+)	7.395E-04(+)	1.443E+07(+)	5.534E-03(+)	3.975E-02(+)	4.452E-08( $\approx$ )	0.000E+00(-)	0.000E+00(-)	0.000E+00(-)
	$\pm$ 1.705E-08	$\pm$ 1.467E-07	$\pm$ 2.178E-04	$\pm$ 8.453E+04	$\pm$ 1.115E-03	$\pm$ 4.973E-03	$\pm$ 1.775E-08	$\pm$ 0.000E+00	$\pm$ 0.000E+00	$\pm$ 0.000E+00
10	3.074E+02	4.213E+04(+)	3.725E+04(+)	7.054E+10(+)	1.635E+06(+)	1.713E+05(+)	1.062E+01(-)	2.943E+02( $\approx$ )	5.309E+01(-)	2.154E+03(+)
	$\pm$ 2.901E+02	$\pm$ 3.372E+04	$\pm$ 9.991E+03	$\pm$ 3.705E+09	$\pm$ 6.398E+05	$\pm$ 7.537E+04	$\pm$ 8.083E-01	$\pm$ 9.455E+01	$\pm$ 1.618E+01	$\pm$ 1.401E+02
11	3.762E+06	4.062E+06(+)	3.506E+06( $\approx$ )	8.057E+06(+)	7.228E+06(+)	7.196E+06(+)	4.574E+06(+)	3.475E+06( $\approx$ )	6.021E+06(+)	3.777E+06( $\approx$ )
	$\pm$ 2.562E+05	$\pm$ 1.235E+05	$\pm$ 7.774E+04	$\pm$ 3.916E+04	$\pm$ 4.735E+04	$\pm$ 4.051E+04	$\pm$ 2.962E+05	$\pm$ 5.456E+05	$\pm$ 6.778E+04	$\pm$ 6.542E+05
12	2.821E+10	4.289E+10(+)	5.365E+10(+)	6.782E+11(+)	1.952E+10(-)	2.075E+10(-)	1.911E+10(-)	3.913E+10(+)	2.782E+10(-)	4.612E+09(-)
	$\pm$ 1.632E+09	$\pm$ 2.254E+09	$\pm$ 1.965E+09	$\pm$ 3.076E+10	$\pm$ 8.082E+08	$\pm$ 1.494E+09	$\pm$ 7.118E+08	$\pm$ 2.371E+09	$\pm$ 2.407E+09	$\pm$ 4.578E+08
	NA	9/3/0	10/2/0	12/0/0	10/1/1	10/1/1	5/2/5	5/3/4	5/2/5	5/3/4

Table 4 presents the comparative results, indicating that LCC-CMAES has certain advantages. On one hand, some algorithms consume excessively high additional FEs for decomposition on BNS problems, which prevents them from focusing resources on the optimization process and leads to poor performance (e.g., ERDG). On the other hand, some algorithms fail to correctly identify and group variable interactions on more complex problems, resulting in poor performance (e.g., MDG, with an almost zero correct grouping rate). These results also reveal the transferability of LCC-CMAES and its potential in tackling more complex real-world problems: trained on simpler benchmarks, it can transfer decomposition knowledge to more complex scenarios, discovering grouping structures through learning rather than relying on expert-level knowledge. However, CMA-ES and its LSGO variants do not encounter issues with inaccurate decomposition or additional FEs, and they can fully leverage the relationships between variables. This underscores the challenge that CC faces, relative to global optimizers, on problems with around a thousand dimensions or fewer.

5.4. Comparison on Neuroevolution tasks

In this section, we adopt four Neuroevolution (Such et al., 2017) tasks as a showcase of real-world applications, in which optimization algorithms are used to evolve a population of neural networks according to their performance on a specific machine learning task such as robotic control (Galván and Mooney, 2021). Concretely, we consider the real-parameter optimization of 2-layer MLPs for four Mujoco (Todorov et al., 2012) robot control tasks: InvertedDoublePendulum-v4, HalfCheetah-v4, Pusher-v4 and Ant-v4. We set the hidden dimension of the MLPs to 64 with the Tanh activation function, while the input and output dimensions match the control protocols of the Mujoco tasks, leading to 833, 1542, 1991 and 2312 dimensions for the MLPs of the four tasks, respectively. Because evaluating the networks is time-consuming, we set the maximum number of function evaluations (FEs) for all tasks to 1,000. We apply the LCC-CMAES agent pre-trained in Section 5.2 to the Neuroevolution problems in a zero-shot manner. The CC-based baselines other than CC-CMAES fail to decompose the problem dimensions within 1,000 FEs, so we do not include them in the comparison. The hyperparameter settings of LCC-CMAES and the included baselines are consistent with Section 5.1.2, except that the total number of generations ($TG$) and the offspring size ($\lambda$) of LCC-CMAES are both set to 5. Since the goal in the Mujoco tasks is to maximize the accumulated rewards gained by the networks, Table 5 reports the negative accumulated rewards obtained by the networks optimized by LCC-CMAES and the baselines, so that the problem remains a minimization.
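The reported problem dimensions can be reproduced by counting the weights and biases of an MLP with one 64-unit hidden layer. In the sketch below, the observation and action sizes are assumptions (they are not given in the text) and correspond to what we understand to be the default Gymnasium MuJoCo v4 environments; the helper name is hypothetical.

```python
def mlp_param_count(obs_dim, act_dim, hidden=64):
    """Weights plus biases of an MLP obs_dim -> hidden -> act_dim (two linear layers)."""
    first_layer = obs_dim * hidden + hidden
    second_layer = hidden * act_dim + act_dim
    return first_layer + second_layer

# (observation_dim, action_dim) per task: assumed values, not stated in the paper.
TASK_SHAPES = {
    "InvertedDoublePendulum-v4": (11, 1),
    "HalfCheetah-v4": (17, 6),
    "Pusher-v4": (23, 7),
    "Ant-v4": (27, 8),
}

dims = {name: mlp_param_count(obs, act) for name, (obs, act) in TASK_SHAPES.items()}
for name, d in dims.items():
    print(f"{name}: {d}D")
```

Under these assumed shapes, the counts coincide with the 833, 1542, 1991 and 2312 dimensions quoted above.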

The results show that even on realistic optimization problems with larger problem dimensions and more complex variable relationships, LCC-CMAES retains its advantages over CC-CMAES and the global optimization baselines, validating its effectiveness.

Table 5. Comparison results on Neuroevolution tasks.

problem	LCC-CMAES	CC-CMAES	CMA-ES	Sep-CMAES	LM-CMA	LM-MA-ES	MetaES	L-BFGS
InvertedDoublePendulum-v4 (833D)	-5.111E+03	-4.854E+03(+)	-4.714E+03(+)	-4.988E+03(+)	-4.971E+03(+)	-4.951E+03(+)	-4.596E+03(+)	-4.322E+03(+)
	$\pm$ 2.914E+02	$\pm$ 2.354E+02	$\pm$ 2.354E+02	$\pm$ 2.644E+02	$\pm$ 2.541E+02	$\pm$ 2.455E+02	$\pm$ 2.944E+02	$\pm$ 2.831E+02
HalfCheetah-v4 (1542D)	-2.451E+02	-2.017E+02(+)	-1.914E+02(+)	-1.849E+02(+)	-1.897E+02(+)	-1.744E+02(+)	-5.554E+01(+)	-4.997E+01(+)
	$\pm$ 2.514E+02	$\pm$ 2.119E+02	$\pm$ 1.645E+02	$\pm$ 2.002E+02	$\pm$ 2.129E+02	$\pm$ 1.988E+02	$\pm$ 9.624E+01	$\pm$ 9.487E+01
Pusher-v4 (1991D)	3.354E+02	3.543E+02(+)	3.497E+02(+)	3.481E+02(+)	3.344E+02(-)	3.411E+02(+)	3.894E+02(+)	3.909E+02(+)
	$\pm$ 2.984E+01	$\pm$ 3.791E+01	$\pm$ 4.016E+01	$\pm$ 3.594E+01	$\pm$ 2.326E+01	$\pm$ 2.746E+01	$\pm$ 4.687E+01	$\pm$ 6.314E+01
Ant-v4 (2312D)	-1.083E+03	-1.080E+03( $\approx$ )	-1.073E+03(+)	-1.066E+03(+)	-1.085E+03( $\approx$ )	-1.079E+03( $\approx$ )	-1.015E+03(+)	-1.004E+03(+)
	$\pm$ 6.476E+01	$\pm$ 6.524E+01	$\pm$ 5.146E+01	$\pm$ 5.687E+01	$\pm$ 6.971E+01	$\pm$ 5.377E+01	$\pm$ 3.345E+01	$\pm$ 2.154E+01
	NA	3/1/0	4/0/0	4/0/0	2/1/1	3/1/0	4/0/0	4/0/0

5.5. Ablation Study

5.5.1. State features Study

To verify the necessity of the framework components, we conducted ablation studies on the state features. Specifically, we separately removed the embeddings and concatenations of $s_{\text{AH}}$ (denoted as W/O AH), $s_{\text{GO}}$ (W/O GO), and $s_{\text{SD}}$ (W/O SD). We then tested these variants on CEC 2013 LSGO with all other settings unchanged.

The results are shown in Figure 5 (A). For each problem, we take the average performance of each algorithm over all runs and apply min-max normalization across the algorithms, which maps their performance into $[0,1]$ and eliminates the differences in cost scale between problems. We then report, for each algorithm, $1-$ the mean normalized performance over all problems together with its error bar, where higher is better. The performance deteriorates significantly when any of these features is removed, highlighting their crucial role in providing sufficient information to the RL agent.
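The normalization described above can be sketched as follows. The function name, the dictionary layout, and the cost values are hypothetical, and treating a problem on which all algorithms tie as contributing a normalized cost of 0 is an assumption.

```python
def normalized_scores(mean_costs):
    """mean_costs[algorithm][problem] -> 1 - mean min-max-normalized cost per algorithm."""
    algorithms = list(mean_costs)
    problems = list(next(iter(mean_costs.values())))
    totals = {a: 0.0 for a in algorithms}
    for p in problems:
        column = [mean_costs[a][p] for a in algorithms]
        lo, hi = min(column), max(column)
        span = hi - lo
        for a in algorithms:
            # Assumption: if every algorithm ties on a problem, its normalized cost is 0.
            totals[a] += (mean_costs[a][p] - lo) / span if span > 0 else 0.0
    return {a: 1.0 - totals[a] / len(problems) for a in algorithms}

# Hypothetical mean costs on two problems with very different scales.
costs = {
    "LCC-CMAES": {"F1": 10.0, "F2": 1.0e6},
    "W/O AH": {"F1": 20.0, "F2": 3.0e6},
    "W/O GO": {"F1": 30.0, "F2": 2.0e6},
}
scores = normalized_scores(costs)
```

Because each problem is rescaled independently, a problem with costs around 1E6 contributes no more to the final score than one with costs around 10.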

5.5.2. Reward Study

The reward should be ratio-based and lie within the range of -1 to 1, because evaluation values can vary significantly across different problems and unbounded rewards would have an excessive impact on network training. We consider two alternative reward designs that meet these requirements:

1) Global best fitness descent ratio: the decline of the global best fitness relative to the initial generation, i.e., the current global best fitness is subtracted from the global best fitness of the initial generation and the difference is normalized by the latter. reward1 := $r_{t}=\frac{f_{0}^{*}-f_{t}^{*}}{f_{0}^{*}}$

2) Relative global best fitness descent ratio: the decline of the global best fitness since the previous generation, normalized by the global best fitness of the previous generation. reward2 := $r_{t}=\frac{f_{t-1}^{*}-f_{t}^{*}}{f_{t-1}^{*}}$

Figure 5 (B) presents the results under the different reward schemes, using the same normalization as in Section 5.5.1. Both reward1 and reward2 are significantly less effective than the reward we adopted. For reward1, the fitness values in later generations are often far lower than that of the initial generation, so the numerator is approximately equal to the initial value and the reward approaches 1. For reward2, normalizing by the global best fitness of the previous generation means that the standard for reward normalization changes with each generation. This inconsistency makes it challenging for the RL agent to select appropriate actions.
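The behavior of the two alternative rewards can be seen on a toy trajectory. The sketch below assumes positive fitness values and uses a hypothetical best-so-far sequence that drops by a factor of 10 per step; the function names are illustrative.

```python
def reward1(f0_best, ft_best):
    """Descent relative to the initial global best: (f0* - ft*) / f0*."""
    return (f0_best - ft_best) / f0_best

def reward2(prev_best, ft_best):
    """Descent relative to the previous global best: (f_{t-1}* - ft*) / f_{t-1}*."""
    return (prev_best - ft_best) / prev_best

# Hypothetical global best fitness per generation (positive values assumed).
trajectory = [1e10, 1e9, 1e8, 1e7, 1e6]

r1 = [reward1(trajectory[0], f) for f in trajectory[1:]]
r2 = [reward2(prev, cur) for prev, cur in zip(trajectory, trajectory[1:])]
print(r1)
print(r2)
```

On this trajectory reward1 climbs toward 1 and barely changes after the first steps, while reward2 stays at the same value even though the absolute improvement shrinks by a factor of 10 each step, because its denominator is rescaled every generation.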

6. Conclusion and Future work

We have proposed LCC, a pioneering learning-based cooperative coevolution framework that dynamically schedules decomposition strategies during the optimization process. With CMA-ES as the underlying optimizer, we instantiate LCC as the LCC-CMAES algorithm. Unlike previous algorithms under the CC framework, LCC-CMAES does not rely on expert-designed knowledge for decomposition; instead, it feeds statistical features to a DRL agent that selects the most promising decomposition strategy. More importantly, LCC-CMAES does not require additional FEs for decomposition, allowing it to focus its resources on optimization. When tested against several other advanced algorithms on two benchmarks, CEC 2013 LSGO and BNS, the comparative results demonstrated that LCC-CMAES holds a distinct advantage, especially on complex real-world problems that it had not previously encountered. This underscores LCC’s robustness, adaptability and transferability, making it a promising approach for tackling complex optimization challenges in various settings.

In future work, we hope to: (1) investigate the inclusion of more complex or higher-dimensional features that may capture deeper insights into the problem’s structure; (2) design more rational and effective decomposition actions. These goals aim to improve LCC’s effectiveness and applicability, so that it can serve as a versatile tool for LSGO, capable of addressing a broader range of complex challenges.

Acknowledgements.

This work was supported in part by the National Natural Science Foundation of China No. 62276100, in part by the Guangdong Provincial Natural Science Foundation for Outstanding Youth Team Project No. 2024B1515040010, in part by the Guangdong Natural Science Funds for Distinguished Young Scholars No. 2022B1515020049, and in part by the TCL Young Scholars Program.

References

Akimoto and Hansen (2016) Youhei Akimoto and Nikolaus Hansen. 2016. Projection-based restricted covariance matrix adaptation for high dimension. In Proceedings of the Genetic and Evolutionary Computation Conference 2016. 197–204.
Bhattacharya et al. (2016) Maumita Bhattacharya, Rafiqul Islam, and Jemal Abawajy. 2016. Evolutionary optimization: a big data perspective. Journal of Network and Computer Applications 59 (2016), 416–426.
Byrd et al. (1995) Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on scientific computing 16, 5 (1995), 1190–1208.
Chen et al. (2024b) Jiacheng Chen, Zeyuan Ma, Hongshu Guo, Yining Ma, Jie Zhang, and Yue-jiao Gong. 2024b. Symbol: Generating Flexible Black-Box Optimizers through Symbolic Equation Learning. arXiv preprint arXiv:2402.02355 (2024).
Chen et al. (2022) Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, and Gary G Yen. 2022. A decomposition method for both additively and non-additively separable problems. IEEE Transactions on Evolutionary Computation (2022).
Chen et al. (2025) Minyang Chen, Chenchen Feng, and Ran Cheng. 2025. MetaDE: Evolving Differential Evolution by Differential Evolution. IEEE Transactions on Evolutionary Computation (2025).
Chen et al. (2019) Wei-Neng Chen, Ya-Hui Jia, Feng Zhao, Xiao-Nan Luo, Xing-Dong Jia, and Jun Zhang. 2019. A cooperative co-evolutionary approach to large-scale multisource water distribution network optimization. IEEE Transactions on Evolutionary Computation 23, 5 (2019), 842–857.
Chen et al. (2024a) Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. 2024a. Symbolic discovery of optimization algorithms. Advances in Neural Information Processing Systems 36 (2024).
Dranka et al. (2021) Géremi Gilson Dranka, Paula Ferreira, and A Ismael F Vaz. 2021. A review of co-optimization approaches for operational and planning problems in the energy sector. Applied Energy 304 (2021), 117703.
Elsken et al. (2019) Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural architecture search: A survey. Journal of Machine Learning Research 20, 55 (2019), 1–21.
Faldor et al. (2025) Maxence Faldor, Robert Tjarko Lange, and Antoine Cully. 2025. Discovering Quality-Diversity Algorithms via Meta-Black-Box Optimization. arXiv preprint arXiv:2502.02190 (2025).
Galván and Mooney (2021) Edgar Galván and Peter Mooney. 2021. Neuroevolution in deep neural networks: Current trends and future challenges. IEEE Transactions on Artificial Intelligence 2, 6 (2021), 476–493.
Ge et al. (2015) Hongwei Ge, Liang Sun, Xin Yang, Shinichi Yoshida, and Yanchun Liang. 2015. Cooperative differential evolution with fast variable interdependence learning and cross-cluster mutation. Applied Soft Computing 36 (2015), 300–314.
Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A Survey of Methods for Explaining Black Box Models. ACM Comput. Surv. 51, 5, Article 93 (aug 2018), 42 pages. https://doi.org/10.1145/3236009
Guo et al. (2025b) Hongshu Guo, Sijie Ma, Zechuan Huang, Yuzhi Hu, Zeyuan Ma, Xinglin Zhang, and Yue-Jiao Gong. 2025b. Reinforcement Learning-based Self-adaptive Differential Evolution through Automated Landscape Feature Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
Guo et al. (2024) Hongshu Guo, Yining Ma, Zeyuan Ma, Jiacheng Chen, Xinglin Zhang, Zhiguang Cao, Jun Zhang, and Yue-Jiao Gong. 2024. Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024).
Guo et al. (2025a) Hongshu Guo, Zeyuan Ma, Jiacheng Chen, Yining Ma, Zhiguang Cao, Xinglin Zhang, and Yue-Jiao Gong. 2025a. ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 26982–26990.
Hammer (1962) PC Hammer. 1962. Adaptive control processes: a guided tour (R. Bellman).
Hansen (2016) Nikolaus Hansen. 2016. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772 (2016).
He et al. (2020) Xiaoyu He, Zibin Zheng, and Yuren Zhou. 2020. MMES: Mixture model-based evolution strategy for large-scale optimization. IEEE Transactions on Evolutionary Computation 25, 2 (2020), 320–333.
Jia et al. (2020) Ya-Hui Jia, Yi Mei, and Mengjie Zhang. 2020. Contribution-based cooperative co-evolution for nonseparable large-scale problems with overlapping subcomponents. IEEE Transactions on Cybernetics 52, 6 (2020), 4246–4259.
Komarnicki et al. (2024) Marcin Michal Komarnicki, Michal Witold Przewozniczek, Renato Tinós, and Xiaodong Li. 2024. Overlapping Cooperative Co-Evolution for Overlapping Large-Scale Global Optimization Problems. In Proceedings of the Genetic and Evolutionary Computation Conference. 665–673.
Lange et al. (2023) Robert Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, and Sebastian Flennerhag. 2023. Discovering evolution strategies via meta-black-box optimization. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation. 29–30.
Li et al. (2022) Jian-Yu Li, Zhi-Hui Zhan, Kay Chen Tan, and Jun Zhang. 2022. Dual differential grouping: A more general decomposition method for large-scale optimization. IEEE Transactions on Cybernetics (2022).
Li et al. (2024a) Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zhen, and Ke Tang. 2024a. Bridging evolutionary algorithms and reinforcement learning: A comprehensive survey on hybrid algorithms. IEEE Transactions on Evolutionary Computation (2024).
Li et al. (2013) Xiaodong Li, Ke Tang, Mohammad N Omidvar, Zhenyu Yang, and Kai Qin. 2013. Benchmark functions for the CEC 2013 special session and competition on large-scale global optimization. Technical Report. Evolutionary Computation and Machine Learning Group, RMIT University, Australia.
Li et al. (2025) Xiaobin Li, Kai Wu, Xiaoyu Zhang, and Handing Wang. 2025. B2Opt: Learning to optimize black-box optimization with little budget. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 18502–18510.
Li et al. (2024b) Xiaobin Li, Kai Wu, Xiaoyu Zhang, Handing Wang, Jing Liu, et al. 2024b. Pretrained optimization model for zero-shot black box optimization. Advances in Neural Information Processing Systems 37 (2024), 14283–14324.
Li and Zhang (2017) Zhenhua Li and Qingfu Zhang. 2017. A simple yet efficient evolution strategy for large-scale black-box optimization. IEEE Transactions on Evolutionary Computation 22, 5 (2017), 637–646.
Lian et al. (2024) Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, and Yue-Jiao Gong. 2024. RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
Liao et al. (2023) Zuowen Liao, Wenyin Gong, and Shuijia Li. 2023. Two-stage reinforcement learning-based differential evolution for solving nonlinear equations. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2023).
Liu et al. (2024) Jing Liu, Ruhul Sarker, Saber Elsayed, Daryl Essam, and Nurhadi Siswanto. 2024. Large-scale evolutionary optimization: A review and comparative study. Swarm and Evolutionary Computation (2024), 101466.
Liu and Tang (2013) Jinpeng Liu and Ke Tang. 2013. Scaling up covariance matrix adaptation evolution strategy using cooperative coevolution. In International Conference on Intelligent Data Engineering and Automated Learning. Springer, 350–357.
Loshchilov (2017) Ilya Loshchilov. 2017. LM-CMA: An alternative to L-BFGS for large-scale black box optimization. Evolutionary Computation 25, 1 (2017), 143–171.
Loshchilov et al. (2018) Ilya Loshchilov, Tobias Glasmachers, and Hans-Georg Beyer. 2018. Large scale black-box optimization by limited-memory matrix adaptation. IEEE Transactions on Evolutionary Computation 23, 2 (2018), 353–358.
Ma et al. (2022) Xiaoliang Ma, Zhitao Huang, Xiaodong Li, Lei Wang, Yutao Qi, and Zexuan Zhu. 2022. Merged differential grouping for large-scale global optimization. IEEE Transactions on Evolutionary Computation 26, 6 (2022), 1439–1451.
Ma et al. (2024a) Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yining Ma, and Yue-Jiao Gong. 2024a. Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
Ma et al. (2024b) Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Zhenrui Li, Guojun Peng, Yue-Jiao Gong, Yining Ma, and Zhiguang Cao. 2024b. MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning. Advances in Neural Information Processing Systems 36 (2024).
Ma et al. (2024c) Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, and Kay Chen Tan. 2024c. Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization. arXiv preprint arXiv:2411.00625 (2024).
Ma et al. (2025a) Zeyuan Ma, Zhiyang Huang, Jiacheng Chen, Zhiguang Cao, and Yue-Jiao Gong. 2025a. Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study. In Proceedings of the Genetic and Evolutionary Computation Conference.
Ma et al. (2025b) Zeyuan Ma, Hongqiao Lian, Wenjie Qiu, and Yue-Jiao Gong. 2025b. Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
Mei et al. (2016) Yi Mei, Mohammad Nabi Omidvar, Xiaodong Li, and Xin Yao. 2016. A competitive divide-and-conquer algorithm for unconstrained large-scale black-box optimization. ACM Transactions on Mathematical Software (TOMS) 42, 2 (2016), 1–24.
Mersmann et al. (2011) Olaf Mersmann, Bernd Bischl, Heike Trautmann, Mike Preuss, Claus Weihs, and Günter Rudolph. 2011. Exploratory landscape analysis. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation. 829–836.
Meselhi et al. (2022) Mohamed Meselhi, Ruhul Sarker, Daryl Essam, and Saber Elsayed. 2022. A decomposition approach for large-scale non-separable optimization problems. Applied Soft Computing 115 (2022), 108168.
Mnih et al. (2015) Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
Mo et al. (2025) Shibing Mo, Kai Wu, Qixuan Gao, Xiangyi Teng, and Jing Liu. 2025. AutoSGNN: automatic propagation mechanism discovery for spectral graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 19493–19502.
Nair and Hinton (2010) Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 807–814.
Omidvar et al. (2010) Mohammad Nabi Omidvar, Xiaodong Li, Zhenyu Yang, and Xin Yao. 2010. Cooperative co-evolution for large scale optimization through more frequent random grouping. In 2010 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
Omidvar et al. (2021a) Mohammad Nabi Omidvar, Xiaodong Li, and Xin Yao. 2021a. A review of population-based metaheuristics for large-scale black-box global optimization—Part I. IEEE Transactions on Evolutionary Computation 26, 5 (2021), 802–822.
Omidvar et al. (2021b) Mohammad Nabi Omidvar, Xiaodong Li, and Xin Yao. 2021b. A review of population-based metaheuristics for large-scale black-box global optimization—Part II. IEEE Transactions on Evolutionary Computation 26, 5 (2021), 823–843.
Omidvar et al. (2017) Mohammad Nabi Omidvar, Ming Yang, Yi Mei, Xiaodong Li, and Xin Yao. 2017. DG2: A faster and more accurate differential grouping for large-scale black-box optimization. IEEE Transactions on Evolutionary Computation 21, 6 (2017), 929–942.
Pitzer and Affenzeller (2012) Erik Pitzer and Michael Affenzeller. 2012. A comprehensive survey on fitness landscape analysis. Recent Advances in Intelligent Engineering Systems (2012), 161–191.
Potter and De Jong (1994) Mitchell A Potter and Kenneth A De Jong. 1994. A cooperative coevolutionary approach to function optimization. In International Conference on Parallel Problem Solving from Nature. Springer, 249–257.
Qiu et al. (2025) Wenjie Qiu, Hongshu Guo, Zeyuan Ma, and Yue-Jiao Gong. 2025. A Novel Two-Phase Cooperative Co-evolution Framework for Large-Scale Global Optimization with Complex Overlapping. In Proceedings of the Genetic and Evolutionary Computation Conference.
Ros and Hansen (2008) Raymond Ros and Nikolaus Hansen. 2008. A simple modification in CMA-ES achieving linear time and space complexity. In International Conference on Parallel Problem Solving from Nature. Springer, 296–305.
Roy and Tiwari (2002) Rajkumar Roy and Ashutosh Tiwari. 2002. Generalised regression GA for handling inseparable function interaction: Algorithm and applications. In International Conference on Parallel Problem Solving from Nature. Springer, 452–461.
Salomon (1996) Ralf Salomon. 1996. Re-evaluating genetic algorithm performance under coordinate rotation of benchmark functions. A survey of some theoretical and practical aspects of genetic algorithms. BioSystems 39, 3 (1996), 263–278.
Sayed et al. (2012) Eman Sayed, Daryl Essam, and Ruhul Sarker. 2012. Dependency identification technique for large scale optimization problems. In 2012 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
Shao et al. (2025) Shuai Shao, Ye Tian, and Yajie Zhang. 2025. Deep reinforcement learning assisted surrogate model management for expensive constrained multi-objective optimization. Swarm and Evolutionary Computation 92 (2025), 101817.
Sharma et al. (2019) Mudita Sharma, Alexandros Komninos, Manuel López-Ibáñez, and Dimitar Kazakov. 2019. Deep reinforcement learning based parameter control in differential evolution. In Proceedings of the Genetic and Evolutionary Computation Conference. 709–717.
Shi et al. (2005) Yan-jun Shi, Hong-fei Teng, and Zi-qiang Li. 2005. Cooperative co-evolutionary differential evolution for function optimization. In Advances in Natural Computation: First International Conference, ICNC 2005, Changsha, China, August 27-29, 2005, Proceedings, Part II 1. Springer, 1080–1088.
Soboĺ (1993) IM Soboĺ. 1993. Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp. 1 (1993), 407.
Such et al. (2017) Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O Stanley, and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567 (2017).
Sun et al. (2021) Jianyong Sun, Xin Liu, Thomas Bäck, and Zongben Xu. 2021. Learning adaptive differential evolution algorithm from optimization experiences by policy gradient. IEEE Transactions on Evolutionary Computation 25, 4 (2021), 666–680.
Sun et al. (2017) Yuan Sun, Michael Kirley, and Saman K Halgamuge. 2017. A recursive decomposition method for large scale continuous optimization. IEEE Transactions on Evolutionary Computation 22, 5 (2017), 647–661.
Sun et al. (2019) Yuan Sun, Xiaodong Li, Andreas Ernst, and Mohammad Nabi Omidvar. 2019. Decomposition for large-scale optimization problems with overlapping components. In 2019 IEEE Congress on Evolutionary Computation (CEC). IEEE, 326–333.
Tan and Li (2021) Zhiping Tan and Kangshun Li. 2021. Differential evolution with mixed mutation strategy based on deep reinforcement learning. Applied Soft Computing 111 (2021), 107678.
Tian et al. (2024a) Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, and Yaochu Jin. 2024a. An Enhanced Differential Grouping Method for Large-Scale Overlapping Problems. IEEE Transactions on Evolutionary Computation (2024).
Tian et al. (2024b) Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, and Gary G Yen. 2024b. A Composite Decomposition Method for Large-Scale Global Optimization. IEEE Transactions on Artificial Intelligence (2024).
Tiwari and Roy (2002) Ashutosh Tiwari and Rajkumar Roy. 2002. Variable dependence interaction and multi-objective optimisation. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation. 602–609.
Tiwari et al. (2001) Ashutosh Tiwari, Rajkumar Roy, Graham Jared, and Olivier Munaux. 2001. Interaction and multi-objective optimisation. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation. 671–678.
Todorov et al. (2012) Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 5026–5033.
Van den Bergh and Engelbrecht (2004) Frans Van den Bergh and Andries P Engelbrecht. 2004. A cooperative approach to particle swarm optimization. IEEE Transactions on Evolutionary Computation 8, 3 (2004), 225–239.
Vicol et al. (2021) Paul Vicol, Luke Metz, and Jascha Sohl-Dickstein. 2021. Unbiased gradient estimation in unrolled computation graphs with persistent evolution strategies. In International Conference on Machine Learning. PMLR, 10553–10563.
Wu and Wang (2022) Di Wu and G Gary Wang. 2022. Employing reinforcement learning to enhance particle swarm optimization methods. Engineering Optimization 54, 2 (2022), 329–348.
Xu and Pi (2020) Yue Xu and Dechang Pi. 2020. A reinforcement learning-based communication topology in particle swarm optimization. Neural Computing and Applications 32 (2020), 10007–10032.
Xue et al. (2022) Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, and Yang Yu. 2022. Multi-agent dynamic algorithm configuration. Advances in Neural Information Processing Systems 35 (2022), 20147–20161.
Yang et al. (2023) Ming Yang, Jie Gao, Aimin Zhou, Changhe Li, and Xin Yao. 2023. Contribution-Based Cooperative Co-Evolution With Adaptive Population Diversity for Large-Scale Global Optimization [Research Frontier]. IEEE Computational Intelligence Magazine 18, 3 (2023), 56–68.
Yang et al. (2016) Ming Yang, Mohammad Nabi Omidvar, Changhe Li, Xiaodong Li, Zhihua Cai, Borhan Kazimipour, and Xin Yao. 2016. Efficient resource allocation in cooperative co-evolution for large-scale global optimization. IEEE Transactions on Evolutionary Computation 21, 4 (2016), 493–505.
Yang et al. (2020) Ming Yang, Aimin Zhou, Changhe Li, and Xin Yao. 2020. An efficient recursive differential grouping for large-scale continuous problems. IEEE Transactions on Evolutionary Computation 25, 1 (2020), 159–171.
Yang et al. (2024) Qingyong Yang, Shu-Chuan Chu, Jeng-Shyang Pan, Jyh-Horng Chou, and Junzo Watada. 2024. Dynamic multi-strategy integrated differential evolution algorithm based on reinforcement learning for optimization problems. Complex & Intelligent Systems 10, 2 (2024), 1845–1877.
Yang et al. ([n. d.]) Xu Yang, Rui Wang, and Kaiwen Li. [n. d.]. Meta-Black-Box Optimization for Evolutionary Algorithms: Review and Perspective. Available at SSRN 4956956 ([n. d.]).
Yang et al. (2008) Zhenyu Yang, Ke Tang, and Xin Yao. 2008. Large scale evolutionary optimization using cooperative coevolution. Information Sciences 178, 15 (2008), 2985–2999.
Yi et al. (2022) Wenjie Yi, Rong Qu, Licheng Jiao, and Ben Niu. 2022. Automated design of metaheuristics using reinforcement learning within a novel general search framework. IEEE Transactions on Evolutionary Computation 27, 4 (2022), 1072–1084.
Yin et al. (2021) Shiyuan Yin, Yi Liu, GuoLiang Gong, Huaxiang Lu, and Wenchang Li. 2021. RLEPSO: Reinforcement learning based Ensemble particle swarm optimizer. In Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. 1–6.
Zhang et al. (2024) Jinlu Zhang, Lixin Wei, Zeyin Guo, Hao Sun, and Ziyu Hu. 2024. A survey of meta-heuristic algorithms in optimization of space scale expansion. Swarm and Evolutionary Computation 84 (2024), 101462.
Zhang et al. (2019) Xin-Yuan Zhang, Yue-Jiao Gong, Ying Lin, Jie Zhang, Sam Kwong, and Jun Zhang. 2019. Dynamic cooperative coevolution for large scale optimization. IEEE Transactions on Evolutionary Computation 23, 6 (2019), 935–948.