Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization

Hongshu Guo, South China University of Technology, Guangzhou, Guangdong, China
Wenjie Qiu, South China University of Technology, Guangzhou, Guangdong, China
Zeyuan Ma, South China University of Technology, Guangzhou, Guangdong, China
Xinglin Zhang, South China University of Technology, Guangzhou, Guangdong, China
Jun Zhang, Nankai University, China; Hanyang University, South Korea
Yue-Jiao Gong, South China University of Technology, Guangzhou, Guangdong, China
*Corresponding author: [email protected]
Abstract.

Recent research in Cooperative Coevolution (CC) has achieved promising progress in solving large-scale global optimization problems. However, existing CC paradigms have a primary limitation in that they require deep expertise for selecting or designing effective variable decomposition strategies. Inspired by advancements in Meta-Black-Box Optimization, this paper introduces LCC, a pioneering learning-based cooperative coevolution framework that dynamically schedules decomposition strategies during optimization processes. The decomposition strategy selector is parameterized through a neural network, which processes a meticulously crafted set of optimization status features to determine the optimal strategy for each optimization step. The network is trained via the Proximal Policy Optimization method in a reinforcement learning manner across a collection of representative problems, aiming to maximize the expected optimization performance. Extensive experimental results demonstrate that LCC not only offers certain advantages over state-of-the-art baselines in terms of optimization effectiveness and resource consumption, but also exhibits promising transferability towards unseen problems.

CMA-ES, cooperative co-evolution, reinforcement learning, large scale global optimization, meta-black-box optimization
CCS Concepts: Mathematics of computing → Bio-inspired optimization; Computing methodologies → Reinforcement learning

1. Introduction

Black-box optimization (BBO) is a class of optimization problems whose objective function is either unknown or too intricate to be mathematically formulated (Ma et al., 2024b). Consequently, BBO requires interaction-based information acquisition without access to underlying mathematical expressions or gradients. Within the context of BBO, Large-Scale Global Optimization (LSGO), which involves thousands to tens of thousands of variables, has numerous real-world applications (Elsken et al., 2019; Dranka et al., 2021; Bhattacharya et al., 2016) that drive resource savings, cost control, and efficiency enhancement (Guidotti et al., 2018; Omidvar et al., 2021a; Liu et al., 2024; Zhang et al., 2024). To tackle such problems, many works have proposed LSGO variants of algorithms originally applied to lower-dimensional BBO problems, such as Sep-CMAES (Ros and Hansen, 2008), LM-MA-ES (Loshchilov et al., 2018), and others (Akimoto and Hansen, 2016; Loshchilov, 2017; He et al., 2020; Li and Zhang, 2017). In addition, Persistent Evolution Strategies (PES) (Vicol et al., 2021), presented in an outstanding paper at ICML 2021, combines ideas from gradient-based optimization with evolution strategies to improve optimization efficiency and accuracy. However, the “curse of dimensionality” remains a significant challenge for such problems: as the number of variables increases, the complexity of optimization grows exponentially, necessitating extensive iterations for exploration (Hammer, 1962).

Figure 1. The core idea of LCC-CMAES.

To address LSGO, the Cooperative Co-evolution (CC) framework, inspired by the divide-and-conquer philosophy, first divides the variables into several subgroups, then optimizes these subgroups (treated as lower-dimensional BBO problems) using Evolutionary Algorithms (EAs), and finally integrates them into a complete global solution (Potter and De Jong, 1994; Jia et al., 2020; Chen et al., 2019; Yang et al., 2008). An important issue in the CC framework is placing non-separable variables within the same subgroup so that the problem dimensions are divided accurately; the rule that performs this grouping is called the decomposition strategy (Van den Bergh and Engelbrecht, 2004). Researchers initially tried random decomposition and decomposition strategies based on statistical data but did not obtain satisfactory results (Potter and De Jong, 1994; Van den Bergh and Engelbrecht, 2004). Later work dynamically selected strategies by calculating the probability of each from a table of historical statistical information designed with expert-level knowledge, which yielded some positive effects (e.g., CC-CMAES (Liu and Tang, 2013)). Researchers further designed a series of decomposition strategies based on expert-level knowledge to identify variable interactions more accurately, but this precise decomposition incurs a substantial cost in additional function evaluations (FEs) (Sun et al., 2017; Omidvar et al., 2017; Tian et al., 2024b). In summary, a primary limitation of the current CC framework is its Expert-Level Knowledge Dependency: these decomposition strategies are based on hand-crafted rules, rely heavily on expert-level optimization knowledge, and might not generalize to unseen problems. Therefore, decomposition methods that do not require expert-level knowledge could be a more suitable solution for challenging real-world problems.

To alleviate the burdensome task of manual fine-tuning with expert-level knowledge, recent research has proposed the concept of Meta-Black-Box Optimization (MetaBBO) (Ma et al., 2024b, c; Li et al., 2024a; Mo et al., 2025; Li et al., 2025, 2024b). This paradigm leverages deep reinforcement learning (DRL) at the meta-level, in a data-driven fashion, to reduce the reliance of low-level black-box optimizers on expert-level knowledge. Numerous studies have shown that MetaBBO enables black-box optimizers to achieve more effective optimization performance through enhanced parameter configuration (Xue et al., 2022; Ma et al., 2024a; Sun et al., 2021), algorithm/operator selection (Guo et al., 2024; Liao et al., 2023), and update rule generation (Lange et al., 2023; Chen et al., 2024a, b; Yi et al., 2022). Inspired by MetaBBO, we introduce Learning-Based Cooperative Coevolution (LCC), a pioneering framework that dynamically schedules decomposition strategies during optimization without requiring expertise. The main contributions of this work are summarized as follows:

  • LCC is designed to create an intelligent decision-making agent that autonomously selects effective decomposition strategies tailored to various problem environments and optimization states. We have formulated this process as a Markov Decision Process (MDP) and utilized DRL to construct the agent. This approach replaces traditional, expert-designed selection modes, marking a significant advance in automating and optimizing the decomposition strategy within the CC frameworks for large-scale BBO.

  • Taking the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as the underlying optimizer, we develop the LCC-CMAES algorithm. Figure 1 shows the core idea of LCC-CMAES. We have designed a set of straightforward yet representative statistical features to capture essential grouping information and reflect the optimization state. Based on the state, LCC selects an appropriate decomposition strategy from a strategy pool of random decomposition (RD), Min-Variance decomposition (MiVD), and Max-Variance decomposition (MaVD), thereby enhancing the efficacy of CMA-ES.

  • We conducted detailed comparisons with various leading LSGO algorithms in more challenging settings to illustrate the limitations in practical problems. The experimental results demonstrate that LCC-CMAES excels not only in terms of resource consumption but also in optimization results compared to other algorithms. Additionally, LCC-CMAES exhibits transferability, showing outstanding performance on other unseen problem sets after training.

The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 provides the preliminary knowledge necessary for understanding CC-CMAES and MDP. Section 4 describes the overall architecture of LCC, as well as the specific design of its MDP and network. Section 5 presents the experimental results and provides a detailed analysis. Finally, Section 6 concludes the paper and outlines future work.

2. Related Works

As mentioned earlier, our LCC is inspired by MetaBBO and operates within the CC framework. Therefore, in this section, we will review MetaBBO and several important problem decomposition strategies under the CC framework.

2.1. MetaBBO

To alleviate the burdensome task of manual fine-tuning, the concept of MetaBBO has been proposed by recent research (Ma et al., 2024b; Yang et al., [n. d.]; Ma et al., 2024c; Chen et al., 2025; Shao et al., 2025; Faldor et al., 2025). MetaBBO aims to refine black-box optimizers by identifying optimal configurations or parameters through an automatic decision process without requiring expertise, thereby boosting overall performance across various problem instances within a given problem domain. MetaBBO-RL is one approach to MetaBBO (Sharma et al., 2019; Tan and Li, 2021; Guo et al., 2025a; Ma et al., 2025b, a), which models optimizer fine-tuning as an MDP and learns an RL agent to make decisions automatically without expertise. The meta-objective of MetaBBO-RL is to learn a policy (RL agent) $\Pi^{*}$ that maximizes the expectation of the accumulated meta-performance improvement $r_t$ (also called reward) over the problem set distribution $\xi$, $\mathbb{E}_{\upsilon\sim\xi,\Pi^{*}}\left[\sum_{t=0}^{T} r_t\right]$, where $T$ denotes the total number of decision steps and $\upsilon$ is a problem from the problem set $\Upsilon$.
Specifically, in the aspect of operator selection, MetaBBO-RL automates the tuning process, significantly reducing the time and expertise needed to customize algorithms for specific unseen problems, while also potentially enhancing overall optimization performance (Xu and Pi, 2020; Wu and Wang, 2022; Yin et al., 2021; Guo et al., 2025b). This has been confirmed in numerous research studies: RL-DAS (Guo et al., 2024), based on MetaBBO-RL, selects operators for Differential Evolution algorithms, leveraging their complementary strengths to enhance optimization performance and demonstrating favorable generalization across different problem classes; RLDMDE (Yang et al., 2024) employs RL so that each subpopulation can adaptively select a mutation strategy based on the current environmental state (population diversity), thereby boosting the self-adaptation of subpopulations; similarly, RLEMMO (Lian et al., 2024), the first generalizable MetaBBO-RL framework for solving multimodal optimization problems (MMOP), selects operators for search strategies, directly addresses unseen problems, and achieves competitive optimization performance in both quality and diversity against several strong MMOP solvers.

2.2. CC and the Problem Decomposition Strategies

Inspired by the “divide and conquer” philosophy, CC is a framework that solves LSGO with a decomposition-based approach (Omidvar et al., 2021b, a). It first divides the variables into several subgroups, then optimizes these subgroups using EAs, and finally integrates them into a global optimization solution.
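The divide, optimize, and integrate cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any published CC algorithm: the names `cc_optimize` and `sphere`, the random equal-size grouping, the shrinking step schedule, and the simple Gaussian perturbation search standing in for a full EA are all hypothetical choices made for brevity.

```python
import numpy as np

def sphere(x):
    """Toy separable objective, used here purely for illustration."""
    return float(np.sum(x ** 2))

def cc_optimize(f, dim, n_groups, cycles=20, trials=30, step=0.5, seed=0):
    """Minimal cooperative coevolution loop: divide, optimize subgroups, integrate."""
    rng = np.random.default_rng(seed)
    context = rng.uniform(-5.0, 5.0, size=dim)  # current full solution (context vector)
    best = f(context)
    for _ in range(cycles):
        # 1) Divide: random grouping of the variable indices (assumed strategy).
        groups = np.array_split(rng.permutation(dim), n_groups)
        for idx in groups:
            # 2) Optimize one subgroup while all other variables stay fixed.
            for _ in range(trials):
                candidate = context.copy()
                candidate[idx] += step * rng.standard_normal(idx.size)
                value = f(candidate)
                if value < best:
                    # 3) Integrate: an accepted sub-solution updates the full solution.
                    context, best = candidate, value
        step *= 0.9  # hypothetical schedule: shrink the perturbation over time
    return context, best

solution, value = cc_optimize(sphere, dim=20, n_groups=4)
```

Because a candidate is accepted only when it improves the objective, the returned value can never be worse than the value of the initial solution; the quality reached depends entirely on the assumed search settings.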

CCGA (Potter and De Jong, 1994) is the first strategy to use CC for problem decomposition, splitting a $D$-dimensional problem into $D$ one-dimensional problems, where $D$ is the dimensionality of the problem. However, both practical tests (Potter and De Jong, 1994) and theoretical analyses (Van den Bergh and Engelbrecht, 2004) have suggested that completely decomposing into one-dimensional problems risks introducing spurious minima. To mitigate this issue, strategies such as $k$-$s$ dimensional decomposition and bipartite decomposition have been proposed (Van den Bergh and Engelbrecht, 2004; Shi et al., 2005), but these algorithms do not take into account the structure of the problem or the interactions between variables, potentially placing interacting variables in different components, which adversely affects optimization performance (Omidvar et al., 2021a). To achieve more precise decomposition, researchers have started from the definition of separability, defining various types such as additive separability (Li et al., 2013), multiplicative separability (Li et al., 2022), and composite separability (Tian et al., 2024b), and have developed a range of variable interaction identification algorithms, such as DG2 (Omidvar et al., 2017), RDG (Sun et al., 2017), ERDG (Yang et al., 2020), MDG (Ma et al., 2022), GDG (Mei et al., 2016), and CSG (Tian et al., 2024b). In addition, researchers have studied how to accurately decompose overlapping variables, with methods such as DOV (Meselhi et al., 2022), OCC (Komarnicki et al., 2024), and OEDG (Tian et al., 2024a). However, this gain in accuracy comes at the cost of a large number of expert-designed separability methods and additional FEs.

Strategies based on probabilistic and statistical methods do not have these issues. They perform multiple rounds of grouping optimization before forming the final optimization result in order to capture problem structure and variable interactions (Yang et al., 2008). Relying on expertise, many such algorithms have been proposed (Tiwari et al., 2001; Roy and Tiwari, 2002; Tiwari and Roy, 2002; Soboĺ, 1993), such as the Delta method (Omidvar et al., 2010), based on the theory that the improvement intervals of inseparable variables are relatively smaller than those of separable variables (Salomon, 1996); the Fitness Difference Minimization (FDM) method, exemplified by DIMA (Sayed et al., 2012); and CC-CMAES (Liu and Tang, 2013), based on covariance matrices and an expert-designed selection mode. Besides, contribution-based decomposition methods (Yang et al., 2023), such as CCFR (Yang et al., 2016), DCC (Zhang et al., 2019), and CBCCO (Jia et al., 2020), represent another novel strategy. Although these two kinds of strategies do not incur additional FEs, they rely on expertise, so in different scenarios they may fail to meet the requirements for reasonable decomposition (Omidvar et al., 2021a; Qiu et al., 2025). Therefore, decomposition methods that do not require expert-level knowledge might be a more suitable solution for more challenging real-world problems.

3. Preliminaries

3.1. CC-CMAES

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen, 2016) is a representative EA that operates by repeatedly sampling offspring according to a distribution and updating the distribution with the performance of the sampled offspring until a stopping criterion is met (e.g., reaching the total number of generations $TG$).

(1)  $x_{(g+1)}^{(k)} \sim N\left(\omega_{(g)}, \sigma_{(g)}^{2} \cdot C_{(g)}\right), \quad k = 1, 2, \cdots, \lambda$

Equation (1) shows the sampling process in a population $P$ with offspring size $\lambda$ at generation $g$. $\omega_{(g)} \in \mathbb{R}^{D}$, $C_{(g)} \in \mathbb{R}^{D \times D}$, and $\sigma_{(g)} \in \mathbb{R}$ are the Gaussian mean, covariance matrix, and global step size, respectively, at generation $g$. CC-CMAES (Liu and Tang, 2013) uses the CC framework with CMA-ES, featuring three decomposition strategies: Min-Variance Decomposition (MiVD), Random Decomposition (RD), and Max-Variance Decomposition (MaVD), ranging from exploitative to exploratory. It dynamically selects one strategy to optimize subgroups with CMA-ES for a fixed number of generations until termination criteria are met. MiVD, RD, and MaVD decompose the space based on the rank of the diagonal of the covariance matrix. MiVD sequentially selects $D/m$ variables following the rank order to minimize the diversity among their variances. In contrast, MaVD selects one variable, then skips $D/m$ variables to select the next variable each time, which maximizes diversity. RD randomly selects $D/m$ variables within each subspace. The subspace covariance matrix $C_{sub_i} \in \mathbb{R}^{(D/m) \times (D/m)}$ and mean $\omega_{sub_i} \in \mathbb{R}^{D/m}$ are extracted from the global covariance matrix $C$ and mean $\omega$ as $C_{sub_i} = C[subdims_i, subdims_i]$, $\omega_{sub_i} = \omega[subdims_i]$, where $subdims_i \in [1, D]^{D/m}$ represents the dimension index set of subgroup $i \in [1, \dots, m]$. Updating $C$ and $\omega$ from $C_{sub_i}$ and $\omega_{sub_i}$ is the inverse of this extraction.
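The three decomposition strategies and the subspace extraction can be illustrated with a short NumPy sketch. It is an illustration under assumptions rather than the CC-CMAES implementation: the function names `decompose`, `extract`, and `write_back` are hypothetical, variables are ranked in ascending order of variance, $m$ is assumed to divide $D$, and MaVD is read here as a stride-$m$ interleaving of the rank order so that every group still contains $D/m$ variables.

```python
import numpy as np

def decompose(C, m, strategy, rng=None):
    """Split D variable indices into m groups of D/m using the rank of diag(C)."""
    D = C.shape[0]
    if D % m != 0:
        raise ValueError("this sketch assumes that m divides D")
    size = D // m
    order = np.argsort(np.diag(C))  # ascending variance rank (assumed direction)
    if strategy == "MiVD":
        # Consecutive blocks of the rank order: similar variances share a group.
        return [order[i * size:(i + 1) * size] for i in range(m)]
    if strategy == "MaVD":
        # Interleaved picks across the rank order: dissimilar variances share a group.
        return [order[i::m] for i in range(m)]
    if strategy == "RD":
        rng = np.random.default_rng() if rng is None else rng
        return list(rng.permutation(D).reshape(m, size))
    raise ValueError(f"unknown strategy: {strategy}")

def extract(C, omega, subdims):
    """C_sub = C[subdims, subdims] and omega_sub = omega[subdims]."""
    return C[np.ix_(subdims, subdims)], omega[subdims]

def write_back(C, omega, subdims, C_sub, omega_sub):
    """Inverse of extract: copy subgroup statistics into the global ones in place."""
    C[np.ix_(subdims, subdims)] = C_sub
    omega[subdims] = omega_sub

rng = np.random.default_rng(0)
C = np.diag([5.0, 1.0, 7.0, 3.0, 8.0, 2.0, 6.0, 4.0])
omega = np.zeros(8)
sigma, lam = 0.5, 6

groups = decompose(C, m=2, strategy="MiVD")
C_sub, omega_sub = extract(C, omega, groups[0])
# Sample lambda offspring in the subspace, in the spirit of Equation (1).
offspring = rng.multivariate_normal(omega_sub, sigma ** 2 * C_sub, size=lam)
# Placeholder for the real CMA-ES update: use the offspring mean as the new mean.
omega_sub = offspring.mean(axis=0)
write_back(C, omega, groups[0], C_sub, omega_sub)
```

With the variances above, MiVD groups the four smallest-variance variables together, whereas the assumed MaVD reading spreads small and large variances evenly across both groups.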

3.2. Markov Decision Process

A Markov Decision Process (MDP) is commonly characterized as $\mathcal{M} := \langle \mathcal{S}, \mathcal{A}, \mathcal{T}, R \rangle$. At each time step $t$, given the current environment state $s_t \in \mathcal{S}$, an action $a_t \in \mathcal{A}$ is performed according to a policy $\Pi: \mathcal{S} \rightarrow \mathcal{A}$. The environment then transitions to the next state $s_{t+1}$ according to the transition dynamics $\mathcal{T}\left(s_{t+1} \mid s_t, a_t\right)$. The reward function $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$ indicates the feedback $r_t$ from the environment. Given a finite horizon (suppose $T$ steps) of interactions with the environment, a sampled trajectory is defined as $\tau := \left(s_0, a_0, s_1, \cdots, s_T\right)$. An MDP is then solved by finding an optimal policy $\pi^{*}$ that maximizes the expected accumulated rewards over all possible trajectories:

(2)  $\pi^{*} = \underset{\pi \in \Pi}{\arg\max}\ \mathbb{E}_{\tau \sim \pi(\tau)}\left[\sum_{t=0}^{T} \gamma^{t-1} r_t\right]$

where $\pi(\tau)$ denotes the sampling probability of $\tau$ and $\gamma$ is a pre-defined discount factor. In the context of DRL (Mnih et al., 2015), the policy $\pi$ is parameterized with a neural network $\pi_{\theta}$, which makes gradient-based learning methods (e.g., PPO (Schulman et al., 2017)) available for searching for the optimal policy.
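As a small worked illustration of the quantities in Equation (2), the sketch below computes the accumulated discounted reward of one trajectory and the action probabilities of a parameterized policy. The names `discounted_return` and `softmax_policy` are hypothetical, and the policy is a linear map followed by a softmax, a deliberately simple stand-in for the neural network $\pi_{\theta}$. The exponent $t-1$ is kept as written in Equation (2); for $\gamma > 0$ it differs from the more common $\gamma^{t}$ convention only by the constant factor $1/\gamma$, which leaves the maximizing policy unchanged.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Sum over t = 0..T of gamma^(t-1) * r_t, with the exponent as in Equation (2)."""
    rewards = np.asarray(rewards, dtype=float)
    t = np.arange(rewards.size)
    return float(np.sum(gamma ** (t - 1) * rewards))

def softmax_policy(theta, state):
    """Action probabilities pi_theta(a | s) from a linear map followed by a softmax."""
    logits = theta @ state
    logits = logits - logits.max()  # subtract the maximum for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(0)
theta = rng.standard_normal((3, 4))  # toy sizes: 3 actions, 4 state features
state = rng.standard_normal(4)
probs = softmax_policy(theta, state)
action = int(rng.choice(3, p=probs))  # sample an action from the policy

G = discounted_return([1.0, 0.5, 0.25], gamma=0.5)
# 0.5**-1 * 1.0 + 0.5**0 * 0.5 + 0.5**1 * 0.25 = 2.625
```

A policy-gradient method such as PPO would adjust `theta` so that trajectories with larger returns become more likely; that training step is outside the scope of this sketch.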

4. Methodology

4.1. LCC Overview

Figure 2. The overall structure of LCC.

LCC primarily consists of three main components: a problem set $\Upsilon$ with $N$ problems, a CC decomposition strategy pool $\Lambda$, and an underlying EA optimizer (e.g., CMA-ES). $\Lambda$ includes a variety of strategies chosen from existing decomposition strategies. The detailed architecture, illustrated in Figure 2, can be conceptualized as an MDP. In an MDP, as introduced in Section 3.2, multiple elements are fundamental, such as the state $s_t \in \mathcal{S}$, the action $a_t \in \mathcal{A}$, and the reward $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$. The DRL agent seeks the optimal policy $\pi^{*}$ that selects an appropriate decomposition strategy in $\Lambda$ to maximize the expected accumulated reward over all the problems $\upsilon \in \Upsilon$ as $\pi^{*} = \underset{\pi \in \Pi}{\arg\max}\ \frac{1}{N} \sum_{k=1}^{N} \sum_{t=0}^{T} \gamma^{t-1} R\left(s_t, a_t \mid \upsilon\right)$. First, a problem $\upsilon$ is selected from the problem set $\Upsilon$. For this problem, we analyze Global Optimization (GO) information, Subgroup Decomposition (SD) information, and Action History (AH) information using Exploratory Landscape Analysis (ELA) (Mersmann et al., 2011) and Fitness Landscape Analysis (FLA) (Pitzer and Affenzeller, 2012). This analysis is used to design the state so that it contains sufficient information to select an appropriate decomposition strategy. Based on the state $s_t$, LCC selects a decomposition strategy from the CC decomposition strategy pool $\Lambda$ to decompose the problem and subsequently optimizes each subgroup using the underlying EA optimizer. A corresponding reward is designed to reflect improvements in the MDP $\mathcal{M}$. Finally, the optimized subgroups are combined to form a new global population, completing an epoch. With CMA-ES as the underlying optimizer, we instantiate LCC and name the result the LCC-CMAES algorithm. Using LCC-CMAES as a concrete example, we will describe the specific design of the MDP and the network.
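The interaction loop described above (observe a state, select a strategy from the pool, decompose, optimize each subgroup, collect a reward) can be sketched as follows. Everything in this sketch is a simplifying assumption rather than the LCC-CMAES implementation: `make_pool`, `optimize_subgroup`, `lcc_episode`, and `random_policy` are hypothetical names, the two-element state is far simpler than the features the paper designs, `x ** 2` is a crude stand-in for the diagonal of the CMA-ES covariance matrix, a perturbation search replaces CMA-ES, the reward is an assumed relative improvement, and a uniformly random policy sits where the trained network would be.

```python
import numpy as np

def make_pool(m):
    """Hypothetical strategy pool: each entry maps (variances, rng) to m index groups."""
    def mivd(var, rng):
        return np.array_split(np.argsort(var), m)
    def rd(var, rng):
        return np.array_split(rng.permutation(var.size), m)
    def mavd(var, rng):
        order = np.argsort(var)
        return [order[i::m] for i in range(m)]
    return [mivd, rd, mavd]

def optimize_subgroup(f, x, fx, idx, rng, trials=20, step=0.3):
    """Stand-in for the underlying EA: perturb only the variables in idx."""
    for _ in range(trials):
        cand = x.copy()
        cand[idx] += step * rng.standard_normal(idx.size)
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x, fx

def lcc_episode(f, dim, m, policy, steps=10, seed=0):
    """One episode: at every step the policy picks a decomposition strategy."""
    rng = np.random.default_rng(seed)
    pool = make_pool(m)
    x = rng.uniform(-5.0, 5.0, dim)
    fx = f(x)
    actions, rewards = [], []
    for t in range(steps):
        state = np.array([fx, t / steps])      # toy state (assumption)
        a = policy(state, len(pool), rng)      # select a strategy from the pool
        f_prev = fx
        for idx in pool[a](x ** 2, rng):       # decompose into subgroups
            x, fx = optimize_subgroup(f, x, fx, idx, rng)  # optimize each subgroup
        rewards.append((f_prev - fx) / (abs(f_prev) + 1e-12))  # assumed reward
        actions.append(a)
    return x, fx, actions, rewards

def random_policy(state, n_actions, rng):
    """Placeholder for the learned selector: choose a strategy uniformly at random."""
    return int(rng.integers(n_actions))

x, fx, actions, rewards = lcc_episode(
    lambda z: float(np.sum(z ** 2)), dim=12, m=3, policy=random_policy
)
```

Swapping `random_policy` for a trained network that maps the state to a strategy index is the only change needed to turn this loop into a learned selector, which is the role the DRL agent plays in LCC.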

4.2. MDP Formulation

4.2.1. State

The state space of LCC-CMAES encompasses the Global Optimization (GO) information $s_{\text{GO}} \in \mathbb{R}^{12}$, the Subgroup Decomposition (SD) information $s_{\text{SD}} \in \mathbb{R}^{4 \times m}$, and the Action History (AH) information $s_{\text{AH}} \in \mathbb{R}^{2 \times L}$. Details are shown in Table 1.
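To make the 12-dimensional GO part of the state concrete, the sketch below assembles $s_{\text{GO}}$ from the formulas listed in Table 1. The function names `corrcoef_from_cov` and `go_features` and the toy input values are hypothetical; the maximum, mean, and minimum of the correlation matrix are taken over all entries including the diagonal, which is an assumption, and `fes` is passed through exactly as the caller defines it.

```python
import numpy as np

def corrcoef_from_cov(C):
    """Transform a covariance matrix into a correlation-coefficient matrix."""
    std = np.sqrt(np.diag(C))
    return C / np.outer(std, std)

def go_features(omega, C, sigma, gbest, f_best, f_best_prev, fes, max_fes, radius):
    """Assemble the 12 GO features in the index order of Table 1."""
    R = corrcoef_from_cov(C)
    return np.array([
        np.max(omega / radius), np.mean(omega / radius), np.min(omega / radius),  # 1-3
        np.max(R), np.mean(R), np.min(R),                                         # 4-6
        sigma / radius,                                                           # 7
        np.max(gbest / radius), np.mean(gbest / radius), np.min(gbest / radius),  # 8-10
        f_best / f_best_prev,                                                     # 11
        fes / max_fes,                                                            # 12
    ])

# Toy values chosen only to exercise the formulas.
omega = np.array([1.0, -2.0, 0.5])
C = np.array([[4.0, 2.0, 0.0],
              [2.0, 9.0, 0.0],
              [0.0, 0.0, 1.0]])
gbest = np.array([1.0, -1.0, 0.0])
s_go = go_features(omega, C, sigma=0.5, gbest=gbest, f_best=2.0,
                   f_best_prev=4.0, fes=300, max_fes=1000, radius=5.0)
```

Dividing by `radius` and by `max_fes` keeps the features on comparable scales across problems with different bounds and budgets, which is presumably why the table normalizes them this way.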

Table 1. State features.
Feature Feature Index Calculation Formula Explain
sGOsubscript𝑠GOs_{\text{GO}}italic_s start_POSTSUBSCRIPT GO end_POSTSUBSCRIPT 1 Max(ωtradius)Maxsubscript𝜔𝑡𝑟𝑎𝑑𝑖𝑢𝑠\text{Max}(\frac{\omega_{t}}{radius})Max ( divide start_ARG italic_ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_r italic_a italic_d italic_i italic_u italic_s end_ARG ) Max(),Min()MaxMin\text{Max}(\cdot),\text{Min}(\cdot)Max ( ⋅ ) , Min ( ⋅ ) extracts the maximum and minimum element in the vector. Mean()Mean\text{Mean}(\cdot)Mean ( ⋅ ) extracts the mean of the vector elements. The ωtsubscript𝜔𝑡\omega_{t}italic_ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the global mean at step t𝑡titalic_t. The radius𝑟𝑎𝑑𝑖𝑢𝑠radiusitalic_r italic_a italic_d italic_i italic_u italic_s is the search radius of the problem, which is half of the difference between the upper and lower bounds. To reflect the state of population optimization and status of CMA-ES
2 Mean(ωtradius)Meansubscript𝜔𝑡𝑟𝑎𝑑𝑖𝑢𝑠\text{Mean}(\frac{\omega_{t}}{radius})Mean ( divide start_ARG italic_ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_r italic_a italic_d italic_i italic_u italic_s end_ARG )
3 Min(ωtradius)Minsubscript𝜔𝑡𝑟𝑎𝑑𝑖𝑢𝑠\text{Min}(\frac{\omega_{t}}{radius})Min ( divide start_ARG italic_ω start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_r italic_a italic_d italic_i italic_u italic_s end_ARG )
4 Max(Corrcoef(Ct))MaxCorrcoefsubscript𝐶𝑡\text{Max}(\text{Corrcoef}(C_{t}))Max ( Corrcoef ( italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ) Corrcoef()Corrcoef\text{Corrcoef}(\cdot)Corrcoef ( ⋅ ) transforms a covariance matrix into a correlation coefficient matrix. Ctsubscript𝐶𝑡C_{t}italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the global covariance matrix at step t𝑡titalic_t. To reflect the correlations between variables and status of CMA-ES
5 Mean(Corrcoef(Ct))MeanCorrcoefsubscript𝐶𝑡\text{Mean}(\text{Corrcoef}(C_{t}))Mean ( Corrcoef ( italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) )
6 Min(Corrcoef(Ct))MinCorrcoefsubscript𝐶𝑡\text{Min}(\text{Corrcoef}(C_{t}))Min ( Corrcoef ( italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) )
7 σtradiussubscript𝜎𝑡𝑟𝑎𝑑𝑖𝑢𝑠\frac{\sigma_{t}}{radius}divide start_ARG italic_σ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_r italic_a italic_d italic_i italic_u italic_s end_ARG The σtsubscript𝜎𝑡\sigma_{t}italic_σ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the global step size at step t𝑡titalic_t. To reflect the current global exploration and exploitation conditions
8 $\text{Max}(\frac{gbest_t}{radius})$ $gbest_t$ is the global best point at step $t$. To reflect the current position of the optimization state within the problem domain.
9 $\text{Mean}(\frac{gbest_t}{radius})$
10 $\text{Min}(\frac{gbest_t}{radius})$
11 $\frac{f^*_t}{f^*_{t-1}}$ $f^*_t$ is the global best fitness at step $t$. To reflect the incremental optimization effects of each step.
12 $\frac{FEs}{MaxFEs}$ $FEs$ is the number of remaining function evaluations, and $MaxFEs$ is the maximum number of function evaluations. To keep the agent informed about computational budget consumption.
$s_{\text{SD}}$ 13-22 $\text{Mean}(\text{Corrcoef}(C_{sub_i}))$ $C_{sub_i}$ is the covariance matrix of subgroup $i$. To reflect the correlation between variables within the subgroup.
23-32 $\text{Mean}(\frac{\Delta subpop_i}{\lambda \times radius})$ $\Delta subpop_i$ is the sum of the vector set consisting of the difference between the last generation and the first generation of each element of the subgroup. Inspired by the Delta method mentioned in Section 2.2, it aims to reveal interactions between variables.
33-42 $\text{Mean}(\frac{\text{Var}(subpop_i)}{radius^2})$ $\text{Var}(\cdot)$ is calculated for the elements at each position within the vector set, and $subpop_i$ is a vector set of all the generations of the subgroup. To reflect the volatility of the population across different dimensions within that subgroup and the exploration and exploitation of each dimension.
43-52 $\frac{d_{max_i}}{diameter}$ $d_{max_i}$ is the maximum distance in subgroup $i$'s population. $diameter$ is the diameter of the search space. To describe the convergence state of the population within subgroup $i$.
$s_{\text{AH}}$ 53-55 $\frac{\sum(\Delta r_j)}{num_j}$ $\sum(\cdot)$ is the sum of all values $\cdot$ for action $j$, where $j$ is the index of the action, $j = 1, 2, 3$. $\Delta r_j$ refers to the difference between the reward obtained at step $t$ for action $j$ and the reward obtained at step $t-1$. $num_j$ is the number of times action $j$ has been selected. To reflect algorithm $j$'s contribution to optimization.
56-58 $\frac{\sum(\Delta gbest^{(j)})}{2 \times radius \times num_j}$ $\Delta gbest^{(j)}$ refers to the Euclidean norm of the difference between $gbest_t$ obtained at step $t$ for action $j$ and $gbest_{t-1}$ obtained at step $t-1$. To reflect algorithm $j$'s effectiveness in optimization.

$s_{\text{GO}}$ reflects the CMA-ES state and the global optimization state, revealing the complexity and difficulty of the optimization problem as well as the relationships between various dimensions. For $s_{\text{SD}}$, we design four types of features based on probabilistic and statistical methods within the CC framework to reflect the variable grouping status within a subgroup; these features provide detailed insights into the dynamics of variable relationships and the optimization progress in the subgroups. For $s_{\text{AH}}$, given that LCC includes a CC decomposition strategy pool $\Lambda$, we derive $s_{\text{AH}}$ to provide the RL agent with additional contextual knowledge about the optimization capabilities of the candidate strategies.
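As a rough illustration of how the simpler global features could be computed, the sketch below evaluates features 8 to 12 of Table 1 for a single step. The function name `global_features` and the toy inputs are hypothetical, and the code is only one straightforward reading of the table entries, not the authors' implementation.

```python
def global_features(gbest, radius, f_best_t, f_best_prev, fes, max_fes):
    """Features 8-12 of Table 1 for one optimization step (illustrative reading).

    gbest       : coordinates of the global best point at step t
    radius      : search-space radius used for normalization
    f_best_t    : global best fitness at step t
    f_best_prev : global best fitness at step t-1 (assumed non-zero here)
    fes         : FEs as defined in Table 1
    max_fes     : maximum number of function evaluations
    """
    scaled = [x / radius for x in gbest]
    return [
        max(scaled),                 # feature 8:  Max(gbest_t / radius)
        sum(scaled) / len(scaled),   # feature 9:  Mean(gbest_t / radius)
        min(scaled),                 # feature 10: Min(gbest_t / radius)
        f_best_t / f_best_prev,      # feature 11: f*_t / f*_{t-1}
        fes / max_fes,               # feature 12: FEs / MaxFEs
    ]


# Toy 4-dimensional example with made-up numbers.
feats = global_features([50.0, -25.0, 0.0, 75.0], 100.0, 80.0, 100.0, 5000, 20000)
print(feats)
```

All five values are ratios, so they stay on comparable scales across problems whose raw fitness values differ by many orders of magnitude.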

Finally, the complete state in the MDP of LCC-CMAES is the integration of $s_{\text{GO}}$, $s_{\text{SD}}$, and $s_{\text{AH}}$:

(3) $\text{state} := \left\{ s_{\text{GO}} \in \mathbb{R}^{12},\ s_{\text{SD}} \in \mathbb{R}^{4 \times m},\ s_{\text{AH}} \in \mathbb{R}^{2 \times L} \right\}$

Here, $m$ represents the number of subgroups ($m = 10$ in LCC-CMAES), and $L$ denotes the number of CC decomposition strategies in the pool ($L = 3$ in LCC-CMAES).
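The dimensions in Eq. (3) can be checked with a few lines of code. The sketch below, with a hypothetical helper `assemble_state`, validates the three feature blocks and concatenates them; with $m = 10$ and $L = 3$ the result has $12 + 4 \times 10 + 2 \times 3 = 58$ entries.

```python
M_SUBGROUPS = 10   # m in LCC-CMAES
L_STRATEGIES = 3   # L in LCC-CMAES


def assemble_state(s_go, s_sd, s_ah, m=M_SUBGROUPS, n_strategies=L_STRATEGIES):
    """Concatenate the three feature blocks after checking their sizes."""
    if len(s_go) != 12:
        raise ValueError("s_GO must contain 12 features")
    if len(s_sd) != 4 * m:
        raise ValueError("s_SD must contain 4 features per subgroup")
    if len(s_ah) != 2 * n_strategies:
        raise ValueError("s_AH must contain 2 features per strategy")
    return list(s_go) + list(s_sd) + list(s_ah)


# Placeholder feature values; only the sizes matter for this check.
state = assemble_state([0.0] * 12, [0.0] * 40, [0.0] * 6)
print(len(state))
```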

4.2.2. Action

We designed a strategy pool $\Lambda$ in advance, containing various decomposition strategies for selection. LCC selects a CC decomposition strategy from $\Lambda$ based on the state to achieve dynamic decomposition. To balance exploration and exploitation, LCC-CMAES utilizes three types of decomposition strategies (Liu and Tang, 2013): MiVD, RD, and MaVD, as introduced in Section 3.1. The action is represented as an integer $a \in [1, L]$, which indicates the index of the chosen strategy within the strategy pool of $L$ candidate strategies. Next, based on the selected strategy, the problem is divided into lower-dimensional subproblems, each of which is then optimized using CMA-ES. The optimization results of the subproblems are subsequently combined into a global optimization result.
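To make the action step concrete, here is a minimal sketch of sampling a strategy index and applying one decomposition. It assumes that RD denotes a random grouping of variables into $m$ equally sized subgroups; MiVD and MaVD are defined in Section 3.1 and are not reproduced here. The function names are hypothetical.

```python
import random


def sample_action(probs, rng):
    """Sample a 1-based strategy index a in [1, L] from a probability vector."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


def random_decomposition(dim, m, rng):
    """Assumed reading of RD: shuffle variable indices, then deal them into m subgroups."""
    idx = list(range(dim))
    rng.shuffle(idx)
    return [idx[i::m] for i in range(m)]


rng = random.Random(0)
action = sample_action([0.2, 0.5, 0.3], rng)   # made-up Actor output for L = 3
groups = random_decomposition(1000, 10, rng)   # D = 1E3, m = 10
print(action, [len(g) for g in groups])
```

Each subgroup then defines a 100-dimensional subproblem, and every variable belongs to exactly one subgroup.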

4.2.3. Reward

To guide the agent towards achieving a lower cost, the reward function considers the reduction in cost at each time step $t$, normalized by the initial optimality gap:

(4) $r_t = \frac{f_{t-1}^* - f_t^*}{f_0^* - f^*}$

where $f_{t-1}^*$ and $f_t^*$ are the global best fitness values at steps $t-1$ and $t$, respectively, $f^*$ is the optimal fitness of the problem, and $f_0^*$ is the global best fitness in the initial population. The denominator $f_0^* - f^*$ serves as a normalization factor. The reward thus measures the performance improvement achieved by the optimization at step $t$.
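A direct transcription of Eq. (4) is given below as a small sketch; the function name `reward` and the numeric values are illustrative only.

```python
def reward(f_prev, f_curr, f_init, f_opt):
    """Eq. (4): improvement at step t, normalized by the initial gap f_0* - f*."""
    gap = f_init - f_opt
    if gap == 0:
        raise ValueError("initial population is already optimal; reward undefined")
    return (f_prev - f_curr) / gap


# Made-up fitness trajectory: f_0* = 100, then 80, 60, 55; known optimum f* = 0.
trajectory = [100.0, 80.0, 60.0, 55.0]
rewards = [reward(trajectory[t - 1], trajectory[t], trajectory[0], 0.0)
           for t in range(1, len(trajectory))]
print(rewards, sum(rewards))
```

Because consecutive terms cancel, the undiscounted sum of rewards over an episode telescopes to $(f_0^* - f_T^*) / (f_0^* - f^*)$, the fraction of the initial gap that has been closed by the final step $T$.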

4.3. Network Design

As shown in Figure 3, the network consists of three modules: Feature Processing, Actor, and Critic. $s_{\text{GO}}$, $s_{\text{SD}}$, and $s_{\text{AH}}$ are first fused to form a state representation vector $DV$. Based on this representation, the Actor outputs the probability distribution over candidate strategies, while the Critic estimates the return value.

The Actor decides the probability of selecting each strategy from the CC decomposition strategy pool. As mentioned in Section 4.2.1, LCC-CMAES has $m = 10$ subgroups and $L = 3$ actions. We first concatenate $s_{\text{GO}} \in \mathbb{R}^{12}$, $s_{\text{SD}} \in \mathbb{R}^{4 \times m}$, and $s_{\text{AH}} \in \mathbb{R}^{2 \times L}$ to generate the Decision Vector $DV \in \mathbb{R}^{58}$ as $DV = s_{\text{GO}} \oplus s_{\text{SD}} \oplus s_{\text{AH}}$. We then feed $DV$ into a three-layer Multi-Layer Perceptron (MLP) with the structure ($58 \times 64 \times 64 \times L$), with a ReLU (Nair and Hinton, 2010) activation after each hidden layer. Following the Softmax operation, the Actor outputs a probability distribution over the strategy pool $\Lambda$, which is then used to sample the strategy.

Figure 3. The neural network workflow for $\pi_\theta$ (Actor) and $v_\phi$ (Critic).

The Critic also takes $DV$ as input and uses the same MLP structure as the Actor, except that the output dimension is set to 1 for critic value prediction. However, their MLP parameters are not shared, and the training is conducted independently.
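The forward pass of the two networks can be sketched in NumPy as follows. The layer sizes follow the text ($58 \times 64 \times 64 \times L$ for the Actor, output size 1 for the Critic); the He-style weight initialization, the helper names, and the random input are assumptions made purely for illustration.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for consecutive layer sizes (He-style init, assumed)."""
    return [(rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Apply the MLP; ReLU follows every layer except the output layer."""
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


rng = np.random.default_rng(0)
actor = init_mlp([58, 64, 64, 3], rng)    # separate parameters for the Actor ...
critic = init_mlp([58, 64, 64, 1], rng)   # ... and the Critic (not shared)

dv = rng.standard_normal(58)              # placeholder Decision Vector
probs = softmax(forward(actor, dv))       # distribution over the L = 3 strategies
value = forward(critic, dv)[0]            # scalar return estimate
print(probs, value)
```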

4.4. Workflow

The workflow of LCC-CMAES begins with selecting a problem $\upsilon$ from the problem set $\Upsilon$ and initializing the global dimension $D$, global covariance matrix $C_0 = I$, global population $P_0$, global step size $\sigma_0 = radius$, and global mean vector $\omega_0$. Training on $\upsilon$ then starts and terminates when $MaxFEs$ is exhausted or the global best fitness $gbest_t$ is lower than the termination error. After initialization, an MDP starts. At step $t$, the state $s_t$ is calculated following Table 1. Based on $s_t$, the Actor policy $\pi_\theta$ with parameters $\theta$ takes the Decision Vector $s_t$ as input and outputs the probability distribution over candidate strategies $\pi(a_t|s_t)$, while the Critic network $v_\phi$ with parameters $\phi$ predicts the expected return (accumulated rewards) of $s_t$.
Once the strategy is determined, the problem is decomposed into subgroups, marking the end of the CC problem decomposition layer and the transition into the subgroup optimization layer. Each subgroup is optimized using CMA-ES until $SubMaxFEs$ is reached, with $\sigma_t$ updated by the offspring in each subgroup. Once all subgroup optimizations are completed, $\sigma_t$ is updated to $\sigma_{t+1}$, and $C_t$, $\omega_t$, and $P_t$ are updated to $C_{t+1}$, $\omega_{t+1}$, and $P_{t+1}$ based on the $C_{sub_i}$ and $\omega_{sub_i}$ obtained from each subgroup, as mentioned in Section 3.1. At this point, the state $s_{t+1}$ can be calculated following Table 1, and the reward $r_t$ is observed.
The trajectories of states $s_t$, actions $a_t$, and rewards $r_t$ are recorded and then used by the PPO method to train the policy net $\pi_\theta$ and the critic net $v_\phi$ for $K$ times after the completion of optimization. PPO is trained in an actor-critic manner. It proposes a novel objective with clipped probability ratios, which forms a first-order estimate (i.e., lower bound) of the policy's performance. Its objective function at the $k$-th learning iteration ($k \in [1, K]$) is defined as $L_\pi(\theta^{(k)}) := \mathbb{E}\left[\min\left(\eta(\theta^{(k)})\hat{A},\ \operatorname{clip}(\eta(\theta^{(k)}), 1-\epsilon, 1+\epsilon)\hat{A}\right)\right]$, where $\eta(\theta^{(k)}) := \frac{\pi_{\theta^{(k)}}(a_t|s_t)}{\pi_{\theta^{(0)}}(a_t|s_t)}$ is the ratio of the probabilities under the current policy and the old policy before the $K$-step learning process, which performs the importance sampling. $\hat{A}$ is the estimated advantage, calculated as the difference between the target return $G$ and the estimated return $\hat{G}$. Using $L_\pi(\theta)$, the gradients are back-propagated through the network to update the parameters. The critic network $v_\phi$ takes the Decision Vector as input and outputs a critic value prediction to estimate the return $\hat{G}$. The loss function of the critic network $v_\phi$ is $L_v(\phi) := \operatorname{MSE}(G, \hat{G})$.
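The clipped objective and the critic loss can be evaluated on a small batch as in the sketch below. The expectation is approximated by a batch mean, the clipping range $\epsilon = 0.2$ is an assumed value that this section does not specify, and the ratios, advantages, and returns are made-up numbers.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Batch estimate of L_pi: mean of min(eta * A, clip(eta, 1-eps, 1+eps) * A)."""
    terms = []
    for eta, adv in zip(ratios, advantages):
        clipped_eta = min(max(eta, 1.0 - eps), 1.0 + eps)
        terms.append(min(eta * adv, clipped_eta * adv))
    return sum(terms) / len(terms)


def critic_mse(targets, predictions):
    """L_v: mean squared error between target returns G and estimates G_hat."""
    return sum((g - g_hat) ** 2 for g, g_hat in zip(targets, predictions)) / len(targets)


targets = [1.0, 2.0]        # G
predictions = [0.5, 2.5]    # G_hat from the critic
objective = clipped_surrogate([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
value_loss = critic_mse(targets, predictions)
print(objective, value_loss)
```

In the first sample the ratio 1.5 is clipped to 1.2, so a positive advantage cannot push the policy arbitrarily far from the old one; in the second, the pessimistic minimum keeps the clipped term $-0.8$ rather than $-0.5$. The surrogate is maximized (or its negative minimized) with respect to $\theta$, while $L_v$ is minimized with respect to $\phi$.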

5. Experiments

5.1. Experimental Setup

5.1.1. Comparison Algorithms for LCC-CMAES

For a comprehensive comparison, we selected CC-CMAES (Liu and Tang, 2013), CSG (Tian et al., 2024b), ERDG (Yang et al., 2020), MDG (Chen et al., 2022), and FII (Ge et al., 2015) as the comparison algorithms under the CC framework. We then selected LSGO baselines without CC: CMA-ES (Hansen, 2016), Sep-CMAES (Ros and Hansen, 2008), LM-CMA (Loshchilov, 2017), and LM-MA-ES (Loshchilov et al., 2018). In addition, the MetaBBO method MetaES (Lange et al., 2023) and the local search method L-BFGS (Byrd et al., 1995) were chosen. Among the comparison algorithms under the CC framework, CC-CMAES, which has the same decomposition strategy pool, is introduced in Section 3.1. CSG, ERDG, MDG, and FII are algorithms that require additional FEs for decomposition: CSG is currently the most powerful multi-stage variable identification algorithm; ERDG is a more efficient variant of RDG3 (Sun et al., 2019) (the 2018 CEC LSGO champion); MDG is an algorithm that addresses overlapping problems; FII is a rapid identification algorithm that reduces the FEs consumed by decomposition. These algorithms under the CC framework all use CMA-ES as the underlying optimizer. Among the other types of comparison algorithms, MetaES discovers evolutionary strategies via MetaBBO and serves as the global optimization comparison algorithm within MetaBBO; L-BFGS is an optimization algorithm from outside the evolutionary algorithms community that is widely used in practical applications, especially for LSGO.

5.1.2. Benchmark and Hyperparameter Settings

The CEC 2013 LSGO benchmark (Li et al., 2013) comprises a total of $N = 15$ problems, which are divided into five types of functions: Fully-separable Functions (F1-F3), Partially Additively Separable Functions with a separable subcomponent (F4-F7), Partially Additively Separable Functions with no separable subcomponents (F8-F11), Overlapping Functions (F12-F14), and a Non-separable Function (F15). We partitioned the CEC 2013 LSGO benchmark suite into training and testing problems, as shown in Table 2. An asterisk “*” marks the problems used for training, while the rest were used for testing. Except for the Non-separable Function, each category has training problems. F1, F4, and F8 are variants of the Elliptic Function; F5 and F9 are variants of the Rastrigin Function. This allows for testing LCC-CMAES’s generalizability on unseen functions within the same type, with the Non-separable Function serving as a test of both an unseen function and an unseen type.

The hyperparameter settings in this paper are as follows: the total number of generations ($TG$) is 50, the offspring size ($\lambda$) is 20, the subgroup maximum function evaluations ($SubMaxFEs$) is 1E3, the number of subgroups ($m$) is 10, the learning rate ($lr$) is 6E-4, the number of epochs ($Epoch$) is 90, the number of mini PPO iterations ($K$) is 3, the problem dimension ($D$) is 1E3 (905 for F13 and F14 in CEC2013LSGO), and the number of action selections ($ns$) is 20. It is worth noting that, to realistically address practical problems and accommodate the extra decomposition costs required by various decomposition strategies, we set $MaxFEs = TG \times \lambda \times m \times ns = 50 \times 20 \times 10 \times 20 =$ 2E5. The settings of the comparison algorithms remain the same as those in their original papers.
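The budget arithmetic can be verified directly. The snippet below is a small sanity check using the values quoted above; the variable names are illustrative.

```python
# Hyperparameters as quoted in the text.
TG = 50        # total number of generations
LAMBDA = 20    # offspring size
M = 10         # number of subgroups
NS = 20        # number of action selections

per_subgroup_fes = TG * LAMBDA      # evaluations per subgroup per action
max_fes = TG * LAMBDA * M * NS      # total evaluation budget

print(per_subgroup_fes, max_fes)
```

The per-subgroup product $TG \times \lambda = 1000$ coincides with the stated $SubMaxFEs$ of 1E3, and the full product gives the stated budget of 2E5 evaluations.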

5.2. Comparison Analysis

Table 2. Comparing LCC-CMAES with comparison algorithms on CEC 2013 LSGO. For each problem, the first line gives the mean (with the test mark) and the second line gives the corresponding ± value.
Problem | Algorithms under CC Framework | Algorithms without CC
Problem LCC-CMAES CC-CMAES CSG ERDG MDG FII CMA-ES Sep-CMAES LM-CMA LM-MA-ES MetaES L-BFGS
*1 2.045E+07 3.244E+08(+) 4.182E+08(+) 4.227E+08(+) 4.182E+08(+) 4.227E+08(+) 4.223E+08(+) 8.591E+06(-) 2.313E+07(+) 4.691E+07(+) 2.420E+11(+) 9.085E+09(+)
±2.781E+06 ±1.112E+08 ±3.882E+07 ±3.825E+07 ±3.884E+07 ±2.231E+07 ±3.827E+07 ±1.777E+06 ±4.576E+06 ±2.973E+06 ±2.351E+10 ±1.396E+08
2 4.419E+03 2.159E+03(-) 2.636E+03(-) 2.555E+03(-) 2.636E+03(-) 2.592E+03(-) 5.108E+03(+) 5.414E+03(+) 1.949E+04(+) 6.886E+03(+) 4.789E+04(+) 4.050E+04(+)
±2.022E+02 ±4.125E+02 ±1.393E+02 ±1.294E+02 ±1.397E+02 ±9.076E+01 ±2.32E+02 ±4.213E+01 ±1.483E+03 ±2.674E+02 ±5.721E+03 ±3.281E+03
3 2.007E+01 2.036E+01(+) 2.162E+01(+) \(+) 2.161E+01(+) 2.161E+01(+) 2.161E+01(+) 2.11E+01(+) 2.052E+01(+) 2.172E+01(+) 2.171E+01(+) 2.165E+01(+)
±4.182E-02 ±2.289E-02 ±4.061E-03 \ ±7.218E-03 ±2.374E-01 ±4.197E-02 ±2.394E-02 ±3.692E-02 ±3.486E-02 ±7.976E-01 ±7.686E-02
*4 6.932E+10 1.902E+11(+) 3.502E+10(-) 4.561E+10(-) 3.270E+10(-) 2.603E+12(+) 2.681E+12(+) 1.632E+11(+) 2.492E+10(-) 1.220E+11(+) 2.440E+12(+) 5.084E+12(+)
±9.792E+09 ±8.122E+10 ±3.645E+09 ±3.218E+09 ±6.397E+09 ±1.218E+12 ±7.639E+11 ±2.938E+10 ±1.009E+09 ±8.678E+10 ±5.409E+11 ±5.728E+11
*5 5.386E+06 9.455E+06(+) 1.079E+06(-) 2.053E+06(-) 1.146E+06(-) 1.172E+06(-) 4.205E+06(≈) 2.274E+06(-) 9.370E+06(+) 1.534E+06(-) 4.976E+07(+) 4.890E+07(+)
±1.982E+06 ±2.514E+06 ±1.485E+05 ±2.363E+05 ±1.938E+05 ±1.845E+05 ±2.954E+05 ±4.255E+05 ±1.295E+06 ±1.126E+05 ±5.214E+06 ±2.386E+06
6 1.048E+06 1.056E+06(≈) 1.066E+06(+) \(+) 1.066E+06(+) 1.065E+06(+) 1.062E+06(+) 1.078E+06(+) 1.038E+05(≈) 1.063E+06(+) 1.000E+06(-) 1.071E+06(+)
±2.147E+03 ±4.632E+03 ±7.156E+02 \ ±5.673E+02 ±8.675E+02 ±1.287E+03 ±2.745E+03 ±1.703E+03 ±6.845E+03 ±8.232E+03 ±3.219E+04
7 7.306E+08 3.268E+09(+) 2.140E+07(-) 5.059E+07(-) 7.295E+06(-) 4.442E+09(+) 6.739E+08(≈) 1.756E+09(+) 3.080E+08(-) 2.871E+07(-) 3.240E+14(+) 1.210E+15(+)
±2.502E+08 ±2.404E+09 ±8.465E+06 ±8.356E+06 ±4.113E+06 ±1.532E+09 ±2.035E+08 ±7.201E+08 ±2.736E+07 ±4.212E+06 ±7.248E+13 ±5.317E+10
*8 2.299E+15 1.547E+16(+) 3.488E+15(≈) 1.543E+16(+) 1.513E+15(≈) 2.885E+15(≈) 6.162E+16(+) 3.184E+15(+) 1.061E+13(-) 2.189E+15(≈) 1.940E+16(+) 1.460E+17(+)
±2.047E+15 ±7.053E+15 ±2.872E+15 ±6.205E+15 ±1.194E+15 ±1.413E+15 ±1.492E+16 ±7.612E+14 ±4.268E+12 ±1.062E+15 ±1.382E+16 ±2.014E+15
*9 5.818E+08 8.084E+08(+) 6.199E+08(+) 1.443E+09(+) 5.104E+08(≈) 6.007E+08(≈) 6.812E+08(+) 3.890E+08(-) 6.380E+08(+) 3.020E+08(-) 3.270E+09(+) 3.730E+09(+)
±1.195E+08 ±1.636E+08 ±2.793E+07 ±1.572E+08 ±2.175E+08 ±4.263E+07 ±1.751E+07 ±3.527E+07 ±2.165E+06 ±3.214E+07 ±6.832E+07 ±5.833E+08
10 9.423E+07 9.375E+07(≈) 9.464E+07(≈) 9.576E+07(+) 9.538E+07(+) 9.523E+07(+) 9.464E+07(≈) 9.447E+07(≈) 9.062E+07(-) 9.829E+07(+) 9.803E+07(+) 9.696E+07(+)
±5.623E+05 ±6.224E+05 ±2.725E+05 ±1.653E+05 ±1.712E+05 ±1.332E+05 ±1.426E+05 ±1.012E+05 ±1.191E+05 ±3.321E+05 ±3.115E+05 ±2.133E+05
11 8.243E+09 2.297E+11(+) 2.998E+17(+) 3.609E+17(+) 4.770E+17(+) 6.327E+17(+) 2.364E+10(+) 1.850E+10(+) 5.620E+08(-) 1.720E+09(-) 6.630E+22(+) 6.470E+16(+)
±6.352E+09 ±6.313E+10 ±1.245E+15 ±1.431E+16 ±1.586E+15 ±8.848E+14 ±4.967E+09 ±2.786E+08 ±3.921E+07 ±2.179E+08 ±8.795E+21 ±2.545E+15
*12 2.135E+03 1.766E+05(+) 2.497E+05(+) 4.140E+06(+) 1.321E+03(-) 1.079E+03(-) 1.103E+03(-) 1.068E+03(-) 2.157E+03(≈) 1.079E+03(-) 6.860E+12(+) 1.040E+08(+)
±4.072E+02 ±1.484E+04 ±6.373E+04 ±1.464E+06 ±4.063E+02 ±5.374E+01 ±8.365E+01 ±1.476E+02 ±5.921E+01 ±1.343E+02 ±3.215E+12 ±2.354E+07
*13 1.298E+10 2.730E+10(+) 2.332E+11(+) 8.073E+15(+) 1.747E+10(≈) 3.274E+12(+) 7.857E+09(-) 1.698E+10(≈) 5.530E+09(-) 8.330E+08(-) 2.100E+21(+) 9.440E+16(+)
±2.643E+09 ±8.185E+09 ±4.099E+10 ±1.785E+16 ±5.493E+09 ±2.203E+12 ±1.964E+09 ±1.034E+10 ±1.353E+09 ±2.573E+08 ±7.595E+20 ±5.221E+15
14 1.323E+11 3.761E+11(+) 6.072E+11(+) 3.599E+13(+) 5.212E+21(+) 2.953E+13(+) 5.600E+08(-) 2.081E+10(-) 1.09E+10(-) 1.110E+10(-) 8.990E+08(-) 5.350E+22(+)
±3.895E+10 ±1.856E+11 ±1.253E+11 ±1.415E+13 ±7.284E+21 ±2.512E+13 ±1.545E+10 ±1.461E+10 ±5.999E+09 ±1.496E+08 ±7.818E+21 ±2.527E+17
15 3.090E+07 3.306E+07(≈) 1.508E+08(+) \(+) 9.125E+07(+) 9.612E+07(+) 9.565E+07(+) 4.67E+08(+) 4.752E+07(+) 3.760E+07(+) 8.230E+15(+) 2.350E+15(+)
±5.235E+06 ±4.994E+06 ±1.593E+07 \ ±7.689E+06 ±1.317E+07 ±1.628E+07 ±7.041E+06 ±2.128E+06 ±2.507E+06 ±6.947E+14 ±5.319E+14
+/≈/- NA 11/3/1 8/4/3 11/0/4 6/4/5 10/2/3 9/3/3 8/2/5 6/2/7 7/1/7 13/0/2 15/0/0

5.2.1. Comparison With Other Algorithms

Table 2 presents the mean optimization results from 25 independent runs for each algorithm. The symbols “+”, “-”, and “≈” denote the outcomes of the Wilcoxon rank-sum test at the 0.05 significance level, indicating whether LCC-CMAES performed significantly better than (+), significantly worse than (-), or not significantly differently from (≈) the competing method. The last row summarizes the test results for each algorithm, listing the number of problems on which LCC-CMAES significantly outperformed the competitor, showed no significant difference, and performed worse.
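The summary counts can be reproduced from the per-problem marks. As a small check, the sketch below tallies the marks transcribed from the CC-CMAES column of Table 2 (F1 to F15); the helper `tally` is hypothetical.

```python
def tally(marks):
    """Count '+', '≈' and '-' marks and format them as '+/≈/-'."""
    return "/".join(str(marks.count(symbol)) for symbol in ("+", "≈", "-"))


# Marks for CC-CMAES on F1..F15, transcribed from Table 2.
cc_cmaes_marks = ["+", "-", "+", "+", "+", "≈", "+", "+",
                  "+", "≈", "+", "+", "+", "+", "≈"]

summary = tally(cc_cmaes_marks)
print(summary)  # 11/3/1
```

The result matches the 11/3/1 entry reported for CC-CMAES, which confirms the ordering of the three counts in the summary line.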

From the results in Table 2, we make the following observations:

  • Superior Performance Within CC Frameworks: LCC-CMAES demonstrates significant advantages over existing advanced algorithms within the CC framework. Its superiority is most pronounced on the more challenging grouping problems (such as the overlapping functions F12-F14), highlighting its capability to handle more complex real-world problems.

  • Generalization Capability: LCC-CMAES exhibits a degree of generalizability, thanks to the well-designed state that provides it with ample information. It generalizes to similar problem types: for example, it achieves commendable results on F11 when trained on F8-F9. Moreover, LCC-CMAES also generalizes to a completely unseen problem type (F15). This suggests that LCC can solve more complex real-world problems.

  • Improvement Over CC-CMAES: LCC-CMAES also shows significant improvements over CC-CMAES, which employs the same decomposition strategy pool. We attribute this improvement to the design of the reward and state, which encourages more rational decomposition strategies and leads to superior outcomes.

  • Comparison With Baselines Without CC: LCC-CMAES surpasses CMA-ES, LM-MA-ES, MetaES and L-BFGS, and shows competitive performance with LM-CMA, which validates the effectiveness of LCC-CMAES.

5.2.2. Comparison on Extended Optimization Horizon

To investigate the performance of the algorithms under an extended optimization horizon, Figure 4 presents the performance curves of LCC-CMAES and the baselines on the 15 problems of CEC2013LSGO with a budget of 3E6 $MaxFEs$. On most problems, LCC-CMAES performs better than CC-CMAES, which shows that the benefit of learned action selection holds broadly. In the early and middle stages of decomposition, LCC-CMAES outperforms the other algorithms under the CC framework. However, after decomposition is complete, this advantage gradually diminishes, and LCC-CMAES may even be surpassed. On most problems, algorithms without CC exhibit more prominent optimization performance, with the exception of L-BFGS and MetaES, which show poorer results. Algorithms under the CC framework perform better on partially separable problems (F4-F11), but struggle to show significant optimization effects (stepwise decline) on overlapping (F13-F14) and fully non-separable (F15) problems. These problems are difficult to decompose, so all dimensions are often treated as a single group during optimization. This also results in substantial resource consumption, as CMA-ES struggles with LSGO (e.g., the time cost of ERDG on F13 under 3E6 FEs reaches 60,000 seconds, whereas LCC-CMAES requires only 1,200 seconds). LCC-CMAES is not subject to such limitations, as it continues to perform decomposition even on overlapping and fully non-separable problems, and hence significantly surpasses the other CC methods on these problems.

Figure 4. Comparison with a 3E6 budget in CEC2013LSGO.
Table 3. Comparison of resource consumption.
Algorithm LCC-CMAES CC-CMAES CSG ERDG MDG FII CMA-ES
FEs 0 0 4.86E+04 1.28E+05 4.10E+03 4.52E+03 0
Time Cost (s) 85.12 83.88 275.05 173.75 332.32 264.81 450.4

5.2.3. Comparison of Resource Consumption

The analysis above shows that LCC-CMAES can handle more complex problems with a small budget. Besides the additional FEs, the time cost of decomposition and optimization is another bottleneck that keeps an algorithm from being applied to more complex and higher-dimensional problems. We therefore examine, for each algorithm, both the additional FEs and the time cost of decomposition and optimization, to give a more detailed picture of resource consumption.

Table 3 shows the additional FEs for decomposition and the optimization time cost (in seconds), averaged over problems and runs. To prevent problems that are difficult to decompose from inflating the figure, we report the median of the additional FEs. A notable advantage of LCC-CMAES is that it does not require additional FEs for decomposition, and, owing to its simple actor and critic network design, its time expenditure is relatively low. This provides a feasible approach for solving more complex and higher-dimensional real-world problems. Other algorithms under the CC framework often incur excessive costs because identifying the separability structure is difficult; even when the structure can be identified, overlapping variables can leave many subgroups too large, causing substantial time expenditure in the underlying optimizer. In addition, algorithms without CC usually require more optimization time than CC methods. For instance, CC methods based on CMA-ES are faster than CMA-ES without CC. This is attributable to the smaller dimensions of the decomposed subspaces, which validates the necessity of problem decomposition and CC.

5.3. The Transferability Study

To test the transferability of LCC-CMAES more thoroughly, we evaluate it on an entirely new set of problems that it never encountered during training. Most separable functions in CEC 2013 LSGO are additively separable, with the Ackley function being the only non-additively separable one among its basic functions. To address this limitation, the BNS benchmark (Chen et al., 2022) introduces four non-additively separable base functions: two multiplicatively separable base functions and two composite separable base functions. Based on these base functions, BNS designs 12 test problems with varying degrees of separability. Compared to CEC 2013 LSGO, the problems in BNS are closer to the complex problems that may be encountered in real-world scenarios. All algorithm settings are consistent with those used in the comparison on CEC 2013 LSGO.
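To make the distinction between additive and multiplicative separability concrete, the sketch below contrasts two toy functions. They are illustrative assumptions, not the BNS base functions, and the interaction check is the classic differential-grouping style test for additive separability, used here only to show why a non-additively separable function can look non-separable to such a test. All names (`f_additive`, `f_multiplicative`, `additive_interaction`, `best_x0`) are hypothetical.

```python
def f_additive(x):
    # Additively separable: a sum of one-variable terms.
    return x[0] ** 2 + x[1] ** 2


def f_multiplicative(x):
    # Multiplicatively separable: a product of positive one-variable terms.
    return (x[0] ** 2 + 1.0) * (x[1] ** 2 + 1.0)


def additive_interaction(f, x, i, j, delta=1.0):
    """|d1 - d2|: change in f from perturbing x_i, measured before and after perturbing x_j."""
    def shifted(base, k):
        y = list(base)
        y[k] += delta
        return y

    d1 = f(shifted(x, i)) - f(x)
    x_j = shifted(x, j)
    d2 = f(shifted(x_j, i)) - f(x_j)
    return abs(d1 - d2)


def best_x0(f, x1, grid=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Best value of x_0 on a small grid while x_1 is held fixed."""
    return min(grid, key=lambda v: f([v, x1]))


x = [1.0, 2.0]
print(additive_interaction(f_additive, x, 0, 1))        # zero: no additive interaction
print(additive_interaction(f_multiplicative, x, 0, 1))  # non-zero: flagged as interacting
# Yet the best x_0 does not depend on x_1 for the multiplicative function:
print(best_x0(f_multiplicative, 0.0), best_x0(f_multiplicative, 2.0))
```

In this toy case the additive check reports an interaction for the multiplicative function even though each variable can still be optimized independently, which is the kind of structure an additive-only decomposition test would miss.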

Table 4. Comparing LCC-CMAES with comparison algorithms on BNS.
problem LCC-CMAES CC-CMAES CSG ERDG MDG FII CMA-ES Sep-CMAES LM-MA-ES LM-CMA
1 4.752E-08 1.901E-07(+) 1.741E-06(+) 3.305E+06(+) 4.463E-03(+) 7.022E-04(+) 4.924E-11(-) 0.000E+00(-) 0.000E+00(-) 0.000E+00(-)
±3.125E-08 ±1.523E-07 ±9.037E-07 ±1.905E+06 ±2.578E-03 ±8.325E-04 ±6.043E-11 ±0.000E+00 ±0.000E+00 ±0.000E+00
2 7.044E+00 6.947E+00(≈) 1.035E+01(+) 1.198E+11(+) 1.185E+01(+) 1.432E+01(+) 1.038E+01(+) 2.375E+02(+) 8.306E+03(+) 6.403E+01(+)
±4.794E-01 ±2.413E+00 ±1.532E+00 ±5.394E+09 ±3.865E+00 ±4.413E+00 ±4.152E-01 ±6.350E+01 ±4.066E+03 ±1.583E+01
3 7.292E+05 5.375E+05(≈) 7.173E+05(≈) 8.092E+06(+) 2.569E+06(+) 7.136E+05(≈) 8.565E+05(+) 8.361E+02(-) 3.472E+06(+) 2.881E+06(+)
±3.484E+05 ±2.567E+05 ±3.925E+05 ±4.024E+04 ±2.493E+06 ±3.935E+05 ±1.889E+05 ±4.656E+01 ±2.455E+05 ±1.323E+05
4 1.423E+09 4.124E+09(+) 2.545E+10(+) 6.812E+11(+) 1.389E+11(≈) 8.347E+10(+) 1.852E+10(+) 3.865E+09(+) 7.713E+09(+) 9.296E+09(+)
±3.39E+08 ±1.484E+08 ±8.115E+08 ±5.046E+09 ±7.765E+10 ±4.438E+09 ±6.297E+08 ±2.779E+08 ±2.510E+08 ±5.068E+08
5 0.000E+00 4.402E-08(+) 4.414E-02(+) 3.682E+06(+) 5.003E-02(+) 2.624E-02(+) 4.225E-11(+) 0.000E+00(≈) 0.000E+00(≈) 0.000E+00(≈)
±0.000E+00 ±4.876E-08 ±5.875E-03 ±2.384E+06 ±1.665E-03 ±5.914E-03 ±6.183E-11 ±0.000E+00 ±0.000E+00 ±0.000E+00
6 2.105E+01 4.084E+01(+) 2.278E+03(+) 1.189E+11(+) 5.694E+04(+) 2.366E+04(+) 8.482E+00(-) 2.763E+02(+) 4.035E+01(+) 3.913E+02(+)
±3.663E+00 ±9.414E+00 ±1.385E+03 ±4.286E+10 ±9.847E+03 ±7.106E+03 ±1.185E+00 ±2.884E+01 ±5.894E+00 ±3.653E+01
7 3.841E+06 3.846E+06(≈) 5.554E+06(+) 8.095E+06(+) 6.334E+06(+) 6.376E+06(+) 9.947E+05(-) 2.673E+06(-) 2.829E+06(-) 3.845E+06(≈)
±3.052E+05 ±9.843E+04 ±5.645E+04 ±5.256E+03 ±4.891E+04 ±4.795E+04 ±1.596E+05 ±2.435E+05 ±1.932E+05 ±6.532E+05
8 1.982E+10 2.423E+10(+) 1.844E+11(+) 6.705E+11(+) 1.085E+11(+) 1.137E+11(+) 1.932E+10(≈) 2.674E+10(+) 8.473E+09(-) 4.77E+09(-)
±1.124E+09 ±1.523E+09 ±1.804E+10 ±5.125E+10 ±5.472E+09 ±5.171E+09 ±4.945E+08 ±1.142E+09 ±5.432E+08 ±5.223E+08
9 4.883E-08 2.094E-07(+) 7.395E-04(+) 1.443E+07(+) 5.534E-03(+) 3.975E-02(+) 4.452E-08(≈) 0.000E+00(-) 0.000E+00(-) 0.000E+00(-)
±1.705E-08 ±1.467E-07 ±2.178E-04 ±8.453E+04 ±1.115E-03 ±4.973E-03 ±1.775E-08 ±0.000E+00 ±0.000E+00 ±0.000E+00
10 3.074E+02 4.213E+04(+) 3.725E+04(+) 7.054E+10(+) 1.635E+06(+) 1.713E+05(+) 1.062E+01(-) 2.943E+02(≈) 5.309E+01(-) 2.154E+03(+)
±2.901E+02 ±3.372E+04 ±9.991E+03 ±3.705E+09 ±6.398E+05 ±7.537E+04 ±8.083E-01 ±9.455E+01 ±1.618E+01 ±1.401E+02
11 3.762E+06 4.062E+06(+) 3.506E+06(≈) 8.057E+06(+) 7.228E+06(+) 7.196E+06(+) 4.574E+06(+) 3.475E+06(≈) 6.021E+06(+) 3.777E+06(≈)
±2.562E+05 ±1.235E+05 ±7.774E+04 ±3.916E+04 ±4.735E+04 ±4.051E+04 ±2.962E+05 ±5.456E+05 ±6.778E+04 ±6.542E+05
12 2.821E+10 4.289E+10(+) 5.365E+10(+) 6.782E+11(+) 1.952E+10(-) 2.075E+10(-) 1.911E+10(-) 3.913E+10(+) 2.782E+10(-) 4.612E+09(-)
±1.632E+09 ±2.254E+09 ±1.965E+09 ±3.076E+10 ±8.082E+08 ±1.494E+09 ±7.118E+08 ±2.371E+09 ±2.407E+09 ±4.578E+08
NA 9/3/0 10/2/0 12/0/0 10/1/1 10/1/1 5/2/5 5/3/4 5/2/5 5/3/4

Table 4 presents the comparative results, which indicate that LCC-CMAES has certain advantages. On the one hand, some algorithms (e.g., ERDG) consume an excessive number of additional FEs for decomposition on BNS problems, which prevents them from focusing resources on the optimization process and leads to poor performance. On the other hand, some algorithms (e.g., MDG, whose correct grouping rate is almost zero) fail to correctly identify and group interacting variables on these more complex problems, which also results in poor performance. These results also indicate the transferability of LCC-CMAES and its potential for tackling more complex real-world problems: trained on simpler benchmarks, it can transfer decomposition knowledge to more complex scenarios, discovering grouping structures through learning rather than relying on expert-level knowledge. However, the LSGO variant algorithms and CMA-ES do not suffer from inaccurate decomposition or additional FEs, and they can fully exploit the relationships between variables. This underscores the challenge that CC faces, relative to global optimizers, on problems with a few thousand dimensions or fewer.

5.4. Comparison on Neuroevolution tasks

In this section, we adopt four Neuroevolution (Such et al., 2017) tasks as a showcase of real-world applications, in which optimization algorithms evolve a population of neural networks according to their performance on a specific machine learning task such as robotic control (Galván and Mooney, 2021). Concretely, we consider the real-parameter optimization of 2-layer MLPs for four MuJoCo (Todorov et al., 2012) robot control tasks: InvertedDoublePendulum-v4, HalfCheetah-v4, Pusher-v4 and Ant-v4. We set the hidden dimension of the MLPs to 64 with the Tanh activation function, while the input and output dimensions match the control protocols of the MuJoCo tasks, leading to 833, 1542, 1991 and 2312 dimensions for the MLPs of the four tasks, respectively. Because evaluating the networks is time-consuming, we set the maximum number of function evaluations (FEs) for all tasks to 1,000. We apply the LCC-CMAES agent pre-trained in Section 5.2 to the Neuroevolution problems in a zero-shot manner. The CC-based baselines other than CC-CMAES fail to decompose the problem dimensions within 1,000 FEs, so we do not include them in the comparison. The hyper-parameter settings of LCC-CMAES and the included baselines are consistent with Section 5.1.2, except for the total number of generations ($TG$) and the offspring size ($\lambda$) of LCC-CMAES, which are both set to 5. Since the goal in the MuJoCo tasks is to maximize the accumulated reward gained by the networks, Table 5 reports the negative accumulated rewards obtained by the networks optimized by LCC-CMAES and the baselines, so that the comparison remains a minimization problem.
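The search-space dimensions quoted above can be checked by counting the weights and biases of an MLP with one hidden layer of 64 units. The sketch below is only an illustration: the observation and action sizes in `ENV_DIMS` are assumptions taken from the default Gymnasium v4 environment settings (for Ant-v4, the default observation without contact forces), and the helper names `mlp_num_params` and `cost_from_return` are hypothetical.

```python
def mlp_num_params(obs_dim, act_dim, hidden=64):
    """Number of weights and biases in an obs_dim -> hidden -> act_dim MLP."""
    input_layer = obs_dim * hidden + hidden    # weights + biases of the hidden layer
    output_layer = hidden * act_dim + act_dim  # weights + biases of the output layer
    return input_layer + output_layer


# Assumed (observation size, action size) of the default v4 environments.
ENV_DIMS = {
    "InvertedDoublePendulum-v4": (11, 1),
    "HalfCheetah-v4": (17, 6),
    "Pusher-v4": (23, 7),
    "Ant-v4": (27, 8),
}


def cost_from_return(episode_return):
    """Negate the accumulated reward so that the optimizer can minimize it."""
    return -episode_return


search_dims = {name: mlp_num_params(o, a) for name, (o, a) in ENV_DIMS.items()}
for name, dim in search_dims.items():
    print(f"{name}: {dim} parameters")
```

Under these assumptions the counts come out to 833, 1542, 1991 and 2312, matching the dimensions stated in the text.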

The results show that even on realistic optimization problems with larger problem dimensions and more complex variable relationships, LCC-CMAES largely retains its advantages over CC-CMAES and the global optimization baselines, supporting its effectiveness.

Table 5. Comparison results on Neuroevolution tasks.
Task LCC-CMAES CC-CMAES CMA-ES Sep-CMAES LM-CMA LM-MA-ES MetaES L-BFGS
InvertedDoublePendulum-v4 (833D) -5.111E+03 -4.854E+03(+) -4.714E+03(+) -4.988E+03(+) -4.971E+03(+) -4.951E+03(+) -4.596E+03(+) -4.322E+03(+)
±2.914E+02 ±2.354E+02 ±2.354E+02 ±2.644E+02 ±2.541E+02 ±2.455E+02 ±2.944E+02 ±2.831E+02
HalfCheetah-v4 (1542D) -2.451E+02 -2.017E+02(+) -1.914E+02(+) -1.849E+02(+) -1.897E+02(+) -1.744E+02(+) -5.554E+01(+) -4.997E+01(+)
±2.514E+02 ±2.119E+02 ±1.645E+02 ±2.002E+02 ±2.129E+02 ±1.988E+02 ±9.624E+01 ±9.487E+01
Pusher-v4 (1991D) 3.354E+02 3.543E+02(+) 3.497E+02(+) 3.481E+02(+) 3.344E+02(-) 3.411E+02(+) 3.894E+02(+) 3.909E+02(+)
±2.984E+01 ±3.791E+01 ±4.016E+01 ±3.594E+01 ±2.326E+01 ±2.746E+01 ±4.687E+01 ±6.314E+01
Ant-v4 (2312D) -1.083E+03 -1.080E+03(≈) -1.073E+03(+) -1.066E+03(+) -1.085E+03(≈) -1.079E+03(≈) -1.015E+03(+) -1.004E+03(+)
±6.476E+01 ±6.524E+01 ±5.146E+01 ±5.687E+01 ±6.971E+01 ±5.377E+01 ±3.345E+01 ±2.154E+01
NA 3/1/0 4/0/0 4/0/0 2/1/1 3/1/0 4/0/0 4/0/0

5.5. Ablation Study

5.5.1. State features Study

To verify the necessity of the framework components, we conducted ablation studies on the state features. Specifically, we separately removed the embedding and concatenation of $s_{\text{AH}}$ (denoted as W/O AH), $s_{\text{GO}}$ (W/O GO), and $s_{\text{SD}}$ (W/O SD). We then tested each ablated variant on CEC 2013 LSGO with all other settings unchanged.

The results are shown in Figure 5 (A). For each problem, we take the average performance over all runs of each algorithm and apply min-max normalization across all algorithms, which maps the performance into $[0,1]$ and eliminates the cost scale gaps between different problems. For each algorithm, we then report one minus its mean normalized performance over all problems, together with its error bar, so that higher is better. Performance deteriorates significantly when any of these features is removed, highlighting their crucial role in providing sufficient information to the RL agent.
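The normalization described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' plotting code: the function name `ablation_scores`, the variant labels and the cost values are hypothetical, and the error bars are omitted.

```python
def ablation_scores(mean_costs):
    """Map {variant: [mean cost per problem]} to {variant: 1 - mean normalized cost}."""
    variants = list(mean_costs)
    n_problems = len(next(iter(mean_costs.values())))
    totals = {v: 0.0 for v in variants}
    for p in range(n_problems):
        column = [mean_costs[v][p] for v in variants]
        lo, hi = min(column), max(column)
        span = hi - lo
        for v in variants:
            # Min-max normalization across variants on this problem (0 = best, 1 = worst).
            totals[v] += (mean_costs[v][p] - lo) / span if span > 0 else 0.0
    # One minus the mean normalized cost, so that a higher score is better.
    return {v: 1.0 - totals[v] / n_problems for v in variants}


# Hypothetical mean costs on two problems with very different cost scales.
example = {
    "full": [1.0, 1.0e10],
    "variant_a": [2.0, 2.0e10],
    "variant_b": [3.0, 3.0e10],
}
print(ablation_scores(example))
```

Because each problem is normalized separately before averaging, a problem whose costs are around 1E+10 contributes no more to the final score than one whose costs are around 1.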

5.5.2. Reward Study

The reward should be ratio-based and lie within the range of -1 to 1, because evaluation values can vary significantly across different problems and unbounded rewards would have an excessive impact on network training. Two other reward designs also meet these requirements:

1) Global best fitness descent ratio: the decline of the global best fitness from the initial generation, normalized by the global best fitness of the initial generation. reward1 := $r_t = \frac{f_0^* - f_t^*}{f_0^*}$

2) Relative global best fitness descent ratio: the decline of the global best fitness from the previous generation, normalized by the global best fitness of the previous generation. reward2 := $r_t = \frac{f_{t-1}^* - f_t^*}{f_{t-1}^*}$

Figure 5. The ablation study on state features and reward designs.

Figure 5 (B) presents the results under the different reward schemes, with the same normalization as introduced in Section 5.5.1. Both reward1 and reward2 are significantly less effective than the reward we adopted. For reward1, which measures the decline from the initial generation's fitness, many later fitness values are far lower than the initial one, so the numerator is approximately equal to the initial value and the reward approaches 1. For reward2, normalizing by the global best fitness of the previous generation means that the standard for reward normalization changes with each generation. This inconsistency makes it challenging for the RL agent to select appropriate actions.
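The two effects described above can be reproduced on a toy trajectory. The sketch below assumes positive fitness values and minimization; the best-so-far trace is hypothetical and chosen only to make the behaviour visible, and the function names simply mirror the labels reward1 and reward2.

```python
def reward1(f_init, f_t):
    """Descent relative to the initial generation's global best fitness."""
    return (f_init - f_t) / f_init


def reward2(f_prev, f_t):
    """Descent relative to the previous generation's global best fitness."""
    return (f_prev - f_t) / f_prev


# Hypothetical best-so-far fitness at the start and after three optimization steps.
trace = [1e10, 1e8, 1e6, 1e4]

r1 = [reward1(trace[0], f) for f in trace[1:]]
r2 = [reward2(prev, cur) for prev, cur in zip(trace, trace[1:])]

print("reward1:", r1)  # saturates just below 1 once f_t is far below f_0
print("reward2:", r2)  # equal rewards for absolute improvements of very different size
```

In this trace reward1 is already close to 1 after the first step and barely changes afterwards, while reward2 gives the same value to an absolute improvement of about 1E+10 and to one of about 1E+6, because its reference point moves with every generation.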

6. Conclusion and Future work

We have proposed LCC, a pioneering learning-based cooperative coevolution framework that dynamically schedules decomposition strategies during the optimization process. We instantiate LCC with CMA-ES as the underlying optimizer and name the resulting algorithm LCC-CMAES. Unlike previous algorithms under the CC framework, LCC-CMAES does not rely on expert-designed knowledge for decomposition; instead, a DRL agent uses statistical features to select the most promising decomposition strategy. More importantly, LCC-CMAES does not require additional FEs for decomposition, allowing it to focus resources on optimization. When tested against several other advanced algorithms on two benchmarks, CEC 2013 LSGO and BNS, the comparative results demonstrated that LCC-CMAES holds a distinct advantage, especially on complex real-world problems that it had not previously encountered. This underscores LCC’s robustness, adaptability and transferability, making it a promising approach for tackling complex optimization challenges in various settings.

In future work, we hope to: (1) investigate the inclusion of more complex or higher-dimensional features that may capture deeper insights into the problem’s structure; and (2) design more rational and effective decomposition actions. These goals aim to improve LCC’s effectiveness and applicability, making it a versatile tool for LSGO that can address a broader range of complex challenges.

Acknowledgements.
This work was supported in part by the National Natural Science Foundation of China No. 62276100, in part by the Guangdong Provincial Natural Science Foundation for Outstanding Youth Team Project No. 2024B1515040010, in part by the Guangdong Natural Science Funds for Distinguished Young Scholars No. 2022B1515020049, and in part by the TCL Young Scholars Program.

References

  • Akimoto and Hansen (2016) Youhei Akimoto and Nikolaus Hansen. 2016. Projection-based restricted covariance matrix adaptation for high dimension. In Proceedings of the Genetic and Evolutionary Computation Conference 2016. 197–204.
  • Bhattacharya et al. (2016) Maumita Bhattacharya, Rafiqul Islam, and Jemal Abawajy. 2016. Evolutionary optimization: a big data perspective. Journal of Network and Computer Applications 59 (2016), 416–426.
  • Byrd et al. (1995) Richard H Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 16, 5 (1995), 1190–1208.
  • Chen et al. (2024b) Jiacheng Chen, Zeyuan Ma, Hongshu Guo, Yining Ma, Jie Zhang, and Yue-jiao Gong. 2024b. Symbol: Generating Flexible Black-Box Optimizers through Symbolic Equation Learning. arXiv preprint arXiv:2402.02355 (2024).
  • Chen et al. (2022) Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, and Gary G Yen. 2022. A decomposition method for both additively and non-additively separable problems. IEEE Transactions on Evolutionary Computation (2022).
  • Chen et al. (2025) Minyang Chen, Chenchen Feng, and Ran Cheng. 2025. MetaDE: Evolving Differential Evolution by Differential Evolution. IEEE Transactions on Evolutionary Computation (2025).
  • Chen et al. (2019) Wei-Neng Chen, Ya-Hui Jia, Feng Zhao, Xiao-Nan Luo, Xing-Dong Jia, and Jun Zhang. 2019. A cooperative co-evolutionary approach to large-scale multisource water distribution network optimization. IEEE Transactions on Evolutionary Computation 23, 5 (2019), 842–857.
  • Chen et al. (2024a) Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. 2024a. Symbolic discovery of optimization algorithms. Advances in Neural Information Processing Systems 36 (2024).
  • Dranka et al. (2021) Géremi Gilson Dranka, Paula Ferreira, and A Ismael F Vaz. 2021. A review of co-optimization approaches for operational and planning problems in the energy sector. Applied Energy 304 (2021), 117703.
  • Elsken et al. (2019) Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural architecture search: A survey. Journal of Machine Learning Research 20, 55 (2019), 1–21.
  • Faldor et al. (2025) Maxence Faldor, Robert Tjarko Lange, and Antoine Cully. 2025. Discovering Quality-Diversity Algorithms via Meta-Black-Box Optimization. arXiv preprint arXiv:2502.02190 (2025).
  • Galván and Mooney (2021) Edgar Galván and Peter Mooney. 2021. Neuroevolution in deep neural networks: Current trends and future challenges. IEEE Transactions on Artificial Intelligence 2, 6 (2021), 476–493.
  • Ge et al. (2015) Hongwei Ge, Liang Sun, Xin Yang, Shinichi Yoshida, and Yanchun Liang. 2015. Cooperative differential evolution with fast variable interdependence learning and cross-cluster mutation. Applied Soft Computing 36 (2015), 300–314.
  • Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A Survey of Methods for Explaining Black Box Models. ACM Comput. Surv. 51, 5, Article 93 (aug 2018), 42 pages. https://doi.org/10.1145/3236009
  • Guo et al. (2025b) Hongshu Guo, Sijie Ma, Zechuan Huang, Yuzhi Hu, Zeyuan Ma, Xinglin Zhang, and Yue-Jiao Gong. 2025b. Reinforcement Learning-based Self-adaptive Differential Evolution through Automated Landscape Feature Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
  • Guo et al. (2024) Hongshu Guo, Yining Ma, Zeyuan Ma, Jiacheng Chen, Xinglin Zhang, Zhiguang Cao, Jun Zhang, and Yue-Jiao Gong. 2024. Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2024).
  • Guo et al. (2025a) Hongshu Guo, Zeyuan Ma, Jiacheng Chen, Yining Ma, Zhiguang Cao, Xinglin Zhang, and Yue-Jiao Gong. 2025a. ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 26982–26990.
  • Hammer (1962) PC Hammer. 1962. Adaptive control processes: a guided tour (R. Bellman).
  • Hansen (2016) Nikolaus Hansen. 2016. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772 (2016).
  • He et al. (2020) Xiaoyu He, Zibin Zheng, and Yuren Zhou. 2020. MMES: Mixture model-based evolution strategy for large-scale optimization. IEEE Transactions on Evolutionary Computation 25, 2 (2020), 320–333.
  • Jia et al. (2020) Ya-Hui Jia, Yi Mei, and Mengjie Zhang. 2020. Contribution-based cooperative co-evolution for nonseparable large-scale problems with overlapping subcomponents. IEEE Transactions on Cybernetics 52, 6 (2020), 4246–4259.
  • Komarnicki et al. (2024) Marcin Michal Komarnicki, Michal Witold Przewozniczek, Renato Tinós, and Xiaodong Li. 2024. Overlapping Cooperative Co-Evolution for Overlapping Large-Scale Global Optimization Problems. In Proceedings of the Genetic and Evolutionary Computation Conference. 665–673.
  • Lange et al. (2023) Robert Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, and Sebastian Flennerhag. 2023. Discovering evolution strategies via meta-black-box optimization. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation. 29–30.
  • Li et al. (2022) Jian-Yu Li, Zhi-Hui Zhan, Kay Chen Tan, and Jun Zhang. 2022. Dual differential grouping: A more general decomposition method for large-scale optimization. IEEE Transactions on Cybernetics (2022).
  • Li et al. (2024a) Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zhen, and Ke Tang. 2024a. Bridging evolutionary algorithms and reinforcement learning: A comprehensive survey on hybrid algorithms. IEEE Transactions on Evolutionary Computation (2024).
  • Li et al. (2013) Xiaodong Li, Ke Tang, Mohammad N Omidvar, Zhenyu Yang, and Kai Qin. 2013. Benchmark functions for the CEC 2013 special session and competition on large-scale global optimization. Gene 7, 33 (2013), 8.
  • Li et al. (2025) Xiaobin Li, Kai Wu, Xiaoyu Zhang, and Handing Wang. 2025. B2Opt: Learning to optimize black-box optimization with little budget. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 18502–18510.
  • Li et al. (2024b) Xiaobin Li, Kai Wu, Xiaoyu Zhang, Handing Wang, Jing Liu, et al. 2024b. Pretrained optimization model for zero-shot black box optimization. Advances in Neural Information Processing Systems 37 (2024), 14283–14324.
  • Li and Zhang (2017) Zhenhua Li and Qingfu Zhang. 2017. A simple yet efficient evolution strategy for large-scale black-box optimization. IEEE Transactions on Evolutionary Computation 22, 5 (2017), 637–646.
  • Lian et al. (2024) Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, and Yue-Jiao Gong. 2024. RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
  • Liao et al. (2023) Zuowen Liao, Wenyin Gong, and Shuijia Li. 2023. Two-stage reinforcement learning-based differential evolution for solving nonlinear equations. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2023).
  • Liu et al. (2024) Jing Liu, Ruhul Sarker, Saber Elsayed, Daryl Essam, and Nurhadi Siswanto. 2024. Large-scale evolutionary optimization: A review and comparative study. Swarm and Evolutionary Computation (2024), 101466.
  • Liu and Tang (2013) Jinpeng Liu and Ke Tang. 2013. Scaling up covariance matrix adaptation evolution strategy using cooperative coevolution. In International Conference on Intelligent Data Engineering and Automated Learning. Springer, 350–357.
  • Loshchilov (2017) Ilya Loshchilov. 2017. LM-CMA: An alternative to L-BFGS for large-scale black box optimization. Evolutionary Computation 25, 1 (2017), 143–171.
  • Loshchilov et al. (2018) Ilya Loshchilov, Tobias Glasmachers, and Hans-Georg Beyer. 2018. Large scale black-box optimization by limited-memory matrix adaptation. IEEE Transactions on Evolutionary Computation 23, 2 (2018), 353–358.
  • Ma et al. (2022) Xiaoliang Ma, Zhitao Huang, Xiaodong Li, Lei Wang, Yutao Qi, and Zexuan Zhu. 2022. Merged differential grouping for large-scale global optimization. IEEE Transactions on Evolutionary Computation 26, 6 (2022), 1439–1451.
  • Ma et al. (2024a) Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yining Ma, and Yue-Jiao Gong. 2024a. Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
  • Ma et al. (2024b) Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Zhenrui Li, Guojun Peng, Yue-Jiao Gong, Yining Ma, and Zhiguang Cao. 2024b. MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning. Advances in Neural Information Processing Systems 36 (2024).
  • Ma et al. (2024c) Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, and Kay Chen Tan. 2024c. Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization. arXiv preprint arXiv:2411.00625 (2024).
  • Ma et al. (2025a) Zeyuan Ma, Zhiyang Huang, Jiacheng Chen, Zhiguang Cao, and Yue-Jiao Gong. 2025a. Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study. In Proceedings of the Genetic and Evolutionary Computation Conference.
  • Ma et al. (2025b) Zeyuan Ma, Hongqiao Lian, Wenjie Qiu, and Yue-Jiao Gong. 2025b. Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning. In Proceedings of the Genetic and Evolutionary Computation Conference.
  • Mei et al. (2016) Yi Mei, Mohammad Nabi Omidvar, Xiaodong Li, and Xin Yao. 2016. A competitive divide-and-conquer algorithm for unconstrained large-scale black-box optimization. ACM Transactions on Mathematical Software (TOMS) 42, 2 (2016), 1–24.
  • Mersmann et al. (2011) Olaf Mersmann, Bernd Bischl, Heike Trautmann, Mike Preuss, Claus Weihs, and Günter Rudolph. 2011. Exploratory landscape analysis. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation. 829–836.
  • Meselhi et al. (2022) Mohamed Meselhi, Ruhul Sarker, Daryl Essam, and Saber Elsayed. 2022. A decomposition approach for large-scale non-separable optimization problems. Applied Soft Computing 115 (2022), 108168.
  • Mnih et al. (2015) Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
  • Mo et al. (2025) Shibing Mo, Kai Wu, Qixuan Gao, Xiangyi Teng, and Jing Liu. 2025. AutoSGNN: automatic propagation mechanism discovery for spectral graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 19493–19502.
  • Nair and Hinton (2010) Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 807–814.
  • Omidvar et al. (2010) Mohammad Nabi Omidvar, Xiaodong Li, Zhenyu Yang, and Xin Yao. 2010. Cooperative co-evolution for large scale optimization through more frequent random grouping. In 2010 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
  • Omidvar et al. (2021a) Mohammad Nabi Omidvar, Xiaodong Li, and Xin Yao. 2021a. A review of population-based metaheuristics for large-scale black-box global optimization—Part I. IEEE Transactions on Evolutionary Computation 26, 5 (2021), 802–822.
  • Omidvar et al. (2021b) Mohammad Nabi Omidvar, Xiaodong Li, and Xin Yao. 2021b. A review of population-based metaheuristics for large-scale black-box global optimization—Part II. IEEE Transactions on Evolutionary Computation 26, 5 (2021), 823–843.
  • Omidvar et al. (2017) Mohammad Nabi Omidvar, Ming Yang, Yi Mei, Xiaodong Li, and Xin Yao. 2017. DG2: A faster and more accurate differential grouping for large-scale black-box optimization. IEEE Transactions on Evolutionary Computation 21, 6 (2017), 929–942.
  • Pitzer and Affenzeller (2012) Erik Pitzer and Michael Affenzeller. 2012. A comprehensive survey on fitness landscape analysis. Recent Advances in Intelligent Engineering Systems (2012), 161–191.
  • Potter and De Jong (1994) Mitchell A Potter and Kenneth A De Jong. 1994. A cooperative coevolutionary approach to function optimization. In International conference on parallel problem solving from nature. Springer, 249–257.
  • Qiu et al. (2025) Wenjie Qiu, Hongshu Guo, Zeyuan Ma, and Yue-Jiao Gong. 2025. A Novel Two-Phase Cooperative Co-evolution Framework for Large-Scale Global Optimization with Complex Overlapping. In Proceedings of the Genetic and Evolutionary Computation Conference.
  • Ros and Hansen (2008) Raymond Ros and Nikolaus Hansen. 2008. A simple modification in CMA-ES achieving linear time and space complexity. In International conference on parallel problem solving from nature. Springer, 296–305.
  • Roy and Tiwari (2002) Rajkumar Roy and Ashutosh Tiwari. 2002. Generalised regression GA for handling inseparable function interaction: Algorithm and applications. In International Conference on Parallel Problem Solving from Nature. Springer, 452–461.
  • Salomon (1996) Ralf Salomon. 1996. Re-evaluating genetic algorithm performance under coordinate rotation of benchmark functions. A survey of some theoretical and practical aspects of genetic algorithms. BioSystems 39, 3 (1996), 263–278.
  • Sayed et al. (2012) Eman Sayed, Daryl Essam, and Ruhul Sarker. 2012. Dependency identification technique for large scale optimization problems. In 2012 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
  • Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
  • Shao et al. (2025) Shuai Shao, Ye Tian, and Yajie Zhang. 2025. Deep reinforcement learning assisted surrogate model management for expensive constrained multi-objective optimization. Swarm and Evolutionary Computation 92 (2025), 101817.
  • Sharma et al. (2019) Mudita Sharma, Alexandros Komninos, Manuel López-Ibáñez, and Dimitar Kazakov. 2019. Deep reinforcement learning based parameter control in differential evolution. In Proceedings of the Genetic and Evolutionary Computation Conference. 709–717.
  • Shi et al. (2005) Yan-jun Shi, Hong-fei Teng, and Zi-qiang Li. 2005. Cooperative co-evolutionary differential evolution for function optimization. In Advances in Natural Computation: First International Conference, ICNC 2005, Changsha, China, August 27-29, 2005, Proceedings, Part II 1. Springer, 1080–1088.
  • Sobol' (1993) IM Sobol'. 1993. Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp. 1 (1993), 407.
  • Such et al. (2017) Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O Stanley, and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567 (2017).
  • Sun et al. (2021) Jianyong Sun, Xin Liu, Thomas Bäck, and Zongben Xu. 2021. Learning adaptive differential evolution algorithm from optimization experiences by policy gradient. IEEE Transactions on Evolutionary Computation 25, 4 (2021), 666–680.
  • Sun et al. (2017) Yuan Sun, Michael Kirley, and Saman K Halgamuge. 2017. A recursive decomposition method for large scale continuous optimization. IEEE Transactions on Evolutionary Computation 22, 5 (2017), 647–661.
  • Sun et al. (2019) Yuan Sun, Xiaodong Li, Andreas Ernst, and Mohammad Nabi Omidvar. 2019. Decomposition for large-scale optimization problems with overlapping components. In 2019 IEEE Congress on Evolutionary Computation (CEC). IEEE, 326–333.
  • Tan and Li (2021) Zhiping Tan and Kangshun Li. 2021. Differential evolution with mixed mutation strategy based on deep reinforcement learning. Applied Soft Computing 111 (2021), 107678.
  • Tian et al. (2024a) Maojiang Tian, Mingke Chen, Wei Du, Yang Tang, and Yaochu Jin. 2024a. An Enhanced Differential Grouping Method for Large-Scale Overlapping Problems. IEEE Transactions on Evolutionary Computation (2024).
  • Tian et al. (2024b) Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, and Gary G Yen. 2024b. A Composite Decomposition Method for Large-Scale Global Optimization. IEEE Transactions on Artificial Intelligence (2024).
  • Tiwari and Roy (2002) Ashutosh Tiwari and Rajkumar Roy. 2002. Variable dependence interaction and multi-objective optimisation. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation. 602–609.
  • Tiwari et al. (2001) Ashutosh Tiwari, Rajkumar Roy, Graham Jared, and Olivier Munaux. 2001. Interaction and multi-objective optimisation. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation. 671–678.
  • Todorov et al. (2012) Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 5026–5033.
  • Van den Bergh and Engelbrecht (2004) Frans Van den Bergh and Andries P Engelbrecht. 2004. A cooperative approach to particle swarm optimization. IEEE Transactions on Evolutionary Computation 8, 3 (2004), 225–239.
  • Vicol et al. (2021) Paul Vicol, Luke Metz, and Jascha Sohl-Dickstein. 2021. Unbiased gradient estimation in unrolled computation graphs with persistent evolution strategies. In International Conference on Machine Learning. PMLR, 10553–10563.
  • Wu and Wang (2022) Di Wu and G Gary Wang. 2022. Employing reinforcement learning to enhance particle swarm optimization methods. Engineering Optimization 54, 2 (2022), 329–348.
  • Xu and Pi (2020) Yue Xu and Dechang Pi. 2020. A reinforcement learning-based communication topology in particle swarm optimization. Neural Computing and Applications 32 (2020), 10007–10032.
  • Xue et al. (2022) Ke Xue, Jiacheng Xu, Lei Yuan, Miqing Li, Chao Qian, Zongzhang Zhang, and Yang Yu. 2022. Multi-agent dynamic algorithm configuration. Advances in Neural Information Processing Systems 35 (2022), 20147–20161.
  • Yang et al. (2023) Ming Yang, Jie Gao, Aimin Zhou, Changhe Li, and Xin Yao. 2023. Contribution-Based Cooperative Co-Evolution With Adaptive Population Diversity for Large-Scale Global Optimization [Research Frontier]. IEEE Computational Intelligence Magazine 18, 3 (2023), 56–68.
  • Yang et al. (2016) Ming Yang, Mohammad Nabi Omidvar, Changhe Li, Xiaodong Li, Zhihua Cai, Borhan Kazimipour, and Xin Yao. 2016. Efficient resource allocation in cooperative co-evolution for large-scale global optimization. IEEE Transactions on Evolutionary Computation 21, 4 (2016), 493–505.
  • Yang et al. (2020) Ming Yang, Aimin Zhou, Changhe Li, and Xin Yao. 2020. An efficient recursive differential grouping for large-scale continuous problems. IEEE Transactions on Evolutionary Computation 25, 1 (2020), 159–171.
  • Yang et al. (2024) Qingyong Yang, Shu-Chuan Chu, Jeng-Shyang Pan, Jyh-Horng Chou, and Junzo Watada. 2024. Dynamic multi-strategy integrated differential evolution algorithm based on reinforcement learning for optimization problems. Complex & Intelligent Systems 10, 2 (2024), 1845–1877.
  • Yang et al. ([n. d.]) Xu Yang, Rui Wang, and Kaiwen Li. [n. d.]. Meta-Black-Box Optimization for Evolutionary Algorithms: Review and Perspective. Available at SSRN 4956956 ([n. d.]).
  • Yang et al. (2008) Zhenyu Yang, Ke Tang, and Xin Yao. 2008. Large scale evolutionary optimization using cooperative coevolution. Information Sciences 178, 15 (2008), 2985–2999.
  • Yi et al. (2022) Wenjie Yi, Rong Qu, Licheng Jiao, and Ben Niu. 2022. Automated design of metaheuristics using reinforcement learning within a novel general search framework. IEEE Transactions on Evolutionary Computation 27, 4 (2022), 1072–1084.
  • Yin et al. (2021) Shiyuan Yin, Yi Liu, GuoLiang Gong, Huaxiang Lu, and Wenchang Li. 2021. RLEPSO: Reinforcement learning based Ensemble particle swarm optimizer. In Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. 1–6.
  • Zhang et al. (2024) Jinlu Zhang, Lixin Wei, Zeyin Guo, Hao Sun, and Ziyu Hu. 2024. A survey of meta-heuristic algorithms in optimization of space scale expansion. Swarm and Evolutionary Computation 84 (2024), 101462.
  • Zhang et al. (2019) Xin-Yuan Zhang, Yue-Jiao Gong, Ying Lin, Jie Zhang, Sam Kwong, and Jun Zhang. 2019. Dynamic cooperative coevolution for large scale optimization. IEEE Transactions on Evolutionary Computation 23, 6 (2019), 935–948.