Theory-Inspired Parameter Control Benchmarks for DAC

Accompanying blog post for our GECCO'22 paper

NOTE: We are using pyscript for the example below. Loading might take a bit longer.

[Interactive demo: the current LeadingOnes value, the number of steps taken, and sliders for the simulation speed and the number of bitflips.]

{\color{gray}(1+1)}RLS on a randomly initialized LeadingOnes problem. You can manually configure how many bits get flipped in each iteration via the lower slider. We only render the current best solution, so some steps might not change the image. Cells that will not change anymore are colored green. For pseudocode, see Algorithm 1.


To achieve peak performance on a problem, it is often crucial to correctly set up, i.e., configure, the algorithm that is supposed to solve it. Many communities have observed that fixed parameter choices are often not optimal and that dynamic changes to parameters during the algorithm’s run can be highly beneficial (see, e.g., ). The evolutionary algorithms community is no stranger to this observation. Under the heading of parameter control (for an overview see ), various methods have been proposed to adapt parameters on the fly. More importantly, however, the community has provided various theoretical insights.

In a similar vein to parameter control, the novel dynamic algorithm configuration (DAC) framework proposes to learn parameter adaptation policies in a dedicated offline learning phase. Once a policy has been learned, it can be used to adapt algorithm parameters on a variety of problem instances. Still, the field is very young and there is much to learn. Our recently accepted GECCO paper Theory-Inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration builds on such insights from the parameter control community to construct benchmarks that can be used to evaluate DAC methods and policies.

Why RLS on LeadingOnes?

The short answer is: LeadingOnes is very well understood. The slightly longer answer is: LeadingOnes is one of the most rigorously studied problems in the parameter control community, which has thoroughly analyzed the dynamic, fitness-dependent selection of mutation rates for greedy evolutionary algorithms on this problem. Thus, it is very well understood how the expected runtime of such algorithms depends on the mutation rates used during the run. Overall, LeadingOnes is an important benchmark for parameter control studies, both for empirical and theoretical analyses. This makes LeadingOnes an ideal benchmark to also study DAC methods in depth. Further, the theoretical insights can be used to provide a ground truth on LeadingOnes. This was not possible for prior DAC benchmarks, apart from some manually designed artificial benchmarks.

A Short Primer on LeadingOnes and RLS

If you are already familiar with the LeadingOnes problem, you can skip this short introduction. LeadingOnes is perfectly named: for a bitstring of length \color{gray}n, the LeadingOnes problem is to maximize the number of uninterrupted leading ones in the bitstring.
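More formally, for a bitstring \color{gray}x\in\{0,1\}^n, the LeadingOnes value is \color{gray}\mathrm{LO}(x)=\max\{i\in\{0,\ldots,n\}\mid x_j=1\text{ for all }j\leq i\}; for example, \color{gray}\mathrm{LO}(110101)=2, and the unique optimum is the all-ones string with value \color{gray}n.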

There are a variety of algorithms one could use to solve LeadingOnes. Here, we chose to use \color{gray}(1+1)RLS. Its pseudocode is:

import numpy

def RLS(problem_dimension, max_budget):
    # Start from a uniformly random bitstring.
    partial_sol = numpy.random.choice(2, problem_dimension)
    for t in range(max_budget):
        # Observe the state (here: the current fitness) and ask the policy how many bits to flip.
        s = get_current_state()
        num_bits_to_flip = policy(s)
        # Create a new candidate and keep it if it is at least as good as the old one.
        new_partial_sol = flip(partial_sol, num_bits_to_flip)
        if fitness(new_partial_sol) >= fitness(partial_sol):
            partial_sol = new_partial_sol
    return partial_sol

Algorithm 1: Pseudocode for {\color{gray}(1+1)}RLS
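To actually run Algorithm 1, the helper functions it assumes have to be filled in. Below is a minimal sketch of flip and fitness (our own illustrative implementations, assuming flip draws distinct bit positions uniformly at random); get_current_state and policy stay abstract, since they are exactly what a parameter control mechanism or a DAC policy provides.

import numpy

def fitness(x):
    # LeadingOnes value: index of the first zero, or the full length if there is none.
    return int(numpy.argmin(x)) if 0 in x else len(x)

def flip(x, num_bits_to_flip):
    # Flip num_bits_to_flip distinct, uniformly chosen bits and return the new candidate.
    y = x.copy()
    idx = numpy.random.choice(len(x), size=num_bits_to_flip, replace=False)
    y[idx] = 1 - y[idx]
    return y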

The algorithm starts from a randomly initialized bitstring and in each iteration randomly flips num_bits_to_flip bits. The newly created solution candidate is compared to the old one, and the old one is replaced if the new one is not worse. When using an algorithm to solve the LeadingOnes problem, we are interested in setting it up such that we solve the problem with as few function evaluations, or in other words as few iterations, as possible. At the top of this post you can find a working implementation of this setup in which you are in charge of setting the number of bits to flip in each iteration. If you’ve played around with this setting a bit, you might have noticed a few things:

Ground Truth and Optimal Policies

It was proven that the probability of improving the current partial solution of length \color{gray}n with fitness value \color{gray}i\leq n by flipping \color{gray}r bits is \color{gray}q(r,i)=\frac{r}{n}\prod_{j\in\left[1,\ldots,r-1\right]}\frac{n-i-j}{n-j}. Further, an optimal policy that can choose to flip any number of bits in \color{gray}\left[1,\ldots,n\right] satisfies \color{gray}\pi_{\text{opt}}\colon i\mapsto\lfloor n/(i+1)\rfloor. Thus, if our current solution has fitness \color{gray}0 (i.e., no leading ones), we should flip all bits. If we only have one leading one, we should flip exactly \color{gray}\lfloor n/2\rfloor bits, and so on. Let’s compare this optimal policy to static ones:

[Interactive demo: {\color{gray}(1+1)}RLS following the optimal policy \color{gray}\pi_{\text{opt}} compared to a static policy. Each panel shows the current LeadingOnes value, the steps taken, the number of solved instances, and the mean number of steps \color{gray}\mu_{\text{steps}}; sliders control the simulation speed and the number of bitflips of the static policy.]

We can see that the optimal policy is quite a bit faster at solving the LeadingOnes example. However, what about policies that are restricted and cannot choose from all values in \color{gray}\left[1,\ldots,n\right]? In our paper, we present a method to compute the optimal policy for any such restricted choice. In essence, we can use the probability of improvement as defined above and only need to compare the probabilities of consecutive elements[1] of the portfolio. Whenever the smaller portfolio element yields a higher probability of improvement, we switch to it. So, for any \color{gray}n and portfolio \color{gray}\mathcal{K} we can compute the optimal policy and thus generate ground truth about the behaviour of \color{gray}(1+1)RLS.[2] This ability to compute the optimal policy for any portfolio (i.e., configuration space) makes this setting ideal for studying how different problem sizes and configuration spaces influence DAC methods.
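As an illustration, here is a small sketch in plain Python (our own illustrative code, not the implementation from our repository) that computes the improvement probability \color{gray}q(r,i), the unrestricted optimal policy, and the portfolio-restricted optimal policy described above:

def improvement_prob(r, i, n):
    # q(r, i): probability that flipping r distinct bits improves a solution with fitness i.
    p = r / n
    for j in range(1, r):
        p *= (n - i - j) / (n - j)
    return p

def optimal_unrestricted(i, n):
    # Closed-form optimal policy: flip floor(n / (i + 1)) bits at fitness i.
    return n // (i + 1)

def optimal_restricted_policy(portfolio, n):
    # For each fitness value i, pick the portfolio element with the highest improvement probability.
    # The optimal number of flips only decreases as the fitness grows, so we start at the largest
    # element and switch to the next smaller one once it yields a higher q(., i).
    values = sorted(portfolio, reverse=True)
    policy, idx = [], 0
    for i in range(n):
        while (idx + 1 < len(values)
               and improvement_prob(values[idx + 1], i, n) > improvement_prob(values[idx], i, n)):
            idx += 1
        policy.append(values[idx])
    return policy

print(optimal_unrestricted(0, 50), optimal_unrestricted(1, 50))  # 50, 25
print(optimal_restricted_policy([1, 16, 32], n=50))

Since the portfolio always contains 1, the restricted policy can fall back to single-bit flips once the solution is close to the optimum.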

Learning DAC Policies

Now that we know all about the benchmark, let’s use it to gain insights into a DAC method.[3] To give an example of the benchmark’s use-case, we train a small DDQN agent to dynamically configure the number of bit flips of RLS on LeadingOnes based on the observed fitness value.
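To make the learning setup concrete, here is a minimal, self-contained sketch of such a training environment (our own illustration with a hypothetical class name and a deliberately simplified interface, not the DACBench API): the observed fitness is the state, an index into the portfolio is the action, and a reward of \color{gray}-1 per step encourages policies that solve the problem in as few evaluations as possible.

import numpy

class LeadingOnesDACEnv:
    # Minimal DAC environment sketch: state = current fitness, action = portfolio index,
    # reward = -1 per step until the all-ones string is found or the budget is exhausted.

    def __init__(self, n=50, portfolio=(1, 2, 4, 8, 16), max_budget=10000):
        self.n, self.portfolio, self.max_budget = n, list(portfolio), max_budget

    def _leading_ones(self, x):
        return int(numpy.argmin(x)) if 0 in x else len(x)

    def reset(self):
        self.x = numpy.random.randint(2, size=self.n)
        self.t = 0
        return self._leading_ones(self.x)

    def step(self, action):
        self.t += 1
        y = self.x.copy()
        idx = numpy.random.choice(self.n, size=self.portfolio[action], replace=False)
        y[idx] = 1 - y[idx]
        if self._leading_ones(y) >= self._leading_ones(self.x):
            self.x = y  # accept the new candidate if it is not worse
        state = self._leading_ones(self.x)
        done = (state == self.n) or (self.t >= self.max_budget)
        return state, -1.0, done

A DDQN agent then only interacts with reset and step and learns which portfolio index to pick for each observed fitness value.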

Comparison of a learned policy (DQN) to optimal ones. The dotted green line uses the same restricted configuration space as the DQN. (left): The configuration space consists of three evenly spread values [1, 16, 32]. (right): The configuration space consists of five values [1, 2, 4, 8, 16].

Our first results (shown in the figures above) show that the DQN is indeed capable of learning optimal policies. On the same restricted configuration space, for bitstrings of length 50, the DQN quickly learns to play the optimal policy or one that results in virtually the same reward.

The availability of ground truth, however, not only enables us to compare the learned performance to the known optimum; it also lets us study the limits of the chosen DAC method. For example, we can evaluate how the same DQN’s learning capabilities scale with the size of the configuration space.

Comparison of a learned policy (DQN) to optimal and random ones on problems of size 100 with varying portfolio sizes. For \color{gray}k\geq7 we plot three distinct runs to highlight the increased instability of the DQN. For \color{gray}k = 15, none of the DQN runs was capable of learning meaningful policies.

Overall, we observed that the chosen DAC method, with the parameter settings used, was capable of learning optimal policies but struggled to scale to larger action spaces and portfolio sizes. For more details, we refer the interested reader to our paper.

Conclusion

Our work presents a novel benchmark that is useful for both parameter control and DAC research. It fills an important gap in the existing benchmarks for DAC research, as it enables us to compute optimal policies. We can thus easily evaluate the quality of learned DAC policies as well as of DAC techniques themselves. We showed how the benchmark can be used to gain insights into a DAC method and explored settings in which the chosen DQN method started to break down. We hope that our work is the first of many exchanges of benchmarks between the parameter control and dynamic algorithm configuration communities. With the growing literature on parameter control and its theoretical analysis, we hope to provide further use-cases with a known ground truth.

Our code and data are publicly available at https://github.com/ndangtt/LeadingOnesDAC and the benchmark has recently been merged into DACBench. For feedback or questions, feel free to reach out to us.

Footnotes

  1. We assume the portfolio is always sorted.[↩]
  2. Note that 1 always needs to be included as we otherwise can't guarantee that a solution will be found.[↩]
  3. If you need a lightweight introduction to DAC we refer to [↩]

References

  1. Optimization by Simulated Annealing
    Kirkpatrick, S., Gelatt, C.D. and Vecchi, M.P., 1983. Science, Vol 220, pp. 671-680. American Association for the Advancement of Science.
  2. Completely Derandomized Self-Adaptation in Evolution Strategies[link]
    Hansen, N. and Ostermeier, A., 2001. Evolutionary Computation, Vol 9(2), pp. 159--195. DOI: 10.1162/106365601750190398
  3. Reactive search and intelligent optimization
    Battiti, R., Brunato, M. and Mascia, F., 2008. Vol 45. Springer Science & Business Media.
  4. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning[PDF]
    Bach, F.R. and Moulines, E., 2011. Proceedings of the 24th International Conference on Advances in Neural Information Processing Systems (NeurIPS'11), pp. 451--459. Curran Associates.
  5. Hyper-heuristics: a survey of the state of the art[link]
    Burke, E.K., Gendreau, M., Hyde, M.R., Kendall, G., Ochoa, G., Özcan, E. and Qu, R., 2013. J. Oper. Res. Soc., Vol 64(12), pp. 1695--1724. DOI: 10.1057/jors.2013.71
  6. Learning Step Size Controllers for Robust Neural Network Training
    Daniel, C., Taylor, J. and Nowozin, S., 2016. Proceedings of the Thirtieth National Conference on Artificial Intelligence (AAAI'16). AAAI Press.
  7. SGDR: Stochastic Gradient Descent with Warm Restarts
    Loshchilov, I. and Hutter, F., 2017. Proceedings of the International Conference on Learning Representations (ICLR'17).
  8. Population Based Training of Neural Networks
    Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W.M., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C. and Kavukcuoglu, K., 2017. arXiv:1711.09846 [cs.LG].
  9. Optimal Static and Self-Adjusting Parameter Choices for the (1+(\lambda,\lambda)) Genetic Algorithm[link]
    Doerr, B. and Doerr, C., 2018. Algorithmica, Vol 80, pp. 1658--1709. DOI: 10.1007/s00453-017-0354-9
  10. Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
    Parker{-}Holder, J., Nguyen, V. and Roberts, S.J., 2020. Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS'20). Curran Associates.
  11. Theory of Parameter Control for Discrete Black-Box Optimization: Provable Performance Gains Through Dynamic Parameter Choices[link]
    Doerr, B. and Doerr, C., 2020. Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pp. 271--321. Springer International Publishing. DOI: 10.1007/978-3-030-29414-4_6
  12. Dynamic Algorithm Configuration: Foundation of a New Meta-Algorithmic Framework
    Biedenkapp, A., Bozkurt, F.H., Eimer, T., Hutter, F. and Lindauer, M., 2020. Proceedings of the Twenty-fourth European Conference on Artificial Intelligence (ECAI'20).
  13. k-Bit Mutation with Self-Adjusting k Outperforms Standard Bit Mutation[link]
    Doerr, B., Doerr, C. and Yang, J., 2016. Proc. of Parallel Problem Solving from Nature (PPSN'16), Vol 9921, pp. 824--834. Springer. DOI: 10.1007/978-3-319-45823-6_77
  14. Simple on-the-fly parameter selection mechanisms for two classical discrete black-box optimization benchmark problems[link]
    Doerr, C. and Wagner, M., 2018. Proc. of Genetic and Evolutionary Computation Conference (GECCO'18), pp. 943--950. ACM. DOI: 10.1145/3205455.3205560
  15. Simple Hyper-Heuristics Control the Neighbourhood Size of Randomised Local Search Optimally for LeadingOnes[link]
    Lissovoi, A., Oliveto, P.S. and Warwicker, J.A., 2020. Evol. Comput., Vol 28(3), pp. 437--461. DOI: 10.1162/evco_a_00258
  16. On the runtime analysis of selection hyper-heuristics with adaptive learning periods[link]
    Doerr, B., Lissovoi, A., Oliveto, P.S. and Warwicker, J.A., 2018. Proc. of Genetic and Evolutionary Computation Conference (GECCO'18), pp. 1015--1022. ACM. DOI: 10.1145/3205455.3205611
  17. Self-Adjusting Mutation Rates with Provably Optimal Success Rules[link]
    Doerr, B., Doerr, C. and Lengler, J., 2021. Algorithmica, Vol 83(10), pp. 3108--3147. DOI: 10.1007/s00453-021-00854-3
  18. DACBench: A Benchmark Library for Dynamic Algorithm Configuration
    Eimer, T., Biedenkapp, A., Reimer, M., Adriaensen, S., Hutter, F. and Lindauer, M., 2021. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI'21). ijcai.org.
  19. Analyzing randomized search heuristics via stochastic domination[link]
    Doerr, B., 2019. Theor. Comput. Sci., Vol 773, pp. 115--137. DOI: 10.1016/j.tcs.2018.09.024
  20. Dynamic Algorithm Configuration[link]
    Biedenkapp, A., 2020.
