Genetic multi-armed bandits: a reinforcement learning inspired approach for simulation optimization
Many real-world problems are inherently stochastic, complicating or even precluding the use of analytical methods. These problems are often characterized by high dimensionality, large solution spaces, and numerous local optima, which make finding optimal solutions challenging. Therefore, simulation optimization is frequently employed. This paper specifically focuses on the discrete case, also known as discrete optimization via simulation. Despite their adaptations for stochastic problems, previous evolutionary algorithms face a major limitation in these problems: they discard all information about solutions that are not part of the most recent population. This is ineffective, as each simulation observation gathered over the course of the iterations provides valuable information that should guide the selection of subsequent solutions.
Inspired by the domain of reinforcement learning, we propose a novel memory concept for evolutionary algorithms that ensures global convergence and significantly improves their finite-time performance. Unlike previous evolutionary algorithms, our approach permanently preserves simulation observations to progressively improve the accuracy of sample means when solutions are revisited in later iterations. Moreover, the selection of new solutions is based on the entire memory rather than just the last population. Numerical experiments demonstrate that this novel approach, which combines a genetic algorithm with such a memory, consistently outperforms popular convergent state-of-the-art benchmark algorithms on a large variety of established test problems while requiring considerably less computational effort. This marks the so-called genetic multi-armed bandit as one of the currently most powerful algorithms for solving stochastic problems.
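The core idea of the memory concept can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's actual implementation: `ObservationMemory`, the toy objective `simulate`, and the UCB-style scoring rule stand in for the memory, the stochastic simulation, and the bandit-inspired selection over the entire memory. Every noisy observation is kept permanently, so a solution's pooled sample mean sharpens each time the search revisits it.

```python
import math
import random
from collections import defaultdict

class ObservationMemory:
    """Toy sketch of the memory concept: every simulation observation is
    kept forever, so revisiting a solution refines its sample mean."""

    def __init__(self):
        self.count = defaultdict(int)    # observations per solution
        self.total = defaultdict(float)  # sum of observations per solution

    def record(self, solution, observation):
        self.count[solution] += 1
        self.total[solution] += observation

    def mean(self, solution):
        # Pooled sample mean over ALL observations ever made, not just
        # those from the last population.
        return self.total[solution] / self.count[solution]

    def ucb(self, solution, t):
        # Upper-confidence-bound score in the multi-armed-bandit spirit:
        # favors solutions with high sample means or few observations.
        n = self.count[solution]
        return self.mean(solution) + math.sqrt(2 * math.log(t) / n)

# Noisy simulation of a toy discrete objective (true optimum at x = 3).
def simulate(x, rng):
    return -(x - 3) ** 2 + rng.gauss(0, 1)

rng = random.Random(0)
memory = ObservationMemory()
candidates = range(7)  # tiny discrete solution space {0, ..., 6}

for t in range(1, 501):
    # Observe each solution once, then select by UCB over the whole memory.
    unseen = [x for x in candidates if memory.count[x] == 0]
    x = unseen[0] if unseen else max(candidates, key=lambda s: memory.ucb(s, t))
    memory.record(x, simulate(x, rng))

best = max(candidates, key=memory.mean)  # solution with best pooled mean
print(best)
```

In a genetic multi-armed bandit, such a memory would persist across generations, so observations from earlier populations keep informing both the accuracy of the estimates and the choice of new candidate solutions.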

