29 results for "Krejca, Martin S."
Towards the genome-scale discovery of bivariate monotonic classifiers
Background: Bivariate monotonic classifiers (BMCs) are based on pairs of input features. Like many other models used for machine learning, they can capture nonlinear patterns in high-dimensional data. At the same time, they are simple and easy to interpret. Until now, the use of BMCs on a genome scale was hampered by the high computational complexity of the search for pairs of features with a high leave-one-out performance estimate.
Results: We introduce the fastBMC algorithm, which drastically speeds up the identification of BMCs. The algorithm is based on a mathematical bound for the BMC performance estimate while maintaining optimality. We show empirically that, compared to the traditional approach, fastBMC speeds up the computation by a factor of at least 15 already for a small number of features. For two of the three smaller biomedical datasets that we consider here, the resulting possibility of considering much larger sets of features translates into significantly improved classification performance. As an example of the high degree of interpretability of BMCs, we discuss a straightforward interpretation of a BMC glioblastoma survival predictor, an immediate novel biomedical hypothesis, options for biomedical validation, and treatment implications. In addition, we study the performance of fastBMC on a larger and well-known breast cancer dataset, validating the benefits of BMCs for biomarker identification and biomedical hypothesis generation.
Conclusion: fastBMC enables the rapid construction of robust and interpretable ensemble models using BMCs, facilitating the discovery of gene pairs predictive of relevant phenotypes and their interaction in that context.
Availability: We provide the first open-source implementation for learning BMCs, in particular a Python implementation of fastBMC, and Python code to reproduce the fastBMC results on real and simulated data in this paper, at https://github.com/oceanefrqt/fastBMC.
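The pruning bound itself is the paper's contribution and lives in the linked repository; purely as a point of reference, the following Python sketch shows the naive pair search that fastBMC accelerates. It assumes, for illustration only, that a BMC is a monotone two-threshold rule ("predict 1 iff both features exceed learned cutoffs"); the paper's exact classifier and bound may differ.

```python
# Hypothetical sketch of the naive O(p^2) feature-pair search that fastBMC
# speeds up. The two-threshold rule below is an illustrative stand-in for a
# bivariate monotonic classifier, not the paper's exact definition.
import itertools
import numpy as np

def best_thresholds(x1, x2, y):
    """Brute-force threshold pair with the highest training accuracy."""
    best = (-1.0, 0.0, 0.0)
    for t1 in np.unique(x1):
        for t2 in np.unique(x2):
            acc = float(np.mean(((x1 > t1) & (x2 > t2)) == y.astype(bool)))
            if acc > best[0]:
                best = (acc, t1, t2)
    return best[1], best[2]

def loo_score(x1, x2, y):
    """Leave-one-out accuracy of the threshold rule on one feature pair."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i                      # hold out sample i
        t1, t2 = best_thresholds(x1[mask], x2[mask], y[mask])
        hits += int((x1[i] > t1 and x2[i] > t2) == bool(y[i]))
    return hits / n

def naive_pair_search(X, y):
    """Scan all feature pairs; fastBMC prunes this scan with an upper bound
    on the achievable leave-one-out score while keeping the optimum."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    return max((loo_score(X[:, j], X[:, k], y), (j, k)) for j, k in pairs)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)               # toy data for a quick run
score, pair = naive_pair_search(X, y)
```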
Surfing on the seascape
The environment changes constantly at various time scales and, in order to survive, species need to keep adapting. Whether these species succeed in avoiding extinction is a major evolutionary question. Using a multilocus evolutionary model of a mutation-limited population adapting under strong selection, we investigate the effects of the frequency of environmental fluctuations on adaptation. Our results rely on an “adaptive-walk” approximation and use mathematical methods from evolutionary computation theory to investigate the interplay between fluctuation frequency, the similarity of environments, and the number of loci contributing to adaptation. First, we assume a linear additive fitness function, but later generalize our results to include several types of epistasis. We show that frequent environmental changes prevent populations from reaching a fitness peak, but they may also prevent the large fitness loss that occurs after a single environmental change. Thus, the population can survive, although not thrive, in a wide range of conditions. Furthermore, we show that in a frequently changing environment, the similarity of threats that a population faces affects the level of adaptation that it is able to achieve. We check and supplement our analytical results with simulations.
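To make the "adaptive walk" setting concrete, here is a minimal Python sketch under stated assumptions: n loci encoded as a bit string, a linear additive fitness with environment-specific locus weights, one random one-locus mutation per step that is kept only if it strictly improves fitness (mutation-limited, strong selection), and two environments of tunable similarity alternating every `period` steps. It illustrates the moving parts of the model, not the paper's exact parameterization.

```python
import random

def adaptive_walk(n=50, period=20, similarity=0.8, steps=2000, seed=1):
    """Mutation-limited adaptive walk under a periodically switching environment."""
    rng = random.Random(seed)
    env_a = [rng.uniform(-1, 1) for _ in range(n)]            # locus weights, env A
    env_b = [w if rng.random() < similarity else rng.uniform(-1, 1)
             for w in env_a]                                  # correlated env B
    fitness = lambda g, env: sum(w * x for w, x in zip(env, g))
    g = [rng.randint(0, 1) for _ in range(n)]
    for t in range(steps):
        env = env_a if (t // period) % 2 == 0 else env_b      # environment switch
        i = rng.randrange(n)                                  # one mutation at a time
        old = fitness(g, env)
        g[i] ^= 1
        if fitness(g, env) <= old:                            # strong selection:
            g[i] ^= 1                                         # revert non-improvements
    return g
```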
Sampling repulsive Gibbs point processes using random graphs
We study computational aspects of repulsive Gibbs point processes, which are probabilistic models of interacting particles in a finite-volume region of space. We introduce an approach for reducing a Gibbs point process to the hard-core model, a well-studied discrete spin system. Given an instance of such a point process, our reduction generates a random graph drawn from a natural geometric model. We show that the partition function of a hard-core model on graphs generated by the geometric model concentrates around the partition function of the Gibbs point process. Our reduction allows us to use a broad range of algorithms developed for the hard-core model to sample from the Gibbs point process and approximate its partition function. This is, to the best of our knowledge, the first approach that deals with pair potentials of unbounded range.
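A hedged sketch of the general pipeline this enables: draw a random geometric graph as the discrete stand-in, then run any hard-core sampler on it (here, simple Glauber dynamics). The paper's precise point density, interaction radius, and fugacity mapping are not reproduced; every constant below is illustrative.

```python
import math
import random

def random_geometric_graph(num_points, radius, rng):
    """Candidate points in the unit square; edges join interacting pairs."""
    pts = [(rng.random(), rng.random()) for _ in range(num_points)]
    adj = [[] for _ in pts]
    for i in range(num_points):
        for j in range(i + 1, num_points):
            if math.dist(pts[i], pts[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    return pts, adj

def glauber_hardcore(adj, fugacity, steps, rng):
    """Hard-core Glauber dynamics: occupied vertices form an independent set."""
    occupied = [False] * len(adj)
    for _ in range(steps):
        v = rng.randrange(len(adj))
        occupied[v] = False
        if not any(occupied[u] for u in adj[v]):          # repulsion constraint
            occupied[v] = rng.random() < fugacity / (1 + fugacity)
    return occupied

rng = random.Random(42)
pts, adj = random_geometric_graph(500, 0.05, rng)
state = glauber_hardcore(adj, 1.5, 100_000, rng)
sample = [p for p, occ in zip(pts, state) if occ]         # point-process proxy
```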
A Flexible Evolutionary Algorithm With Dynamic Mutation Rate Archive
We propose a new, flexible approach for dynamically maintaining successful mutation rates in evolutionary algorithms using \\(k\\)-bit flip mutations. The algorithm adds successful mutation rates to an archive of promising rates that are favored in subsequent steps. Rates expire when their number of unsuccessful trials exceeds a threshold, while rates currently not present in the archive can enter it in two ways: (i) via user-defined minimum selection probabilities for rates combined with a successful step or (ii) via a stagnation detection mechanism increasing the value for a promising rate after the current bit-flip neighborhood has been explored with high probability. For the minimum selection probabilities, we suggest different options, including heavy-tailed distributions. We conduct a rigorous runtime analysis of the flexible evolutionary algorithm on the OneMax and Jump functions, on general unimodal functions, on minimum spanning trees, and on a class of hurdle-like functions with varying hurdle width that benefit particularly from the archive of promising mutation rates. In all cases, our runtime bounds are close to, or even improve on, the best known results for both stagnation detection and heavy-tailed mutations.
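A compact Python sketch of the archive mechanism, applied to OneMax, may help. It follows the abstract loosely: successful rates enter the archive, rates expire after too many consecutive failures, and with a small minimum probability an arbitrary rate is tried (drawn uniformly here; the paper also suggests heavy-tailed choices). Stagnation detection is omitted, and all constants are illustrative.

```python
import random

def flexible_ea(n=100, fail_limit=50, p_min=0.05, max_evals=200_000, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    archive = {1: 0}                        # rate k -> consecutive failures
    while max_evals > 0 and sum(x) < n:
        max_evals -= 1
        if not archive or rng.random() < p_min:
            k = rng.randint(1, n)           # minimum selection probability
        else:
            k = rng.choice(list(archive))   # favor archived, promising rates
        y = x[:]
        for i in rng.sample(range(n), k):   # k-bit flip mutation
            y[i] ^= 1
        if sum(y) > sum(x):                 # success on OneMax
            x = y
            archive[k] = 0                  # (re-)enter archive, reset counter
        elif k in archive:
            archive[k] += 1
            if archive[k] > fail_limit:     # expire persistently failing rates
                del archive[k]
    return x
```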
Proven Runtime Guarantees for How the MOEA/D Computes the Pareto Front From the Subproblem Solutions
The decomposition-based multi-objective evolutionary algorithm (MOEA/D) does not directly optimize a given multi-objective function \\(f\\), but instead optimizes \\(N + 1\\) single-objective subproblems of \\(f\\) in a co-evolutionary manner. It maintains an archive of all non-dominated solutions found and outputs it as an approximation of the Pareto front. Once the MOEA/D has found all optima of the subproblems (the \\(g\\)-optima), it may still miss Pareto optima of \\(f\\). The algorithm is then tasked to find the remaining Pareto optima directly by mutating the \\(g\\)-optima. In this work, we analyze for the first time how the MOEA/D with only standard mutation operators computes the whole Pareto front of the OneMinMax benchmark when the \\(g\\)-optima are a strict subset of the Pareto front. For standard bit mutation, we prove an expected runtime of \\(O(n N \\log n + n^{n/(2N)} N \\log n)\\) function evaluations. Especially for the second, more interesting phase, in which the algorithm starts with all \\(g\\)-optima, we prove an \\(\\Omega(n^{(1/2)(n/N + 1)} \\sqrt{N} 2^{-n/N})\\) expected runtime. This runtime is super-polynomial if \\(N = o(n)\\), since this leaves large gaps between the \\(g\\)-optima, which require costly mutations to cover. For power-law mutation with exponent \\(\\beta \\in (1, 2)\\), we prove an expected runtime of \\(O\\left(n N \\log n + n^{\\beta} \\log n\\right)\\) function evaluations. The \\(O\\left(n^{\\beta} \\log n\\right)\\) term stems from the second phase of starting with all \\(g\\)-optima, and it is independent of the number of subproblems \\(N\\). This leads to a huge speedup compared to the lower bound for standard bit mutation. In general, our overall bound for power-law mutation suggests that the MOEA/D performs best for \\(N = O(n^{\\beta - 1})\\), resulting in an \\(O(n^\\beta \\log n)\\) bound. In contrast to standard bit mutation, smaller values of \\(N\\) are better for power-law mutation, as it is capable of easily creating missing solutions.
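For reference, the two ingredients named above in minimal Python form: the OneMinMax objective pair and power-law mutation with exponent beta. The comment notes the property the analysis exploits; this is an illustrative sketch, not the paper's MOEA/D implementation.

```python
import random

def one_min_max(x):
    """OneMinMax: maximize the number of zeros and the number of ones."""
    ones = sum(x)
    return (len(x) - ones, ones)

def power_law_mutation(x, beta, rng):
    """Flip exactly k distinct bits with P(k) ~ k^(-beta), k in {1, ..., n}.
    Large jumps keep polynomially small probability, so gaps between the
    g-optima are cheap to cross, unlike under standard bit mutation."""
    n = len(x)
    weights = [k ** (-beta) for k in range(1, n + 1)]
    k = rng.choices(range(1, n + 1), weights=weights)[0]
    y = x[:]
    for i in rng.sample(range(n), k):
        y[i] ^= 1
    return y
```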
Superior Genetic Algorithms for the Target Set Selection Problem Based on Power-Law Parameter Choices and Simple Greedy Heuristics
The target set selection problem (TSS) asks for a set of vertices such that an influence spreading process started in these vertices reaches the whole graph. The current state of the art for this NP-hard problem consists of three recently proposed randomized search heuristics: a biased random-key genetic algorithm (BRKGA) obtained from extensive parameter tuning, a max-min ant system (MMAS), and an MMAS using Q-learning with a graph convolutional network. We show that the BRKGA with two simple modifications and without the costly parameter tuning obtains significantly better results. Our first modification is to simply choose all parameters of the BRKGA in each iteration randomly from a power-law distribution. The resulting parameterless BRKGA is already competitive with the tuned BRKGA, as our experiments on the previously used benchmarks show. We then add a natural greedy heuristic, namely to repeatedly discard small-degree vertices that are not necessary for reaching the whole graph. The resulting algorithm consistently outperforms all of the state-of-the-art algorithms. Besides providing a superior algorithm for the TSS problem, this work shows that randomized parameter choices and elementary greedy heuristics can give better results than complex algorithms and costly parameter tuning.
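The greedy heuristic is easy to state in code. A hedged sketch under a simple threshold spread model (each vertex activates once enough neighbors are active; the paper's exact influence model may differ): starting from a feasible target set, try to discard small-degree vertices while the process still reaches the whole graph.

```python
def spreads_to_all(adj, thresholds, seeds):
    """Threshold influence spread from `seeds`; True if all vertices activate."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in range(len(adj)):
            if v not in active and sum(u in active for u in adj[v]) >= thresholds[v]:
                active.add(v)
                changed = True
    return len(active) == len(adj)

def greedy_prune(adj, thresholds, target_set):
    """Discard small-degree vertices that are unnecessary for full coverage."""
    seeds = set(target_set)
    for v in sorted(seeds, key=lambda v: len(adj[v])):   # small degree first
        if spreads_to_all(adj, thresholds, seeds - {v}):
            seeds.discard(v)
    return seeds
```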
The Univariate Marginal Distribution Algorithm Copes Well With Deception and Epistasis
In their recent work, Lehre and Nguyen (FOGA 2019) show that the univariate marginal distribution algorithm (UMDA) needs time exponential in the parent population size to optimize the DeceptiveLeadingBlocks (DLB) problem. They conclude from this result that univariate EDAs have difficulties with deception and epistasis. In this work, we show that this negative finding is caused by an unfortunate choice of the parameters of the UMDA. When the population sizes are chosen large enough to prevent genetic drift, then the UMDA optimizes the DLB problem with high probability with at most \\(\\lambda(\\frac{n}{2} + 2 e \\ln n)\\) fitness evaluations. Since an offspring population size \\(\\lambda\\) of order \\(n \\log n\\) can prevent genetic drift, the UMDA can solve the DLB problem with \\(O(n^2 \\log n)\\) fitness evaluations. In contrast, for classic evolutionary algorithms no better run time guarantee than \\(O(n^3)\\) is known (which we prove to be tight for the \\({(1+1)}\\) EA), so our result rather suggests that the UMDA can cope well with deception and epistasis. From a broader perspective, our result shows that the UMDA can cope better with local optima than evolutionary algorithms; such a result was previously known only for the compact genetic algorithm. Together with the lower bound of Lehre and Nguyen, our result for the first time rigorously proves that running EDAs in the regime with genetic drift can lead to drastic performance losses.
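A minimal UMDA sketch in Python, for orientation: sample lambda offspring from a product distribution, select the best mu, and re-estimate each marginal from the selected individuals, clamped to the usual margins [1/n, 1 - 1/n]. The DLB definition below follows the common convention in this literature (2 points per leading 11-block, the first non-11 block scoring 1 only for the deceptive 00); treat it as an assumption.

```python
import random

def dlb(x):
    """DeceptiveLeadingBlocks (assumed standard form), blocks of two bits."""
    score = 0
    for i in range(0, len(x), 2):
        if x[i] == 1 and x[i + 1] == 1:
            score += 2                              # another leading 11-block
        else:
            return score + (1 if x[i] == 0 and x[i + 1] == 0 else 0)
    return score

def umda(fitness, n, lam, mu, generations, seed=0):
    rng = random.Random(seed)
    freq = [0.5] * n
    for _ in range(generations):
        pop = [[int(rng.random() < p) for p in freq] for _ in range(lam)]
        pop.sort(key=fitness, reverse=True)
        for i in range(n):
            p = sum(x[i] for x in pop[:mu]) / mu    # marginal of the selected
            freq[i] = min(max(p, 1 / n), 1 - 1 / n) # margins limit genetic drift
    return freq

freq = umda(dlb, n=50, lam=1000, mu=500, generations=200)
```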
Difficulties of the NSGA-II with the Many-Objective LeadingOnes Problem
The NSGA-II is the most prominent multi-objective evolutionary algorithm (cited more than 50,000 times). Very recently, a mathematical runtime analysis has proven that this algorithm can have enormous difficulties when the number of objectives is larger than two (Zheng, Doerr. IEEE Transactions on Evolutionary Computation (2024)). However, this result was shown only for the OneMinMax benchmark problem, which has the particularity that all solutions are on the Pareto front, a fact heavily exploited in the proof of this result. In this work, we show a comparable result for the LeadingOnesTrailingZeroes benchmark. This popular benchmark problem appears more natural in that most of its solutions are not on the Pareto front. With a careful analysis of the population dynamics of the NSGA-II optimizing this benchmark, we manage to show that when the population grows on the Pareto front, then it does so much faster by creating known Pareto optima than by spreading out on the Pareto front. Consequently, already when a constant fraction of the Pareto front is still unexplored, the crowding distance becomes the crucial selection mechanism, and thus the same problems arise as in the optimization of OneMinMax. With these and some further arguments, we show that the NSGA-II, with a population size that is at most a constant factor larger than the size of the Pareto front, cannot compute the Pareto front in less than exponential time.
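The crowding distance that drives this effect is short enough to state exactly; the following is the standard textbook computation (sort per objective, infinite distance for boundary points, normalized neighbor gaps otherwise), not code from the paper.

```python
def crowding_distance(front):
    """front: list of objective tuples; returns one crowding distance per point."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for obj in range(m):
        order = sorted(range(n), key=lambda i: front[i][obj])
        dist[order[0]] = dist[order[-1]] = float("inf")   # boundary points kept
        span = front[order[-1]][obj] - front[order[0]][obj] or 1.0
        for r in range(1, n - 1):
            gap = front[order[r + 1]][obj] - front[order[r - 1]][obj]
            dist[order[r]] += gap / span                  # normalized neighbor gap
    return dist
```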
Estimation-of-Distribution Algorithms for Multi-Valued Decision Variables
The majority of research on estimation-of-distribution algorithms (EDAs) concentrates on pseudo-Boolean optimization and permutation problems, leaving the domain of EDAs for problems in which the decision variables can take more than two values, but which are not permutation problems, mostly unexplored. To render this domain more accessible, we propose a natural way to extend the known univariate EDAs to this setting. Different from a naive reduction to the binary case, our approach avoids additional constraints. Since understanding genetic drift is crucial for an optimal parameter choice, we extend the known quantitative analysis of genetic drift to EDAs for multi-valued variables. Roughly speaking, when the variables take \\(r\\) different values, the time for genetic drift to become significant is \\(r\\) times shorter than in the binary case. Consequently, the update strength of the probabilistic model has to be chosen \\(r\\) times lower now. To investigate how desired model updates take place in this framework, we undertake a mathematical runtime analysis on the \\(r\\)-valued LeadingOnes problem. We prove that with the right parameters, the multi-valued UMDA solves this problem efficiently in \\(O(r\\ln(r)^2 n^2 \\ln(n))\\) function evaluations. This bound is nearly tight, as our lower bound of \\(\\Omega(r\\ln(r) n^2 \\ln(n))\\) shows. Overall, our work shows that our good understanding of binary EDAs naturally extends to the multi-valued setting, and it gives advice on how to set the main parameters of multi-valued EDAs.
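The proposed extension is easy to picture in code: each variable carries a frequency vector over its r values instead of a single frequency. A hedged sketch of sampling and the model update follows; the margin 1/((r-1)n) and the renormalization after clamping are illustrative simplifications, not necessarily the paper's exact border handling.

```python
import random

def sample(freqs, rng):
    """Draw one individual; freqs[i] is a length-r distribution for variable i."""
    return [rng.choices(range(len(f)), weights=f)[0] for f in freqs]

def update(freqs, selected, n, r):
    """Set each frequency vector to the marginals of the selected individuals,
    clamped away from 0 and 1, then renormalized (a simplification)."""
    lo = 1 / ((r - 1) * n)                  # illustrative lower margin
    for i, f in enumerate(freqs):
        counts = [0] * r
        for x in selected:
            counts[x[i]] += 1
        f[:] = [min(max(c / len(selected), lo), 1 - 1 / n) for c in counts]
        total = sum(f)
        f[:] = [p / total for p in f]       # keep each vector a distribution
    return freqs
```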
Run Time Analysis for Random Local Search on Generalized Majority Functions
Run time analysis of evolutionary algorithms has recently made significant progress in linking algorithm performance to algorithm parameters. However, settings that study the impact of problem parameters are rare. The recently proposed W-model provides a good framework for such analyses, generating pseudo-Boolean optimization problems with tunable properties. We initiate theoretical research on the W-model by studying how one of its properties -- neutrality -- influences the run time of random local search. Neutrality creates plateaus in the search space by first performing a majority vote for subsets of the solution candidate and then evaluating the smaller-dimensional string via a low-level fitness function. We prove upper bounds for the expected run time of random local search on this MAJORITY problem for its entire parameter spectrum. To this end, we provide a theorem, applicable to many optimization algorithms, that links the run time of MAJORITY with its symmetric version HASMAJORITY, where a sufficient majority is needed to optimize the subset. We also introduce a generalized version of classic drift theorems as well as a generalized version of Wald's equation, both of which we believe to be of independent interest.
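To make the neutrality construction concrete: split the bit string into blocks of size k, take a majority vote per block, and feed the reduced string to a low-level fitness function. The sketch below uses OneMax as that low-level function and lets ties count as 1 for even k; both choices, and random local search accepting equally fit moves so it can traverse the plateaus, are illustrative assumptions.

```python
import random

def majority_reduce(x, k):
    """Neutrality layer: one majority vote per block of k bits (ties -> 1)."""
    return [int(2 * sum(x[i:i + k]) >= k) for i in range(0, len(x), k)]

def rls_on_majority(n=60, k=3, seed=0, max_steps=100_000):
    rng = random.Random(seed)
    fitness = lambda x: sum(majority_reduce(x, k))   # low-level OneMax
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    for _ in range(max_steps):
        i = rng.randrange(n)
        x[i] ^= 1                                    # RLS: flip one uniform bit
        fy = fitness(x)
        if fy >= fx:
            fx = fy                                  # accept equals: cross plateaus
        else:
            x[i] ^= 1                                # revert worsening flip
        if fx == n // k:                             # every block won its vote
            break
    return x, fx
```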
Run time analysis of evolutionary algorithms recently makes significant progress in linking algorithm performance to algorithm parameters. However, settings that study the impact of problem parameters are rare. The recently proposed W-model provides a good framework for such analyses, generating pseudo-Boolean optimization problems with tunable properties. We initiate theoretical research of the W-model by studying how one of its properties -- neutrality -- influences the run time of random local search. Neutrality creates plateaus in the search space by first performing a majority vote for subsets of the solution candidate and then evaluating the smaller-dimensional string via a low-level fitness function. We prove upper bounds for the expected run time of random local search on this MAJORITY problem for its entire parameter spectrum. To this end, we provide a theorem, applicable to many optimization algorithms, that links the run time of MAJORITY with its symmetric version HASMAJORITY, where a sufficient majority is needed to optimize the subset. We also introduce a generalized version of classic drift theorems as well as a generalized version of Wald's equation, both of which we believe to be of independent interest.