Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
41 result(s) for "Reymond, Mathieu"
Evidence for the Regulation of Gynoecium Morphogenesis by ETTIN via Cell Wall Dynamics
by
Reproduction et développement des plantes (RDP) ; École normale supérieure de Lyon (ENS de Lyon) ; Université de Lyon-Université de Lyon-Institut National de la Recherche Agronomique (INRA)-Université Claude Bernard Lyon 1 (UCBL) ; Université de Lyon-Centre National de la Recherche Scientifique (CNRS)
,
Centre National de la Recherche Scientifique (CNRS)
,
Pelloux, Jérôme
in
Arabidopsis - genetics
,
Arabidopsis - growth & development
,
Arabidopsis - metabolism
2018
Background and Aims Plant stature and shape are largely determined by cell elongation, a process that is strongly controlled at the level of the cell wall. This is associated with the presence of many cell wall proteins implicated in the elongation process. Several proteins and enzyme families have been suggested to be involved in the controlled weakening of the cell wall, including xyloglucan endotransglucosylases/hydrolases (XTHs), yieldins, lipid transfer proteins and expansins. Although expansins have been the subject of much research, the role and involvement of expansin-like genes/proteins remain mostly unclear. This study investigates the expression and function of AtEXLA2 (At4g38400), a member of the expansin-like A (EXLA) family in arabidopsis, and considers its possible role in cell wall metabolism and growth.
Methods Transgenic plants of Arabidopsis thaliana were grown, and lines over-expressing AtEXLA2 were identified. Plants were grown in the dark, on media containing growth hormones or precursors, or were gravistimulated. Hypocotyls were studied using transmission electron microscopy and extensiometry. Histochemical GUS (beta-glucuronidase) staining was performed.
Key Results AtEXLA2 is one of three EXLA members in arabidopsis. The protein lacks the typical domain responsible for expansin activity, but contains a presumed cellulose-interacting domain. Using promoter::GUS lines, expression of AtEXLA2 was observed in germinating seedlings, hypocotyls, lateral root cap cells, columella cells and the central cylinder basally to the elongation zone of the root, and during different stages of lateral root development. Furthermore, promoter activity was detected in petioles, veins of leaves and filaments, as well as in the peduncle of the flowers and in a zone just beneath the papillae. Over-expression of AtEXLA2 resulted in an increase of > 10 % in the length of dark-grown hypocotyls and in slightly thicker walls in non-rapidly elongating etiolated hypocotyl cells. Biomechanical analysis by creep tests showed that AtEXLA2 over-expression may decrease wall strength in arabidopsis hypocotyls.
Conclusions AtEXLA2 may function as a positive regulator of cell elongation in the dark-grown hypocotyl of arabidopsis, possibly by interfering with cellulose metabolism, deposition or organization.
Journal Article
Actor-critic multi-objective reinforcement learning for non-linear utility functions
by
Hayes, Conor F.
,
Roijers, Diederik M.
,
Reymond, Mathieu
in
Algorithms
,
Artificial Intelligence
,
Computer Science
2023
We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and the solution concept. A key insight is that, by proposing a critic that learns a multi-variate distribution over the returns, which is then combined with accumulated rewards, we can directly optimize on the utility function, even if it is non-linear. This allows us to vastly increase the range of problems that can be solved compared to those which can be handled by single-objective methods or by multi-objective methods requiring linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks, and show that it learns effectively where baseline approaches fail.
Journal Article
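The abstract above hinges on one evaluation step: combining the rewards already accrued with a learned distribution over future returns, then taking the expected utility of the sum. A minimal sketch of that step, assuming the critic's multi-variate return distribution is represented by a set of samples (the threshold-style utility function and all names here are illustrative, not taken from the paper):

```python
import numpy as np

def expected_utility(accrued, return_samples, utility):
    """Estimate E[u(R_acc + Z)], where Z is drawn from the critic's
    learned multi-variate return distribution (here: a sample set)."""
    return float(np.mean([utility(accrued + z) for z in return_samples]))

# Hypothetical non-linear utility over two objectives: value objective 0
# only once objective 1 clears a threshold (not linearly scalarizable).
def utility(r):
    return r[0] if r[1] >= 1.0 else 0.0

accrued = np.array([0.5, 0.8])                       # rewards so far
samples = [np.array([1.0, 0.5]), np.array([2.0, 0.1])]  # critic samples
print(expected_utility(accrued, samples, utility))   # → 0.75
```

Because the utility is applied to full returns (accrued plus future) before averaging, the non-linearity is handled directly, which is exactly why averaging returns first would fail here.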
PEP1 of Arabis alpina Is Encoded by Two Overlapping Genes That Contribute to Natural Genetic Variation in Perennial Flowering
by
Mateos, Julieta L.
,
Wang, Renhou
,
Albani, Maria C.
in
Alleles
,
Arabidopsis - genetics
,
Arabidopsis - growth & development
2012
Higher plants exhibit a variety of different life histories. Annual plants live for less than a year and after flowering produce seeds and senesce. By contrast perennials live for many years, dividing their life cycle into episodes of vegetative growth and flowering. Environmental cues control key check points in both life histories. Genes controlling responses to these cues exhibit natural genetic variation that has been studied most in short-lived annuals. We characterize natural genetic variation conferring differences in the perennial life cycle of Arabis alpina. Previously the accession Pajares was shown to flower after prolonged exposure to cold (vernalization) and only for a limited period before returning to vegetative growth. We describe five accessions of A. alpina that do not require vernalization to flower and flower continuously. Genetic complementation showed that these accessions carry mutant alleles at PERPETUAL FLOWERING 1 (PEP1), which encodes a MADS box transcription factor orthologous to FLOWERING LOCUS C in the annual Arabidopsis thaliana. Each accession carries a different mutation at PEP1, suggesting that such variation has arisen independently many times. Characterization of these alleles demonstrated that in most accessions, including Pajares, the PEP1 locus contains a tandem arrangement of a full length and a partial PEP1 copy, which give rise to two full-length transcripts that are differentially expressed. This complexity contrasts with the single gene present in A. thaliana and might contribute to the more complex expression pattern of PEP1 that is associated with the perennial life-cycle. Our work demonstrates that natural accessions of A. alpina exhibit distinct life histories conferred by differences in PEP1 activity, and that continuous flowering forms have arisen multiple times by inactivation of the floral repressor PEP1. 
Similar phenotypic variation is found in other herbaceous perennial species, and our results provide a paradigm for how characteristic perennial phenotypes might arise.
Journal Article
Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning
by
Hayes, Conor F.
,
Mannion, Patrick
,
Roijers, Diederik M.
in
Algorithms
,
Artificial Intelligence
,
Computer Science
2023
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns (known in reinforcement learning as the value) cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can compute policies for nonlinear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Secondly, we propose a distributional Monte Carlo tree search algorithm (DMCTS) which extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns, and utilises Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
Journal Article
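The Thompson-sampling step described for DMCTS (draw one utility estimate per action from its posterior, then act greedily on the draws) can be sketched in a few lines. The list-of-samples posterior below is an illustrative stand-in for the paper's approximate posterior, not its actual representation:

```python
import random

def thompson_select(posteriors, rng):
    """posteriors: action -> list of utility samples approximating a
    posterior. Draw one sample per action, pick the argmax action."""
    draws = {a: rng.choice(samples) for a, samples in posteriors.items()}
    return max(draws, key=draws.get)

rng = random.Random(0)
posteriors = {"left": [0.2, 0.9, 0.4], "right": [0.5, 0.55, 0.6]}
chosen = thompson_select(posteriors, rng)
```

Acting on random posterior draws rather than posterior means gives the exploration behaviour Thompson sampling is known for: actions with uncertain utility are still tried occasionally.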
CoPeP: Benchmarking Continual Pretraining for Protein Language Models
2026
Protein language models (pLMs) have recently gained significant attention for their ability to uncover relationships between sequence, structure, and function from evolutionary statistics, thereby accelerating therapeutic drug discovery. These models learn from large protein databases that are continuously updated by the biology community and whose dynamic nature motivates the application of continual learning, not only to keep up with the ever-growing data, but also as an opportunity to take advantage of the temporal meta-information that is created during this process. As a result, we introduce the Continual Pretraining of Protein Language Models (CoPeP) benchmark, a novel benchmark for evaluating continual learning approaches on pLMs. Specifically, we curate a sequence of protein datasets derived from the UniProt Knowledgebase spanning a decade and define metrics to assess pLM performance across 31 protein understanding tasks. We evaluate several methods from the continual learning literature, including replay, unlearning, and plasticity-based methods, some of which have never been applied to models and data of this scale. Our findings reveal that incorporating temporal meta-information improves perplexity by up to 7% even when compared to training on data from all tasks jointly. Moreover, even at scale, several continual learning methods outperform naive continual pretraining. The CoPeP benchmark offers an exciting opportunity to study these methods at scale in an impactful real-world application.
Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning
2026
In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. While this minimizes resource usage for on-device applications, it makes agents notoriously sample-inefficient, since value-based losses alone struggle to extract meaningful representations from transient data. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. However, due to the highly correlated samples induced by the streaming regime, naively applying this auxiliary loss results in training instabilities. Thus, we introduce orthogonal gradient updates relative to the momentum target and resolve gradient conflicts arising from streaming-specific optimizers. Validated across the Atari, MinAtar, and Octax suites, our approach systematically outperforms existing streaming baselines. Latent-space analysis, including t-SNE visualizations and effective-rank measurements, confirms that our method learns significantly richer representations, bridging the performance gap caused by the absence of a replay buffer, while remaining efficient enough to train on just a few CPU cores.
GRPO-λ: Credit Assignment improves LLM Reasoning
by
Parthasarathi, Prasanna
,
Chandar, Sarath
,
Cui, Yufei
in
Large language models
,
Reasoning
,
Task complexity
2025
Large language models (LLMs) are increasingly deployed for tasks requiring complex reasoning, prompting significant interest in improving their reasoning abilities through post-training. In particular, RL-based methods using verifiable rewards, like the state-of-the-art GRPO, have been shown to substantially improve reasoning behaviours when applied as post-training methods. However, the lack of an explicit reward or critic model limits GRPO's ability to assign fine-grained credit across token sequences. In this work, we present GRPO-λ, a novel extension to GRPO that enhances credit assignment in RL fine-tuning of LLMs for complex reasoning tasks. We approximate learning from the λ-return with a reformulation of eligibility traces using token-level log-probabilities applied after each sequence generation, and a novel critic-free approximation of the temporal-difference error. We introduce several variations for the weighting of the λ-return and their application to the eligibility trace, all of which provide significant gains over GRPO. We compare GRPO-λ against GRPO by training models from 1.5B to 7B parameters on four different math reasoning datasets. The training plots demonstrate 30-40% improved performance during RL training on both LLaMA-3.1 and Qwen-2.5 architectures. Finally, we show that with GRPO-λ, the resulting average performance on AIME24, Math500, OlympiadMath, MinervaMath, and AMC improves over GRPO by over 3 points, with a 4.5-point improvement on the 7B model.
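The λ-decay idea behind eligibility traces can be illustrated with a generic backward-view weighting over token positions; GRPO-λ's actual weighting variants use token-level log-probabilities and differ from this, so treat it purely as a sketch of how λ distributes credit:

```python
def lambda_weights(T, lam):
    """Backward-view eligibility weights for a length-T token sequence:
    w_t = lam ** (T - 1 - t), so the token nearest the terminal
    (verifiable) reward gets weight 1 and earlier tokens decay."""
    return [lam ** (T - 1 - t) for t in range(T)]

weights = lambda_weights(4, 0.5)  # [0.125, 0.25, 0.5, 1.0]
```

With lam = 1 every token shares credit equally (Monte Carlo style, as in plain GRPO's sequence-level advantage); with lam = 0 only the final token is credited, which is the trade-off the λ parameter exposes.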
CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning
2025
In silico design and optimization of new materials primarily relies on high-accuracy atomic simulators that perform density functional theory (DFT) calculations. While recent works showcase the strong potential of machine learning to accelerate the material design process, they mostly consist of generative approaches that do not use direct DFT signals as feedback to improve training and generation mainly due to DFT's high computational cost. To aid the adoption of direct DFT signals in the materials design loop through online reinforcement learning (RL), we propose CrystalGym, an open-source RL environment for crystalline material discovery. Using CrystalGym, we benchmark common value- and policy-based reinforcement learning algorithms for designing various crystals conditioned on target properties. Concretely, we optimize for challenging properties like the band gap, bulk modulus, and density, which are directly calculated from DFT in the environment. While none of the algorithms we benchmark solve all CrystalGym tasks, our extensive experiments and ablations show different sample efficiencies and ease of convergence to optimality for different algorithms and environment settings. Additionally, we include a case study on the scope of fine-tuning large language models with reinforcement learning for improving DFT-based rewards. Our goal is for CrystalGym to serve as a test bed for reinforcement learning researchers and material scientists to address these real-world design problems with practical applications. We therefore introduce a novel class of challenges for reinforcement learning methods dealing with time-consuming reward signals, paving the way for future interdisciplinary research for machine learning motivated by real-world applications.
Pareto Conditioned Networks
by
Bargiacchi, Eugenio
,
Reymond, Mathieu
,
Nowé, Ann
in
Algorithms
,
Conditioning
,
Machine learning
2022
In multi-objective optimization, learning all the policies that reach Pareto-efficient solutions is an expensive process. The set of optimal policies can grow exponentially with the number of objectives, and recovering all solutions requires an exhaustive exploration of the entire state space. We propose Pareto Conditioned Networks (PCN), a method that uses a single neural network to encompass all non-dominated policies. PCN associates every past transition with its episode's return. It trains the network such that, when conditioned on this same return, it should reenact said transition. In doing so, we transform the optimization problem into a classification problem. We recover a concrete policy by conditioning the network on the desired Pareto-efficient solution. Our method is stable as it learns in a supervised fashion, thus avoiding moving target issues. Moreover, by using a single network, PCN scales efficiently with the number of objectives. Finally, it makes minimal assumptions on the shape of the Pareto front, which makes it suitable for a wider range of problems than previous state-of-the-art multi-objective reinforcement learning algorithms.
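PCN's core mechanism, as the abstract describes it, is supervised: store (state, episode return, action) triples and, when conditioned on a desired return, reproduce the matching action. A toy sketch of that idea, with a nearest-neighbour lookup standing in for the paper's neural network (all names are illustrative):

```python
import numpy as np

class TinyPCN:
    """Return-conditioned policy as a nearest-neighbour table:
    a stand-in for PCN's classifier network."""

    def __init__(self):
        self.data = []  # (state, episode_return, action) triples

    def add(self, state, episode_return, action):
        self.data.append((np.asarray(state, dtype=float),
                          np.asarray(episode_return, dtype=float), action))

    def act(self, state, desired_return):
        # Condition on (state, desired return): replay the action whose
        # stored (state, return) pair is closest to the query.
        key = np.concatenate([np.asarray(state, dtype=float),
                              np.asarray(desired_return, dtype=float)])
        dists = [np.linalg.norm(np.concatenate([s, r]) - key)
                 for s, r, _ in self.data]
        return self.data[int(np.argmin(dists))][2]

pcn = TinyPCN()
pcn.add([0.0], [1.0, 0.0], "a")  # action "a" led to return (1, 0)
pcn.add([0.0], [0.0, 1.0], "b")  # action "b" led to return (0, 1)
print(pcn.act([0.0], [0.0, 1.0]))  # → "b"
```

Asking for a different point on the Pareto front (here, return (1, 0) instead) recovers the other action, which is how a single conditioned model can cover all non-dominated policies.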
Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning
2023
Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn a decentralized best-response policy for each agent via individual advantage functions. The learning is stabilized by a centralized critic whose primary objective is to reduce the moving-target problem of the individual advantages. The critic, whose network size is independent of the number of agents, is cast aside after learning. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and is highly scalable with respect to the number of agents, opening up a promising alternative direction for MARL research.