Catalogue Search | MBRL

Fisher-Orthogonal Projected Natural Gradient Descent for Continual Learning

by Kolhe, Neel , Gopalam, Rohan , Peng, Andy in Euclidean geometry , Knowledge management , Learning

2026

Continual learning aims to enable neural networks to acquire new knowledge on sequential tasks. However, the key challenge in such settings is to learn new tasks without catastrophically forgetting previously learned tasks. We propose the Fisher-Orthogonal Projected Natural Gradient Descent (FOPNG) optimizer, which enforces Fisher-orthogonal constraints on parameter updates to preserve old task performance while learning new tasks. Unlike existing methods that operate in Euclidean parameter space, FOPNG projects gradients onto the Fisher-orthogonal complement of previous task gradients. This approach unifies natural gradient descent with orthogonal gradient methods within an information-geometric framework. We provide theoretical analysis deriving the projected update, describe efficient and practical implementations using the diagonal Fisher, and demonstrate strong results on standard continual learning benchmarks such as Permuted-MNIST, Split-MNIST, Rotated-MNIST, Split-CIFAR10, and Split-CIFAR100. Our code is available at https://github.com/ishirgarg/FOPNG.

Paper

InfoSynth: Information-Guided Benchmark Synthesis for LLMs

by Kolhe, Neel , Zhao, Xuandong , Song, Dawn in Benchmarks , Control methods , Datasets

2026

Large language models (LLMs) have demonstrated significant advancements in reasoning and code generation. However, efficiently creating new benchmarks to evaluate these capabilities remains a challenge. Traditional benchmark creation relies on manual human effort, a process that is both expensive and time-consuming. Furthermore, existing benchmarks often contaminate LLM training data, necessitating novel and diverse benchmarks to accurately assess their genuine capabilities. This work introduces InfoSynth, a novel framework for automatically generating and evaluating reasoning benchmarks guided by information-theoretic principles. We propose metrics based on KL-divergence and entropy to quantify benchmark novelty and diversity without relying on costly model evaluations. Building on this framework, we develop an end-to-end pipeline that synthesizes robust Python coding problems from seed datasets using genetic algorithms and iterative code feedback. Our method generates accurate test cases and solutions to new problems 97% of the time, and the synthesized benchmarks consistently exhibit higher novelty and diversity compared to their seed datasets. Moreover, our algorithm provides a method for controlling the novelty/diversity and difficulty of generated problems. InfoSynth offers a scalable, self-verifying pipeline for constructing high-quality, novel and diverse benchmarks for LLMs. Project Page: https://ishirgarg.github.io/infosynth_web/

Paper

Upper bounds on the $2$-colorability threshold of random $d$-regular $k$-uniform hypergraphs for $k\geq 3$

by Kolhe, Neel , Sohn, Youngtak , Chang, Evan in Coloring , Graph theory , Graphs

2023

For a large class of random constraint satisfaction problems (CSPs), deep but non-rigorous theory from statistical physics predicts the location of the sharp satisfiability transition. The works of Ding, Sly, Sun (2014, 2016) and Coja-Oghlan, Panagiotou (2014) established the satisfiability threshold for random regular $k$-NAE-SAT, random $k$-SAT, and random regular $k$-SAT for large enough $k\geq k_0$, where $k_0$ is a large non-explicit constant. Establishing the same for small values of $k\geq 3$ remains an important open problem in the study of random CSPs. In this work, we study two closely related models of random CSPs, namely the $2$-coloring on random $d$-regular $k$-uniform hypergraphs and the random $d$-regular $k$-NAE-SAT model. For every $k\geq 3$, we prove that there is an explicit $d_{\ast}(k)$ which gives a satisfiability upper bound for both of the models. Our upper bound $d_{\ast}(k)$ for $k\geq 3$ matches the prediction from statistical physics for the hypergraph $2$-coloring by Dall'Asta, Ramezanpour, Zecchina (2008), and is thus conjectured to be sharp. Moreover, $d_{\ast}(k)$ coincides with the satisfiability threshold of random regular $k$-NAE-SAT for large enough $k\geq k_0$ by Ding, Sly, Sun (2014).

Paper

WLM: Dynamics of an isolated Dwarf Irregular Galaxy Under Ram Pressure in the Local Group

by Yang, Yanbin , Carignan, Claude , Hammer, Francois in Asymmetry , Dwarf galaxies , Intergalactic media

2026

WLM is an archetypal dwarf irregular galaxy that has not experienced interactions with major Local Group galaxies within the past 8 Gyr. It has recently been shown that WLM is losing its gas due to ram pressure forces exerted by the surrounding intergalactic medium (IGM). In this work, we explore how ram pressure may also affect the WLM gas kinematics, and we show that its dynamics is especially perturbed at its outskirts, explaining the asymmetric rotation between the approaching and receding sides. Moreover, we have been able to decompose WLM into two main components: a compact one with a solid-body rotation that resembles a bar-like structure, and a more extended one with a characteristic double-horn profile suggesting an edge-on disk. The former is relatively unaffected by ram pressure, while the dynamics of the latter is considerably affected by it. This study shows that mass estimates of a dwarf galaxy like WLM should account for a full modeling of its dynamical components, especially accounting for its asymmetric rotation curve.

Paper

Gravitational Waves from Black-Hole Encounters: Prospects for Ground- and Galaxy-Based Observatories

by Tiwari, Shubhanshu , Dandapat, Subhajit , Hyung Mok Lee in Gravitational waves , Ground-based observation , Observatories

2023

Close hyperbolic encounters of black holes (BHs) generate certain Burst With Memory (BWM) events in the frequency windows of the operational, planned, and proposed gravitational wave (GW) observatories. We present detailed explorations of the detectable parameter space of such events that are relevant for the LIGO-Virgo-KAGRA and the International Pulsar Timing Array (IPTA) consortia. The underlying temporally evolving GW polarization states are adapted from Cho et al. [Phys. Rev. D 98, 024039 (2018)] and therefore incorporate general relativistic effects up to the third post-Newtonian order. Further, we provide a prescription to ensure the validity of our waveform family while describing close encounters. Preliminary investigations reveal that optimally placed BWM events should be visible to megaparsec distances for the existing ground-based observatories. In contrast, maturing IPTA datasets should be able to provide constraints on the occurrences of such hyperbolic encounters of supermassive BHs to gigaparsec distances.

Paper

Reliable Fine-Grained Evaluation of Natural Language Math Proofs

by Kolhe, Neel , Ma, Wenjie , Robin Said Sharif in Large language models , Natural language , Reasoning

2026

Recent advances in large language models (LLMs) for mathematical reasoning have largely focused on tasks with easily verifiable final answers, while generating and verifying natural language math proofs remains an open challenge. We identify the absence of a reliable, fine-grained evaluator for LLM-generated math proofs as a critical gap. To address this, we propose a systematic methodology for developing and validating evaluators that assign fine-grained scores on a 0-7 scale to model-generated math proofs. To enable this study, we introduce ProofBench, the first expert-annotated dataset of fine-grained proof ratings, spanning 145 problems from six major math competitions (USAMO, IMO, Putnam, etc.) and 435 LLM-generated solutions from Gemini-2.5-Pro, o3, and DeepSeek-R1. Using ProofBench as a testbed, we systematically explore the evaluator design space across key axes: the backbone model, input context, instructions and evaluation workflow. Our analysis delivers ProofGrader, an evaluator that combines a strong reasoning backbone LM, rich context from reference solutions and marking schemes, and a simple ensembling method; it achieves a low Mean Absolute Error (MAE) of 0.926 against expert scores, significantly outperforming naive baselines. Finally, we demonstrate its practical utility in a best-of-$n$ selection task: at $n=16$, ProofGrader achieves an average score of 4.14/7, closing 78% of the gap between a naive binary evaluator (2.48) and the human oracle (4.62), highlighting its potential to advance downstream proof generation.

Paper

Reliable Fine-Grained Evaluation of Natural Language Math Proofs

by Kolhe, Neel , Ma, Wenjie , Robin Said Sharif in Large language models , Natural language , Reasoning

2025

Recent advances in large language models (LLMs) for mathematical reasoning have largely focused on tasks with easily verifiable final answers; however, generating and verifying natural language math proofs remains an open challenge. We identify the absence of a reliable, fine-grained evaluator for LLM-generated math proofs as a critical gap. To address this, we propose a systematic methodology for developing and validating evaluators that assign fine-grained scores on a 0-7 scale to model-generated math proofs. To enable this study, we introduce ProofBench, the first expert-annotated dataset of fine-grained proof ratings, spanning 145 problems from six major math competitions (USAMO, IMO, Putnam, etc.) and 435 LLM-generated solutions from Gemini-2.5-pro, o3, and DeepSeek-R1. Using ProofBench as a testbed, we systematically explore the evaluator design space across key axes: the backbone model, input context, instructions and evaluation workflow. Our analysis delivers ProofGrader, an evaluator that combines a strong reasoning backbone LM, rich context from reference solutions and marking schemes, and a simple ensembling method; it achieves a low Mean Absolute Error (MAE) of 0.926 against expert scores, significantly outperforming naive baselines. Finally, we demonstrate its practical utility in a best-of-$n$ selection task: at $n=16$, ProofGrader achieves an average score of 4.14 (out of 7), closing 78% of the gap between a naive binary evaluator (2.48) and the human oracle (4.62), highlighting its potential to advance downstream proof generation.

Paper

Multi-band Extension of the Wideband Timing Technique

by Gupta, Yashwant , Bagchi, Manjari , Kharbanda, Divyansh in Broadband , Evolution , Millisecond pulsars

2023

The wideband timing technique enables the high-precision simultaneous estimation of pulsar Times of Arrival (ToAs) and Dispersion Measures (DMs) while effectively modeling frequency-dependent profile evolution. We present two novel independent methods that extend the standard wideband technique to handle simultaneous multi-band pulsar data, incorporating profile evolution over a larger frequency span to estimate DMs and ToAs with enhanced precision. We implement the wideband likelihood using the libstempo Python interface to perform wideband timing in the tempo2 framework. We present the application of these techniques to the dataset of fourteen millisecond pulsars observed simultaneously in Band 3 (300-500 MHz) and Band 5 (1260-1460 MHz) of the upgraded Giant Metrewave Radio Telescope (uGMRT), with a large band gap of 760 MHz, as a part of the Indian Pulsar Timing Array (InPTA) campaign. Using our novel techniques, we combine simultaneous multi-band pulsar observations made in non-contiguous bands for the first time, achieving increased ToA and DM precision and sub-microsecond root mean square post-fit timing residuals.

Paper

Noise analysis of the Indian Pulsar Timing Array data release I

by Gupta, Yashwant , Kharbanda, Divyansh , Bagchi, Manjari in Arrays , Millisecond pulsars , Noise measurement

2023

The Indian Pulsar Timing Array (InPTA) collaboration has recently made its first official data release (DR1) for a sample of 14 pulsars using 3.5 years of uGMRT observations. We present the results of single-pulsar noise analysis for each of these 14 pulsars using the InPTA DR1. For this purpose, we consider white noise, achromatic red noise, dispersion measure (DM) variations, and scattering variations in our analysis. We apply Bayesian model selection to obtain the preferred noise models among these for each pulsar. For PSR J1600$-$3053, we find no evidence of DM and scattering variations, while for PSR J1909$-$3744, we find no significant scattering variations. Properties vary dramatically among pulsars. For example, we find a strong chromatic noise with chromatic index $\sim 2.9$ for PSR J1939+2134, indicating the possibility of a scattering index that does not agree with that expected for a Kolmogorov scattering medium, consistent with similar results for millisecond pulsars in past studies. Despite the relatively short time baseline, the noise models broadly agree with the other PTAs and provide, at the same time, well-constrained DM and scattering variations.

Paper

The Indian Pulsar Timing Array: First data release

by Gupta, Yashwant , Kharbanda, Divyansh , Bagchi, Manjari in Arrays , Chromaticity , Dispersion

2022

We present the pulse arrival times and high-precision dispersion measure estimates for 14 millisecond pulsars observed simultaneously in the 300-500 MHz and 1260-1460 MHz frequency bands using the upgraded Giant Metrewave Radio Telescope (uGMRT). The data span a baseline of 3.5 years (2018-2021) and constitute the first official data release made available by the Indian Pulsar Timing Array collaboration. This data release presents a unique opportunity for investigating the interstellar medium effects at low radio frequencies and their impact on the timing precision of pulsar timing array experiments. In addition to the dispersion measure time series and pulse arrival times obtained using both narrowband and wideband timing techniques, we also present the dispersion measure structure function analysis for selected pulsars. We discuss our ongoing investigations regarding the frequency dependence of dispersion measures. Based on the preliminary analysis for five millisecond pulsars, we do not find any conclusive evidence of chromaticity in dispersion measures. Data from regular simultaneous two-frequency observations are presented for the first time in this work. This distinctive feature leads us to the highest precision dispersion measure estimates obtained so far for a subset of our sample. Simultaneous multi-band uGMRT observations in Band 3 and Band 5 are crucial for high-precision dispersion measure estimation and for the prospect of expanding the overall frequency coverage upon the combination of data from the various Pulsar Timing Array consortia in the near future. Parts of the data presented in this work are expected to be incorporated into the upcoming third data release of the International Pulsar Timing Array.

Paper
