Asset Details
A Unified Generalization Framework for Model Merging: Trade-offs, Non-Linearity, and Scaling Laws
by Tang, Anke; Zhang, Miao; Yin, Quanjun; Li, Qinglun; Wang, Mengzhu; Shen, Li
in Algorithms / Optimization / Visual tasks
2026
Paper
Overview
Model merging efficiently aggregates capabilities from multiple fine-tuned models into a single one, operating purely in parameter space without original data or expensive re-computation. Despite empirical successes, a unified theory of its effectiveness under heterogeneous fine-tuning hyperparameters (e.g., varying learning rates and batch sizes) has been missing. Existing federated-learning theories focus purely on optimization, which fails to explain model merging and leads to theoretical paradoxes. To address this challenge, we pioneer the integration of \(L_2\)-stability theory into heterogeneous environments to rigorously decouple the excess risk of the merged model \(\boldsymbol{x}_{avg}\) into optimization and generalization errors. This analysis yields three main contributions: (i) we mathematically establish the fundamental Optimization-Generalization Trade-off, explicitly resolving the paradox of why over-trained experts lead to catastrophic merging collapse; (ii) we provide a unified theoretical framework that explains not only linear merging algorithms (e.g., TA, AdaMerging) but also state-of-the-art non-linear merging algorithms (e.g., TIES, DARE), proving how sparsification operators strictly tighten the generalization bound by suppressing task heterogeneity; (iii) rather than heuristic guidelines, we derive quantitative scaling laws that theoretically predict the precise impact of hyperparameter choices, enabling practitioners to strategically construct "merge-friendly" experts. Extensive experiments on ResNet and ViT architectures across 20 visual classification tasks, involving thousands of fine-tuned models, robustly confirm that our theoretical scaling laws accurately predict the empirical generalization behavior of \(\boldsymbol{x}_{avg}\).
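To make the two families of merging operators in the abstract concrete, here is a minimal sketch, not the authors' code: all names (task_vector, dare_sparsify, lam, drop_rate) are illustrative assumptions. It shows linear merging via averaged task vectors (task arithmetic) and a DARE-style drop-and-rescale sparsification, the kind of non-linear operator the framework credits with suppressing task heterogeneity.

```python
import numpy as np

def task_vector(finetuned, pretrained):
    # Task vector: the parameter-space delta an expert adds to the base model.
    return finetuned - pretrained

def dare_sparsify(tv, drop_rate=0.9, rng=None):
    # DARE-style drop-and-rescale (sketch): randomly zero a fraction of
    # entries, then rescale survivors by 1/(1 - drop_rate) so the sparsified
    # delta remains an unbiased estimate of the original task vector.
    rng = rng or np.random.default_rng(0)
    mask = rng.random(tv.shape) >= drop_rate
    return tv * mask / (1.0 - drop_rate)

def merge(pretrained, finetuned_models, lam=0.3, drop_rate=None):
    # Linear merging (task arithmetic): x_avg = x_0 + lam * mean of task vectors.
    # If drop_rate is set, each task vector is sparsified first (non-linear merging).
    tvs = [task_vector(ft, pretrained) for ft in finetuned_models]
    if drop_rate is not None:
        tvs = [dare_sparsify(tv, drop_rate) for tv in tvs]
    return pretrained + lam * np.mean(tvs, axis=0)

# Toy usage on flattened parameter vectors.
x0 = np.zeros(8)
experts = [x0 + np.random.default_rng(i).normal(size=8) for i in range(3)]
print(merge(x0, experts))                 # plain task arithmetic
print(merge(x0, experts, drop_rate=0.5))  # DARE-style sparsified merging
```

Leaving drop_rate unset recovers plain task arithmetic; raising it zeroes more of each expert's delta while rescaling the survivors, which is the sparsification whose effect on the generalization bound the paper quantifies.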
Publisher
Cornell University Library, arXiv.org
Subject
Algorithms / Optimization / Visual tasks