Asset Details
Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining
by Huang, Ruizhe; Yu, Baifeng; Zhang, Kexuan; Fang, Yihao
2025
Paper
Overview
This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings, and reduce costs. To enhance long-context extrapolation in compact models, we focus on Infini-attention, which builds a compressive memory from past segments while preserving local attention. We conduct an empirical study using 300M-parameter LLaMA models pretrained with Infini-attention. The model trains stably and outperforms the baseline on long-context retrieval. We identify the balance factor as a key determinant of model performance, and we find that retrieval accuracy drops as the memory is repeatedly compressed over long sequences. Even so, Infini-attention effectively compensates for the limited parameter count of SLMs: despite performance degradation at a 16,384-token context, the Infini-attention model achieves up to 31% higher accuracy than the baseline. Our findings suggest that robust long-context capability in SLMs benefits from an architectural memory such as Infini-attention.
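The overview describes the core mechanism probed in the study: a compressive memory accumulated from past segments, read back via linear attention and blended with local attention through a learned balance factor. The sketch below illustrates that mechanism for a single head, following the general Infini-attention formulation (a linear-attention memory with an ELU+1 feature map and a sigmoid-gated mix). The function name, tensor shapes, and the `beta` gate variable are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def feature_map(x):
    # Non-negative feature map (ELU + 1) used for the linear-attention memory.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, z, beta):
    """Single-head Infini-attention over one segment (illustrative sketch).

    q, k, v : (seg_len, d) query/key/value projections for the segment
    memory  : (d, d) compressive memory carried over from earlier segments
    z       : (d, 1) running normalization term paired with the memory
    beta    : learned scalar; sigmoid(beta) is the memory/local balance factor
    """
    d = q.shape[-1]

    # Standard causal (local) attention within the current segment.
    scores = (q @ k.T) / d**0.5
    causal_mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    a_local = F.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v

    # Read from the compressive memory with a linear-attention lookup,
    # using the memory state from *before* this segment's update.
    sq = feature_map(q)
    a_mem = (sq @ memory) / (sq @ z + 1e-6)           # (seg_len, d)

    # Write the current segment's keys/values into the memory and normalizer.
    sk = feature_map(k)
    memory = memory + sk.T @ v                        # (d, d)
    z = z + sk.sum(dim=0, keepdim=True).T             # (d, 1)

    # Blend memory retrieval with local attention via the balance factor.
    g = torch.sigmoid(beta)
    return g * a_mem + (1.0 - g) * a_local, memory, z

# Example: stream a long sequence through fixed-size segments.
d, seg_len = 64, 128
memory, z = torch.zeros(d, d), torch.zeros(d, 1)
beta = torch.zeros(())  # sigmoid(0) = 0.5, an even memory/local mix
for _ in range(4):  # each iteration compresses another segment into memory
    q, k, v = (torch.randn(seg_len, d) for _ in range(3))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta)
```

Note how every segment only adds to the same fixed-size `memory` matrix, so information from many segments is superimposed in d×d parameters; this repeated compression is what the abstract links to degraded retrieval accuracy at very long contexts, and the `beta` gate is the balance factor it identifies as a key determinant of performance.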
Publisher
Cornell University Library, arXiv.org
Subject
Accuracy / Context / Parameters / Performance degradation / Retrieval