HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
by Qiu, Xipeng; Fu, Jinlan; Zhang, Haowei; Yang, Shudong; Ng, See-Kiong
in Large language models / Real time / Video data
2026
Paper
Overview
Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved offline video understanding. However, extending these capabilities to streaming video inputs remains challenging, as existing models struggle to simultaneously maintain stable understanding performance, real-time responses, and low GPU memory overhead. To address this challenge, we propose HERMES, a novel training-free architecture for real-time and accurate understanding of video streams. Based on a mechanistic attention investigation, we conceptualize the KV cache as a hierarchical memory framework that encapsulates video information across multiple granularities. During inference, HERMES reuses a compact KV cache, enabling efficient streaming understanding under resource constraints. Notably, HERMES requires no auxiliary computations upon the arrival of user queries, thereby guaranteeing real-time responses for continuous video stream interactions and achieving a 10× faster time to first token (TTFT) than the prior SOTA. Even when reducing video tokens by up to 68% compared with uniform sampling, HERMES achieves superior or comparable accuracy across all benchmarks, with up to 11.4% gains on streaming datasets.
Publisher
Cornell University Library, arXiv.org