MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
by Yue, Junpeng; Xu, Xinrun; Karlsson, Börje F; Lu, Zongqing
in Effectiveness / Learning / Retrieval / Task complexity
2025
Paper
Overview
MLLM agents demonstrate potential for complex embodied tasks by retrieving multimodal task-relevant trajectory data. However, current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglecting their effectiveness for the specific task at hand. To address this issue, we propose a novel method, MLLM As ReTriever (MART), which enhances the performance of embodied agents by utilizing interaction data to fine-tune an MLLM retriever based on preference learning, such that the retriever fully considers the effectiveness of trajectories and prioritizes them for unseen tasks. We also introduce Trajectory Abstraction, a mechanism that leverages MLLMs' summarization capabilities to represent trajectories with fewer tokens while preserving key information, enabling agents to better comprehend milestones in the trajectory. Experimental results across various environments demonstrate our method significantly improves task success rates in unseen scenes compared to baseline methods. This work presents a new paradigm for multimodal retrieval in embodied agents, by fine-tuning a general-purpose MLLM as the retriever to assess trajectory effectiveness. All the code for benchmark tasks, simulator modifications, and the MLLM retriever is available at https://github.com/PKU-RL/MART.
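The overview says the retriever is fine-tuned with preference learning so that more effective trajectories are ranked higher. The abstract does not give the exact objective, so the following is only a minimal sketch assuming a Bradley-Terry style pairwise loss over scalar retriever scores. The function names and the score dictionary are hypothetical illustrations, not taken from the MART code base.

```python
import math


def pairwise_preference_loss(score_preferred, score_rejected):
    """Assumed Bradley-Terry loss: -log(sigmoid(s_preferred - s_rejected)).

    The loss is small when the retriever scores the more effective
    trajectory above the less effective one, and large otherwise.
    """
    margin = score_preferred - score_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def rank_trajectories(scores):
    """Return trajectory ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical scores a retriever might assign for one unseen task.
    scores = {"traj_a": 0.2, "traj_b": 1.5, "traj_c": -0.4}
    print(rank_trajectories(scores))
    print(pairwise_preference_loss(scores["traj_b"], scores["traj_a"]))
```

In this sketch, interaction data would supply the preference pairs (which of two retrieved trajectories led to better task outcomes), and minimizing the loss pushes the preferred trajectory's score upward relative to the rejected one.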
Publisher
Cornell University Library, arXiv.org