MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

by Xue, Fuzhao; Zheng, Zian; Yang, You; Ni, Jinjie; Ghosal, Deepanway; Zhang, Kaichen; Yue, Xiang; Shah, Mahir; Li, Bo; Shieh, Michael; Zhang, David Junhao; Song, Yifan; Jain, Kabir

in Benchmarks / Mixtures

2024

Paper

Overview

Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development. We identify two major issues in current evaluations: (1) inconsistent standards, shaped by different communities with varying protocols and maturity levels; and (2) significant query, grading, and generalization biases. To address these, we introduce MixEval-X, the first any-to-any, real-world benchmark designed to optimize and standardize evaluations across diverse input and output modalities. We propose multi-modal benchmark mixture and adaptation-rectification pipelines to reconstruct real-world task distributions, ensuring evaluations generalize effectively to real-world use cases. Extensive meta-evaluations show our approach effectively aligns benchmark samples with real-world task distributions. Meanwhile, MixEval-X's model rankings correlate strongly with those of crowd-sourced real-world evaluations (up to 0.98) while being much more efficient. We provide comprehensive leaderboards to rerank existing models and organizations and offer insights to enhance understanding of multi-modal evaluations and inform future research.
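The claim that benchmark rankings correlate with crowd-sourced rankings can be illustrated with a small rank-correlation calculation. The sketch below is not the authors' code: it assumes Spearman's rank correlation as the statistic (the overview does not name the coefficient behind the 0.98 figure), and the function names and all scores are made up for illustration.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five models (illustrative numbers only).
benchmark_scores = [71.2, 65.0, 58.3, 80.1, 49.7]  # benchmark accuracy
crowd_scores = [1150, 1160, 1090, 1210, 1040]      # crowd-sourced rating

rho = spearman(benchmark_scores, crowd_scores)
print(round(rho, 3))  # 0.9: the two rankings differ by one adjacent swap
```

A value near 1 means the two evaluations order the models almost identically, which is the sense in which a cheaper benchmark can stand in for a costlier crowd-sourced one.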

Publisher

Cornell University Library, arXiv.org

Subject

Benchmarks / Mixtures