Evaluation of cross-ethnic emotion recognition capabilities in multimodal large language models using the reading the mind in the eyes test

by Refoua, Elad; Meinlschmidt, Gunther; Hadar Shoval, Dorit; Piterman, David; Elyoseph, Zohar; Geller, Alon

in 4014/477 / 631/477 / 639/705 / 692/308 / 692/700 / Adult / Artificial intelligence / Bias / Chatbots / Cognition / Cognition & reasoning / Cross-cultural psychology / Emotion recognition / Emotions / Emotions - physiology / Ethnicity - psychology / Facial Expression / Female / Generative artificial intelligence (GenAI) / Human performance / Humanities and Social Sciences / Humans / Language / Large Language Models / Male / Minority & ethnic groups / multidisciplinary / Personality / Pilot projects / Psychiatric diagnosis / Reading the mind in the eyes test (RMET) / Recognition, Psychology / Science / Science (multidisciplinary) / Social interactions / Theory of mind / Young Adult

2026

Journal Article

Overview

Accurate emotion recognition is a foundational component of social cognition, yet human biases can compromise its reliability. The emergent capabilities of multimodal large language models (MLLMs) offer a potential avenue for objective analysis, but their performance has been tested mainly with ethnically homogeneous stimuli. This study provides a systematic cross-ethnic evaluation of leading MLLMs on an emotion recognition task to assess their accuracy and consistency across diverse groups. We evaluated three leading MLLMs: ChatGPT-4, ChatGPT-4o, and Claude 3 Opus. Performance was tested twice using three “Reading the Mind in the Eyes Test” (RMET) versions featuring White, Black, and Korean faces. We analyzed accuracy against chance (25%) and compared scores to established human normative data for each ethnic version. ChatGPT-4o achieved performance significantly above chance levels across all tests (p < .001), with large effect sizes indicating robust performance (Cohen’s h = 1.253–1.619; RD = 0.583–0.694). The model obtained a mean accuracy of 83.3% (30/36) on the White RMET, 94.4% (34/36) on the Black RMET, and 86.1% (31/36) on the Korean RMET, placing it in the 85th, 94th, and 90th percentiles of human norms, respectively. This high accuracy remained consistent across ethnic stimuli. In contrast, ChatGPT-4 performed near the human average, while Claude 3 Opus performed near chance level. These preliminary findings highlight the rapid evolution of MLLMs, with a significant performance leap between consecutive versions. This study suggests that ChatGPT-4o exceeded average human accuracy on this specific task of recognizing complex emotions from static images of the eye region, with its performance remaining consistent across different ethnic groups. While these results are notable, the pronounced performance gaps between models and the inherent limitations of the RMET task underscore the need for continuous validation and careful, ethical consideration to fully understand the capabilities and boundaries of this technology.
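
The effect sizes quoted in the overview can be checked from the reported counts alone. The sketch below is illustrative only: it assumes Cohen's h is the usual arcsine-transformed difference between the observed accuracy and the 25% chance level, that RD is the plain difference between those two proportions, and that an exact one-sided binomial test stands in for whatever significance test the authors ran (the overview does not name it). The function and variable names are hypothetical, not taken from the article.

```python
import math

CHANCE = 0.25   # chance level stated in the overview
N_ITEMS = 36    # items per RMET version, per the reported x/36 scores

# ChatGPT-4o correct answers per RMET version, as reported in the overview.
REPORTED_CORRECT = {"White": 30, "Black": 34, "Korean": 31}


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p). Assumed test, not the authors'."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def summarize(correct: int, n: int = N_ITEMS, chance: float = CHANCE) -> dict:
    """Accuracy, effect sizes, and p-value against chance for one RMET version."""
    accuracy = correct / n
    return {
        "accuracy": accuracy,
        "h": cohens_h(accuracy, chance),
        "rd": accuracy - chance,  # risk difference versus chance
        "p_value": binomial_upper_tail(correct, n, chance),
    }


if __name__ == "__main__":
    for version, correct in REPORTED_CORRECT.items():
        s = summarize(correct)
        print(f"{version}: acc={s['accuracy']:.3f} h={s['h']:.3f} "
              f"RD={s['rd']:.3f} p={s['p_value']:.2e}")
```

Under these assumptions, the lowest score (30/36) gives h of about 1.253 and RD of about 0.583, and the highest (34/36) gives h of about 1.619 and RD of about 0.694, which matches the ranges reported in the overview.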

Publisher

Nature Publishing Group UK, Nature Publishing Group, Nature Portfolio

Subject

4014/477

/ 631/477

/ 639/705

/ 692/308

/ 692/700

/ Adult

/ Artificial intelligence