Catalogue Search | MBRL
Explore the vast range of titles available.
16 result(s) for "Hadar Shoval, Dorit"
Artificial Intelligence in Higher Education: Bridging or Widening the Gap for Diverse Student Populations?
This study addresses a critical gap in understanding the differential effects of AI-based tools in higher education on diverse student populations, focusing on first-generation and minority students. Conducted as a case study in an introductory psychology course at a peripheral college, this research employed a mixed-methods approach, combining surveys (n = 110), in-depth semi-structured interviews (n = 20, selected to reflect class diversity), and the lecturer’s reflective journal. Data were analyzed using descriptive and inferential statistics (t-tests, Chi-square) and thematic analysis, with triangulation across data sources to examine how AI-based simulations influenced learning experiences and outcomes. The findings reveal that while AI enhanced content understanding and engagement across groups, it also highlighted and potentially widened educational gaps through an emerging “AI literacy divide.” This divide manifested in varying AI engagement patterns and in differences in applying AI knowledge beyond the course, a gap that was significantly more pronounced among majority and non-first-generation students compared to minority and first-generation peers. Qualitative data linked these disparities to prior technological exposure, cultural background, and academic self-efficacy. This study proposes an integrative framework highlighting AI literacy, AI engagement, and AI-enhanced cognitive flexibility as mediators between cultural/technological capital and AI adoption. The conclusions underscore the need for inclusive pedagogical strategies and institutional support to foster equitable AI adoption.
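For readers unfamiliar with the inferential tests named above, here is a minimal sketch of such group comparisons in Python with SciPy; all variable names and numbers are hypothetical placeholders, not the study's data:

```python
# Sketch: group comparisons of the kind described above (t-test and
# chi-square). All data are hypothetical placeholders, not the study's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 1-5 AI-engagement ratings for two student groups.
first_gen = rng.integers(1, 6, size=55)
continuing_gen = rng.integers(2, 6, size=55)

# Welch's t-test on mean engagement between groups.
t, p_t = stats.ttest_ind(first_gen, continuing_gen, equal_var=False)
print(f"t = {t:.2f}, p = {p_t:.3f}")

# Chi-square on counts of students applying AI beyond the course
# (rows: group; columns: applied / did not apply). Counts are invented.
table = np.array([[18, 37],
                  [34, 21]])
chi2, p_c, dof, _ = stats.chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p_c:.3f}")
```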
Journal Article
The externalization of internal experiences in psychotherapy through generative artificial intelligence: a theoretical, clinical, and ethical analysis
2025
Externalization techniques are well established in psychotherapy approaches, including narrative therapy and cognitive behavioral therapy. These methods elicit internal experiences such as emotions and make them tangible through external representations. Recent advances in generative artificial intelligence (GenAI), specifically large language models (LLMs), present new possibilities for therapeutic interventions; however, their integration into core psychotherapy practices remains largely unexplored. This study aimed to examine the clinical, ethical, and theoretical implications of integrating GenAI into the therapeutic space through a proof-of-concept (POC) of AI-driven externalization techniques, while emphasizing the essential role of the human therapist.
To this end, we developed two customized GPT agents: VIVI (visual externalization), which uses DALL-E 3 to create images reflecting patients' internal experiences (e.g., depression or hope), and DIVI (dialogic role-play-based externalization), which simulates conversations with aspects of patients' internal content. These tools were implemented and evaluated through a clinical case study under professional psychological guidance.
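For illustration, a minimal sketch of what a VIVI-style image-generation call could look like with the OpenAI Python SDK; the prompt wording is a hypothetical example, not the authors' actual configuration:

```python
# Sketch: a VIVI-style visual externalization call (illustrative only,
# not the authors' implementation). Requires the openai package and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Hypothetical prompt derived from a patient's described inner experience.
prompt = (
    "An abstract, gentle image externalizing a feeling of hope emerging "
    "from heaviness: a small warm light rising through grey fog."
)

result = client.images.generate(
    model="dall-e-3",  # the paper reports DALL-E 3 as VIVI's image engine
    prompt=prompt,
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```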
The integration of VIVI and DIVI demonstrated that GenAI can serve as an "artificial third", creating a Winnicottian playful space that enhances, rather than supplants, the dyadic therapist-patient relationship. The tools successfully externalized complex internal dynamics, offering new therapeutic avenues, while also revealing challenges such as empathic failures and cultural biases.
These findings highlight both the promise and the ethical complexities of AI-enhanced therapy, including concerns about data security, representation accuracy, and the balance of clinical authority. To address these challenges, we propose the SAFE-AI protocol, offering clinicians structured guidelines for responsible AI integration in therapy. Future research should systematically evaluate the generalizability, efficacy, and ethical implications of these tools across diverse populations and therapeutic contexts.
Journal Article
The Association Between Men’s Mental Health During COVID-19 and Deterioration in Economic Status
by Tannous-Haddad, Lubna; Alon-Tirosh, Michal; Hadar-Shoval, Dorit
in Coronaviruses; COVID-19; Depression - epidemiology
2022
This study investigated associations among economic status deterioration, mental health, and gender during the COVID-19 pandemic. A total of 1,807 participants completed an online questionnaire that included demographic variables and questions measuring three mental health variables: psychological distress (as measured by symptoms of depression, anxiety, and stress), adjustment disorder, and emotional eating. Results indicated that women reported higher mental health impairment than men. Men and women whose economic status significantly deteriorated because of the COVID-19 pandemic reported greater mental health impairment than those whose economic status did not significantly deteriorate. However, men whose economic status significantly deteriorated reported mental health impairment (emotional eating and adjustment difficulties) as high as that of women in the same situation. This change in men’s reporting pattern suggests that the economic impact of COVID-19 severely harmed their mental health and altered how they viewed their masculinity, which, in turn, further impaired their mental health. As the COVID-19 outbreak has had a significant impact on mental health worldwide, it is important to identify individuals and groups at high risk of mental health impairment. The current study demonstrates that men’s distress, which is frequently difficult to identify, can be detected by applying standardized measures and analyzing them for changes in reporting patterns rather than simply examining means and frequencies. The results suggest that the COVID-19 crisis may provide an opportunity to learn more about mental health, in particular that of men.
Journal Article
Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study
by Refoua, Elad; Hadar-Shoval, Dorit; Elyoseph, Zohar
in Artificial Intelligence; Benchmarking; Chatbots
2024
Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.
The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities.
The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.
ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.
ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.
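The chance-level comparisons reported above amount to a binomial test: each of the 36 RMET items offers 4 response options, so random responding succeeds with probability 0.25. A minimal sketch of that check, assuming SciPy:

```python
# Sketch: testing an RMET score against chance (4 options per item -> 25%).
# Requires scipy >= 1.7 for binomtest.
from scipy.stats import binomtest

N_ITEMS = 36   # RMET items
CHANCE = 0.25  # four response options per item

for score in (26, 27):  # ChatGPT-4's two reported scores
    res = binomtest(score, N_ITEMS, CHANCE, alternative="greater")
    print(f"score {score}/{N_ITEMS}: p = {res.pvalue:.1e}")
```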
Journal Article
The Artificial Third: A Broad View of the Effects of Introducing Generative Artificial Intelligence on Psychotherapy
by Haber, Yuval; Hadar-Shoval, Dorit; Levkovich, Inbar
in Artificial Intelligence; Artificial Intelligence - ethics; e-Mental Health and Cyberpsychology
2024
This paper explores a significant shift in the field of mental health in general and psychotherapy in particular following generative artificial intelligence’s new capabilities in processing and generating humanlike language. Following Freud, this lingo-technological development is conceptualized as the “fourth narcissistic blow” that science inflicts on humanity. We argue that this narcissistic blow has a potentially dramatic influence on perceptions of human society, interrelationships, and the self. We should, accordingly, expect dramatic changes in perceptions of the therapeutic act following the emergence of what we term the artificial third in the field of psychotherapy. The introduction of an artificial third marks a critical juncture, prompting us to ask the following important core questions that address two basic elements of critical thinking, namely, transparency and autonomy: (1) What is this new artificial presence in therapy relationships? (2) How does it reshape our perception of ourselves and our interpersonal dynamics? and (3) What remains of the irreplaceable human elements at the core of therapy? Given the ethical implications that arise from these questions, this paper proposes that the artificial third can be a valuable asset when applied with insight and ethical consideration, enhancing but not replacing the human touch in therapy.
Journal Article
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values
by Haber, Yuval; Mizrachi, Yonathan; Hadar-Shoval, Dorit
in Allied Health Personnel; Artificial intelligence; Burnout
2024
Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics.
This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other.
In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.
The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation for the models embedding distinct motivational value-like constructs that shape their decision-making.
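As a sketch of the discriminant analysis step described above, the following uses scikit-learn on simulated per-trial value scores; the data are placeholders, not the study's PVQ-RR responses:

```python
# Sketch: discriminant analysis over LLM value profiles, as described
# above. Scores are simulated placeholders, not the study's PVQ-RR data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
models = ["Bard", "Claude 2", "GPT-3.5", "GPT-4"]
n_trials, n_values = 10, 19  # 10 trials each; the PVQ-RR measures 19 values

# Give each model its own mean profile (1-6 Likert range) plus noise.
X = np.vstack([
    rng.normal(loc=rng.uniform(1, 6, n_values), scale=0.3,
               size=(n_trials, n_values))
    for _ in models
])
y = np.repeat(models, n_trials)

lda = LinearDiscriminantAnalysis()
print("LDA accuracy:", lda.fit(X, y).score(X, y))  # distinct profiles -> ~1.0
```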
This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
Journal Article
Evaluation of cross-ethnic emotion recognition capabilities in multimodal large language models using the Reading the Mind in the Eyes Test
2026
Accurate emotion recognition is a foundational component of social cognition, yet human biases can compromise its reliability. The emergent capabilities of multimodal large language models (MLLMs) offer a potential avenue for objective analysis, but their performance has been tested mainly with ethnically homogenous stimuli. This study provides a systematic cross-ethnic evaluation of leading MLLMs on an emotion recognition task to assess their accuracy and consistency across diverse groups. We evaluated three leading MLLMs: ChatGPT-4, ChatGPT-4o, and Claude 3 Opus. Performance was tested twice using three “Reading the Mind in the Eyes Test” (RMET) versions featuring White, Black, and Korean faces. We analyzed accuracy against chance (25%) and compared scores to established human normative data for each ethnic version. ChatGPT-4o achieved performance significantly above chance levels across all tests (p < .001), with large effect sizes indicating robust performance (Cohen’s h = 1.253–1.619; RD = 0.583–0.694). The model obtained a mean accuracy of 83.3% (30/36) on the White RMET, 94.4% (34/36) on the Black RMET, and 86.1% (31/36) on the Korean RMET, placing it in the 85th, 94th, and 90th percentiles of human norms, respectively. This high accuracy remained consistent across ethnic stimuli. In contrast, ChatGPT-4 performed near the human average, while Claude 3 Opus performed near chance level. These preliminary findings highlight the rapid evolution of MLLMs, revealing a significant performance leap between consecutive versions. This study suggests that ChatGPT-4o demonstrated performance scores exceeding average human accuracy on this specific task in recognizing complex emotions from static images of the eye region, with its performance remaining consistent across different ethnic groups. While these results are notable, the pronounced performance gaps between models and the inherent limitations of the RMET task underscore the need for continuous validation and careful, ethical consideration to fully understand the capabilities and boundaries of this technology.
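The reported effect sizes follow directly from the accuracies above: Cohen's h is the difference of arcsine-transformed proportions, and RD is the raw difference from the 25% chance level. A short sketch reproducing them:

```python
# Sketch: recomputing the reported effect sizes from the accuracies above.
from math import asin, sqrt

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

CHANCE = 0.25
for label, correct in [("White", 30), ("Black", 34), ("Korean", 31)]:
    acc = correct / 36
    print(f"{label}: h = {cohens_h(acc, CHANCE):.3f}, RD = {acc - CHANCE:.3f}")
# Matches the reported ranges: Cohen's h = 1.253-1.619, RD = 0.583-0.694.
```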
Journal Article
A controlled trial examining large language model conformity in psychiatric assessment using the Asch paradigm
2025
Background
Despite significant advances in AI-driven medical diagnostics, the integration of large language models (LLMs) into psychiatric practice presents unique challenges. While LLMs demonstrate high accuracy in controlled settings, their performance in collaborative clinical environments remains unclear. This study examined whether LLMs exhibit conformity behavior under social pressure across different diagnostic certainty levels, with a particular focus on psychiatric assessment.
Methods
Using an adapted Asch paradigm, we conducted a controlled trial examining GPT-4o’s performance across three domains representing increasing levels of diagnostic uncertainty: circle similarity judgments (high certainty), brain tumor identification (intermediate certainty), and psychiatric assessment using children’s drawings (high uncertainty). The study employed a 3 × 3 factorial design with three pressure conditions: no pressure, full pressure (five consecutive incorrect peer responses), and partial pressure (mixed correct and incorrect peer responses). We conducted 10 trials per condition combination (90 total observations), using standardized prompts and multiple-choice responses. The binomial test and chi-square analyses assessed performance differences across conditions.
Results
Under no pressure, GPT-4o achieved 100% accuracy across all domains. Under full pressure, accuracy declined systematically with increasing diagnostic uncertainty: 50% in circle recognition, 40% in tumor identification, and 0% in psychiatric assessment. Partial pressure showed a similar pattern, with maintained accuracy in basic tasks (80% in circle recognition, 100% in tumor identification) but complete failure in psychiatric assessment (0%). All differences between no pressure and pressure conditions were statistically significant (P<.05), with the most severe effects observed in psychiatric assessment (χ²₁=16.20, P<.001).
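The psychiatric-assessment statistic is recoverable from the reported design (10 trials per condition): 10/10 correct with no pressure versus 0/10 under full pressure yields χ²₁ = 16.20 with Yates' continuity correction, SciPy's default for 2×2 tables:

```python
# Sketch: reproducing the psychiatric-assessment chi-square from the
# reported counts (10/10 correct under no pressure vs 0/10 under full
# pressure; 10 trials per condition).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10, 0],   # no pressure: correct / incorrect
                  [0, 10]])  # full pressure: correct / incorrect
chi2, p, dof, _ = chi2_contingency(table)  # Yates correction by default
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.1e}")  # chi2(1) = 16.20, p < .001
```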
Conclusions
This study reveals that LLMs exhibit conformity patterns that intensify with diagnostic uncertainty, culminating in complete performance failure in psychiatric assessment under social pressure. These findings suggest that successful implementation of AI in psychiatry requires careful consideration of social dynamics and the inherent uncertainty in psychiatric diagnosis. Future research should validate these findings across different AI systems and diagnostic tools while developing strategies to maintain AI independence in clinical settings.
Trial registration
Not applicable.
Journal Article
The Feasibility of Large Language Models in Verbal Comprehension Assessment: Mixed Methods Feasibility Study
by Hadar-Shoval, Dorit; Lvovsky, Maya; Shimoni, Yoav
in Adult; Artificial Intelligence; Chatbots
2025
Cognitive assessment is an important component of applied psychology, but limited access and high costs make these evaluations challenging.
This study aimed to examine the feasibility of using large language models (LLMs) to create personalized artificial intelligence-based verbal comprehension tests (AI-BVCTs) for assessing verbal intelligence, in contrast with traditional assessment methods based on standardized norms.
We used a within-participants design, comparing scores obtained from AI-BVCTs with those from the Wechsler Adult Intelligence Scale (WAIS-III) verbal comprehension index (VCI). In total, 8 Hebrew-speaking participants completed both the VCI and AI-BVCTs, the latter generated using the LLMs Claude and GPT-4.
The concordance correlation coefficient (CCC) demonstrated strong agreement between AI-BVCT and VCI scores (Claude: CCC=.75, 90% CI 0.266-0.933; GPT-4: CCC=.73, 90% CI 0.170-0.935). Pearson correlations further supported these findings, showing strong associations between VCI and AI-BVCT scores (Claude: r=.84, P<.001; GPT-4: r=.77, P=.02). No statistically significant differences were found between AI-BVCT and VCI scores (P>.05).
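For reference, Lin's CCC combines precision (Pearson correlation) with accuracy (a penalty for shifts in mean and scale). A minimal sketch, with hypothetical score vectors standing in for the study's data:

```python
# Sketch: Lin's concordance correlation coefficient (CCC), as used above.
# Score vectors are hypothetical placeholders, not the study's data.
import numpy as np

def ccc(x: np.ndarray, y: np.ndarray) -> float:
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    return 2 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

wais_vci = np.array([95, 102, 110, 88, 120, 105, 99, 115])  # hypothetical
ai_bvct = np.array([97, 100, 112, 90, 118, 108, 96, 113])   # hypothetical
print(f"CCC = {ccc(wais_vci, ai_bvct):.2f}")
```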
These findings support the potential of LLMs to assess verbal intelligence. The study attests to the promise of AI-based cognitive tests in increasing the accessibility and affordability of assessment processes, enabling personalized testing. The research also raises ethical concerns regarding privacy and overreliance on AI in clinical work. Further research with larger and more diverse samples is needed to establish the validity and reliability of this approach and develop more accurate scoring procedures.
Journal Article
Applying language models for suicide prevention: evaluating news article adherence to WHO reporting guidelines
2025
The responsible reporting of suicide in media is crucial for public health, as irresponsible coverage can potentially promote suicidal behaviors. This study examined the capability of generative artificial intelligence, specifically large language models, to evaluate news articles on suicide according to World Health Organization (WHO) guidelines, potentially offering a scalable solution to this critical issue. The research compared assessments of 40 suicide-related articles by two human reviewers and two large language models (ChatGPT-4 and Claude Opus). Results showed strong agreement between ChatGPT-4 and human reviewers (ICC = 0.81–0.87), with no significant differences in overall evaluations. Claude Opus demonstrated good agreement with human reviewers (ICC = 0.73–0.78) but tended to estimate lower compliance. These findings suggest large language models’ potential in promoting responsible suicide reporting, with significant implications for public health. The technology could provide immediate feedback to journalists, encouraging adherence to best practices and potentially transforming public narratives around suicide.
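A sketch of the agreement analysis described above, assuming the pingouin package for ICC; the ratings below are hypothetical placeholders, not the study's 40 articles:

```python
# Sketch: intraclass correlation between a human reviewer and an LLM,
# mirroring the agreement analysis above. Ratings are hypothetical.
# Requires pingouin (pip install pingouin).
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "article": list(range(6)) * 2,
    "rater":   ["human"] * 6 + ["llm"] * 6,
    "score":   [7, 4, 9, 5, 8, 6,   # human WHO-adherence ratings
                6, 4, 8, 5, 9, 6],  # LLM WHO-adherence ratings
})
icc = pg.intraclass_corr(data=df, targets="article",
                         raters="rater", ratings="score")
print(icc[["Type", "Description", "ICC"]])
```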
Journal Article