Asset Details
Objective Metrics for Evaluating Large Language Models Using External Data Sources
by Du, Haoze; Li, Richard; Gehringer, Edward
in Large language models / Performance assessment / Performance evaluation / Subjective assessment
2025
Paper
Overview
Evaluating the performance of Large Language Models (LLMs) is a critical yet challenging task, particularly when aiming to avoid subjective assessments. This paper proposes a framework for leveraging objective metrics derived from class textual materials across different semesters to assess LLM outputs across various tasks. By utilizing well-defined benchmarks, factual datasets, and structured evaluation pipelines, the approach ensures consistent, reproducible, and bias-minimized measurements. The framework emphasizes automation and transparency in scoring, reducing reliance on human interpretation while ensuring alignment with real-world applications. This method addresses the limitations of subjective evaluation methods, providing a scalable solution for performance assessment in educational, scientific, and other high-stakes domains.
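To make the kind of automated, objective scoring pipeline the abstract describes more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the exact-match and token-level F1 metrics, the toy question/reference dataset, and the stand-in model function are common choices for objective LLM evaluation, not the authors' actual implementation.

```python
# Minimal sketch of an automated, objective scoring pipeline in the spirit
# of the framework described above. The metrics, dataset shape, and model
# stub are illustrative assumptions, not the paper's implementation.
from collections import Counter
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so scoring ignores surface variation."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(model: Callable[[str], str],
             dataset: List[Dict[str, str]]) -> Dict[str, float]:
    """Run the model over every item and average the objective metrics.

    Each dataset item is a {"question": ..., "reference": ...} pair drawn
    from an external source (e.g., course materials from past semesters).
    """
    em_scores, f1_scores = [], []
    for item in dataset:
        prediction = model(item["question"])
        em_scores.append(exact_match(prediction, item["reference"]))
        f1_scores.append(token_f1(prediction, item["reference"]))
    n = len(dataset)
    return {"exact_match": sum(em_scores) / n,
            "token_f1": sum(f1_scores) / n}


if __name__ == "__main__":
    # Stand-in model and a two-item factual dataset, purely for demonstration.
    fake_model = lambda q: "4" if "2 + 2" in q else "unknown"
    data = [
        {"question": "What is 2 + 2?", "reference": "4"},
        {"question": "Capital of France?", "reference": "Paris"},
    ]
    print(evaluate(fake_model, data))  # {'exact_match': 0.5, 'token_f1': 0.5}
```

Because every score is computed deterministically from reference data, two runs over the same model and dataset produce identical numbers, which is the reproducibility property the abstract emphasizes.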
Publisher
Cornell University Library, arXiv.org