Catalogue Search | MBRL

A Survey on Evaluation Metrics for Machine Translation

by Seonmin Koo, Heuiseok Lim, Hyeonseok Moon in Algorithms, automatic evaluation metric, Automation

2023

The success of the Transformer architecture has increased interest in machine translation (MT). The translation quality of neural network-based MT surpasses that of translations produced by statistical methods. This growth in MT research has driven the development of accurate automatic evaluation metrics for tracking MT performance. However, automatically evaluating and comparing MT systems is a challenging task. Several studies have shown that traditional metrics (e.g., BLEU, TER) perform poorly at capturing semantic similarity between MT outputs and human reference translations. To improve on this, various evaluation metrics based on the Transformer architecture have been proposed. However, a systematic and comprehensive literature review of these metrics is still missing. It is therefore necessary to survey the existing automatic evaluation metrics for MT so that both established and new researchers can quickly understand the trend of MT evaluation over the past few years. In this survey, we present the trend of automatic evaluation metrics. To better understand the developments in the field, we provide a taxonomy of the automatic evaluation metrics. Then, we explain the key contributions and shortcomings of the metrics. In addition, we select representative metrics from the taxonomy and conduct experiments to analyze related problems. Finally, we discuss the limitations of current automatic metric studies revealed by the experiments and offer suggestions for further research to improve automatic evaluation metrics.

Journal Article

Utilizing Artificial Intelligence Technologies in Saudi EFL Tertiary Level Classrooms

by Othman, Khalid Abdurrahman Jabir; AbdAlgane, Mohammed

2023

This study focuses on the use of AI technology in regular, day-to-day activities, such as when Google Translate or Bing Translator is encouraged alongside various programs and applications. It also evaluates and empirically examines the topics of writing with AI technologies, computer-assisted language learning (CALL), machine translation (MT), and automatic evaluation systems (AESs) in order to offer solutions for enhanced communication training in Saudi Arabia's EFL system. Wordtune is an artificial intelligence (AI)-driven writing assistant that can understand the writer's ideas and suggest alternative rewrites (e.g., shorten, expand). This program helps writers of English as a foreign language to maintain a steady flow and acquire useful English expressions. The research used questionnaires to collect data and analyzed the responses in SPSS. The use of AI technology in English as a foreign language (EFL) settings has been shown to facilitate the English language teaching (ELT) process and to keep both teachers and students up to date on recent technological developments. This exploratory investigation demonstrated that all digital and AI-powered devices have the potential to assist in teaching and learning. Consequently, the pedagogical component of future education can be developed using an AI framework.

Journal Article

Enhancing Long-Term Action Quality Assessment: A Dual-Modality Dataset and Causal Cross-Modal Framework for Trampoline Gymnastics

by Feng, Chen; Huang, Jiahao; Chen, Zhide in action quality assessment, Annotations, Athletes

2025

Action quality assessment (AQA) plays a pivotal role in intelligent sports analysis, aiding athlete training and refereeing decisions. However, existing datasets and methods are limited to short-term actions, lacking comprehensive spatiotemporal modeling for complex, long-duration sequences like those in trampoline gymnastics. To bridge this gap, we introduce Trampoline-AQA, a novel dataset comprising 206 video clips from major competitions (2018–2024), featuring dual-modality (RGB and optical flow) data and rich annotations. Leveraging this dataset, we propose a framework comprising a Temporal Feature Enhancer (TFE) and a forward-looking causal cross-modal attention (FCCA) module, which improves action quality assessment by delivering more accurate and robust scoring for long-duration, high-speed routines, particularly under motion ambiguities. Our approach achieves a Spearman correlation of 0.938 on Trampoline-AQA and 0.882 on UNLV-Dive, demonstrating superior performance and generalization capability.
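For readers unfamiliar with the Spearman correlation reported in this abstract, the following is a minimal sketch of how a rank correlation between predicted and judge-assigned scores can be computed. It is not the authors' evaluation code; the function names `average_ranks` and `spearman` and the example scores are hypothetical and chosen only for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores: a model's predictions versus judges' scores for five routines.
predicted = [1, 2, 3, 4, 5]
judged = [2, 1, 4, 3, 5]
rho = spearman(predicted, judged)
```

A value close to 1 means the predicted scores order the routines almost the same way the judges do, regardless of the absolute score scale.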

Journal Article

Toward an application of automatic evaluation system for central facial palsy using two simple evaluation indices in emergency medicine

by Taka-aki Nakada, Takayuki Okamoto, Yoichi Yoshida in 639/166/985, 692/617/375/534, Ambulance service

2024

A stroke is a medical emergency and thus requires immediate treatment. Paramedics should accurately assess suspected stroke patients and promptly transport them to a hospital with stroke care facilities; however, current assessment procedures rely on subjective visual assessment. We aim to develop an automatic evaluation system for central facial palsy (CFP) that uses RGB cameras installed in an ambulance. This paper presents two evaluation indices extracted from video frames: the symmetry of mouth movement and the difference in mouth shape. These evaluation indices allow us to quantitatively evaluate the degree of facial palsy. A classification model based on these indices can discriminate patients with CFP. The results of experiments using our dataset show that the values of the two evaluation indices are significantly different between healthy subjects and CFP patients. Furthermore, our classification model achieved an area under the curve of 0.847. This study demonstrates that the proposed automatic evaluation system has great potential for quantitatively assessing CFP patients based on two evaluation indices.
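The area under the ROC curve mentioned in this abstract can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The sketch below illustrates that pairwise definition. It is not the study's code, and the function name `pairwise_auc` and the scores are hypothetical.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores (higher means "more likely CFP").
cfp_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.5, 0.3, 0.2]
auc = pairwise_auc(cfp_scores, healthy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.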

Journal Article

Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis

by Grubišić, Ani; Gašpar, Angelina; Šarić-Grgić, Ines in Accuracy, Automatic, College students

2023

We present a rule-based approach to automatic factual question generation implemented in the Adaptive Courseware and Natural Language Tutor, a natural language-based intelligent tutoring system. Since machine-generated questions are intended for adaptive teaching, learning and assessment, their accuracy is of the utmost importance. However, the generation of high-quality questions is still challenging. The proposed approach relies on pre-processing techniques and syntactic and semantic feature extraction to transform declarative sentences and their segments into questions. The quality of questions, generated from domain-specific texts, was evaluated by using mixed evaluation strategies: (1) human evaluation, (2) qualitative error analysis, (3) automatic evaluation, (4) human and automatic evaluation of machine-generated questions from paraphrases compared to a set of human-authored questions, (5) preliminary comparison to other approaches. The human evaluation involved two teachers of English as a foreign language who set up evaluation criteria (grammaticality, semantic accuracy, and answerability) and a group of 30 English language graduates. Student-generated questions were validated and used as reference questions for automatic evaluation based on similarity metrics (BLEU-4, METEOR, CHRF, NIST and ROUGE-L). Human and automatic evaluation results were satisfactory but improved significantly with the paraphrasing strategy. The preliminary comparison to other approaches showed that the proposed rule-based approach performed equally well despite its limitations.

Journal Article

The Application of Deep Learning for the Evaluation of User Interfaces

by Milicevic, Mario; Keselj, Ana; Zubrinic, Krunoslav in Adaptation, Algorithms, Artificial intelligence

2022

In this study, we tested the ability of a machine-learning (ML) model to evaluate different user interface designs within the defined boundaries of some given software. Our approach used ML to automatically evaluate existing and new web application designs and provide developers and designers with a benchmark for choosing the most user-friendly and effective design. The model is also useful for any other software in which the user has different options to choose from or where choice depends on user knowledge, such as quizzes in e-learning. The model can rank accessible designs and evaluate the accessibility of new designs. We used an ensemble model with a custom multi-channel convolutional neural network (CNN) and an ensemble model with a standard architecture with multiple versions of down-sampled input images and compared the results. We also describe our data preparation process. The results of our research show that ML algorithms can estimate the future performance of completely new user interfaces within the given elements of user interface design, especially for color/contrast and font/layout.

Journal Article

Automatic theranostics for long-term neurorehabilitation after stroke

by Chen, Fei; Hu, Xiaoling; Li, Zengyong in Aging Neuroscience, automatic evaluation, automatic rehabilitation management

2023

Journal Article

A Framework for Word Embedding Based Automatic Text Summarization and Evaluation

by Hailu, Tulu Tilahun; Fantaye, Tessfu Geteye; Yu, Junqing in automatic evaluation metrics, extrinsic evaluation, intrinsic evaluation

2020

Text summarization is the process of producing a concise version of a text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation metric scores of summarization systems. Humans are biased by circumstances when producing a summary; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework, which can successfully determine the salient top-n sentences of a source text as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods with regard to both text summarization systems and automatic evaluation metrics when tested on a publicly available dataset.
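The abstract's point about n-gram overlap can be illustrated with a simplified ROUGE-N recall computation. This is a minimal sketch, assuming whitespace tokenization and lowercasing; it is neither the official ROUGE implementation nor the framework proposed in the paper, and the function names and example sentences are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also appear in the candidate (clipped counts)."""
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
extractive = "the cat lay on the mat"        # reuses most of the reference wording
abstractive = "a feline rested upon the rug"  # similar meaning, different wording

extractive_r1 = rouge_n_recall(reference, extractive, n=1)
abstractive_r1 = rouge_n_recall(reference, abstractive, n=1)
```

In this toy case the paraphrased candidate scores far lower than the near-verbatim one even though both convey a similar meaning, which is the weakness for abstractive summaries that the abstract describes.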

Journal Article

Conceptual Framework for Programming Skills Development Based on Microlearning and Automated Source Code Evaluation in Virtual Learning Environment

by Drlik, Martin; Stolinska, Anna; Smyrnova-Trybulska, Eugenia in Learning, Online instruction, Software

2021

Understanding how software works and writing a program are currently frequent requirements when hiring employees. The complexity of learning programming often results in educational failures, student frustration and lack of motivation, because different students prefer different learning paths. Although e-learning courses have led to many improvements in the methodology and the supporting technology for more effective programming learning, misunderstanding of programming principles is one of the main reasons for students leaving school early. Universities face a challenging task: how to harmonise students’ education, focusing on advanced knowledge in the development of software applications, with students’ education in cases where writing code is a new skill. The article proposes a conceptual framework focused on the comprehensive training of future programmers using microlearning and automatic evaluation of source code to give students immediate feedback. This framework is designed to involve students in the software development of virtual learning environment software that will provide their education, thus ensuring the sustainability of the environment in line with modern development trends. The paper’s final part is devoted to verifying the contribution of the presented elements through quantitative research on the introductory parts of the framework. It turned out that although the application of interactive features did not lead to significant measurable progress during the first semester of study, it significantly improved the results of students in subsequent courses focused on advanced programming.

Journal Article

Automatic Evaluation of English Translation Based on Multi-granularity Interaction Fusion

by Hu, Haize; Yang, Yonghe; Chen, Xibo in Artificial Intelligence, Complex Systems, Computational Intelligence

2025

The latest automatic evaluation methods for neural machine translation use pre-trained contextual word vectors to extract semantic features and directly concatenate them as input to a neural network that predicts translation quality. However, this direct operation can easily lead to a lack of interaction between features, and layer-by-layer prediction is prone to losing fine-grained matching information. To address these issues, we propose a multi-granularity interactive fusion method for the automatic evaluation of English translation, which introduces middle and late information fusion. First, we use a bilinear attention distribution to capture high-order cross-language feature interactions, stacking multiple high-order interaction blocks and equipping them with a parameter-free index linear unit for middle fusion. Second, we use fine-grained accurate matching sentence shift distance and sentence-level cosine similarity for late fusion. The experimental results on the WMT’21 Metrics Task benchmark dataset show that the proposed method can effectively improve its correlation with human evaluation and achieve comparable performance with the best participating system.

Journal Article
