Catalogue Search | MBRL

Cloze Tests in Measuring Reading Comprehension Levels

by Memiş, Muhammet , Kalyoncu, Muhittin in Analysis , Cloze Procedure , Correlation

2025

This study, conducted as correlational research, aims to objectively examine the validity of cloze tests in Turkish, which are commonly used to assess reading comprehension levels, general language proficiency, and the readability of written materials, and to evaluate the procedures for using these tests to measure reading comprehension. The study investigates the consistency between multiple-choice reading comprehension tests, which are frequently used in national exams for their functionality and objectivity, and cloze tests constructed and scored using various methods. A total of eight measurement tools were administered to a sample of 90 seventh-grade students: four multiple-choice reading comprehension tests based on four distinct texts, and four cloze tests, each of which systematically deleted words at a different interval. Two scoring methods were applied to the cloze tests: one accepted only the exact (original) words as correct, and the other also accepted alternative words that preserved the meaning of the sentence. In total, approximately 23,000 test items were presented to the students, and around 43,000 evaluations were conducted on these items. Data collected during the 2023-2024 academic year were analyzed using Pearson correlation analysis, revealing a significant positive relationship between the cloze tests and the multiple-choice tests. Cloze tests scored by accepting only the exact words were more consistent with the multiple-choice tests, while the correlation decreased when context-preserving alternatives were accepted as correct. The highest correlation occurred when every sixth word was systematically deleted. Based on these findings, the authors recommend that cloze tests accept exact words rather than context-preserving words as correct, systematically delete every sixth word, and be systematically integrated into measurement and evaluation practices.
This correlational study was conducted with two aims. The first was to question, on an objective basis, whether cloze tests yield valid results for Turkish. Such tests are used for many purposes, including measuring individuals' reading comprehension, general language proficiency, and the readability of written materials. The second aim was to test the procedures employed when these tests are used to measure reading comprehension. To this end, the study examined the consistency between cloze tests, constructed and scored according to different procedures, and multiple-choice tests. Multiple-choice tests are widely used to measure reading comprehension because of their functionality and objectivity, and they are the dominant measurement instrument in national exams. Eight measurement instruments were administered to a study group of 90 seventh-grade students. Four were multiple-choice reading comprehension tests based on four different texts. The other four were cloze tests, each of which regularly deleted the word at a different ordinal position. Two scoring methods were used for the cloze tests. In the first, only the original words were accepted as correct; in the second, words that preserved the context, that is, the meaning of the sentence, were also accepted as correct. In total, approximately 23,000 test items were presented to the students in the study group, and approximately 43,000 evaluations were made on these items. Pearson correlation analysis was performed on the data, which were collected face to face during the 2023-2024 academic year.
The analyses yielded four main results. First, there is, overall, a significant positive relationship between cloze tests and multiple-choice reading comprehension tests. Second, cloze tests in which the original words were accepted as correct gave results more consistent with the multiple-choice reading comprehension tests. Third, this correlation generally decreased when context-preserving words were accepted as correct. Fourth, the highest correlation between the two kinds of test was obtained when every sixth word was regularly deleted. Based on these results, the authors make three recommendations: cloze tests should accept the original words rather than context-preserving words as correct, every sixth word should be regularly deleted, and the tests should be systematically incorporated into measurement and evaluation practices.

Journal Article

An Effectiveness Study of Generative Artificial Intelligence Tools Used to Develop Multiple-Choice Test Items

by Sondergeld, Connor J. , Archer, James N. , May, Toni A. in Artificial Intelligence , Best Practices , Computational linguistics

2025

Generative artificial intelligence (GenAI) tools developed to support teaching and learning are widely available. Trustworthiness concerns, however, have prompted calls for researchers to study their effectiveness and for educators and educational researchers to be involved in their creation and piloting processes. This study investigated one type of GenAI created to support educators: multiple-choice question generators (MCQ GenAI). Among the nine MCQ GenAI tools investigated, a variety of useful options were available, but only one indicated teacher involvement and none mentioned testing experts in development processes. MCQ GenAI-created items (n = 270) were coded based on MCQ quality item-writing guidelines. Results showed 80.00% of items (n = 216) violated at least one guideline, with 73.70% (n = 199) likely to produce major measurement error (should not use without revision), 6.30% (n = 17) likely to elicit minor measurement error (consider modifying), and 20.00% (n = 54) acceptable (usable as created). Implications suggest multidisciplinary teams are needed in educational GenAI tool development.
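
The reported percentages follow directly from the category counts over n = 270 items. The 80.00% "at least one violation" figure is the sum of the major-error and minor-error categories (199 + 17 = 216). A quick arithmetic check, using only the counts given in the abstract:

```python
# Category counts reported in the abstract (n = 270 items).
counts = {"major": 199, "minor": 17, "acceptable": 54}
total = sum(counts.values())

def pct(n):
    """Percentage of all coded items, rounded to two decimals."""
    return round(100 * n / total, 2)

# Items violating at least one guideline = major + minor error categories.
violations = counts["major"] + counts["minor"]

for name, n in counts.items():
    print(f"{name}: {n} ({pct(n):.2f}%)")
print(f"at least one violation: {violations} ({pct(violations):.2f}%)")
```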

Journal Article

The Accuracy of Estimating Parameters of Multiple-Choice Test Items, Following Item-Response Theory: A Simulation Study

by Freihat, Aiman Mohammad , Yassin, Omar Saleh Bani in estimation , item parameter , item-response theory

2025

Background/purpose. This study aimed to determine how accurately the parameters of multiple-choice test items are estimated under item-response theory (IRT) models. Materials/methods. The researchers relied on three measurement-accuracy indicators: the absolute difference between the estimated and actual parameter values, the root mean square error (RMSE), and relative efficiency (RE). A total of 1,500 responses were generated assuming a normally distributed ability parameter. Several 50-item multiple-choice tests were generated using the WinGen data-generation program (V.3). The difficulty parameter was assumed to be normally distributed, and the discrimination and guessing parameters were assumed to be uniformly distributed. Item parameters were estimated with the BILOG-MG software using the marginal maximum likelihood method. The estimated parameters were then compared with the actual parameters using these indicators (absolute difference, RMSE, and the relative efficiency index of the variances of the estimated parameters). Results. The three-parameter model estimated the difficulty parameter most accurately, followed by the one-parameter model and then the two-parameter model. Conclusion. The three-parameter model was more accurate than the two-parameter model. The guessing parameter applies only to the three-parameter model. It was estimated most accurately in the five-alternative tests, followed by the three-alternative tests and then the four-alternative tests.
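
The quantities compared in this simulation can be sketched as follows:

- the three-parameter logistic (3PL) response function, which reduces to the 2PL when c = 0;
- the mean absolute difference and RMSE between estimated and actual parameters;
- a relative-efficiency ratio.

This is not the authors' WinGen/BILOG-MG pipeline. The scaling constant D = 1.7 is the conventional choice and is assumed here. RE is defined as a ratio of mean squared errors, which is one common definition; the study's exact formula may differ.

```python
import math
from statistics import fmean

D = 1.7  # conventional logistic scaling constant (assumed)

def p_3pl(theta, a, b, c):
    """3PL probability of a correct answer; c = 0 gives the 2PL, and a common a with c = 0 the 1PL."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def mean_abs_diff(est, true):
    """Mean absolute difference between estimated and actual parameter values."""
    return fmean(abs(e - t) for e, t in zip(est, true))

def rmse(est, true):
    """Root mean square error of the estimates."""
    return math.sqrt(fmean((e - t) ** 2 for e, t in zip(est, true)))

def relative_efficiency(est_a, est_b, true):
    """RE of model A relative to model B as MSE_B / MSE_A (assumed definition).

    Values above 1 mean model A recovers the parameters more accurately.
    """
    return rmse(est_b, true) ** 2 / rmse(est_a, true) ** 2
```

In a recovery study, `true` holds the generating parameters and `est_a` and `est_b` hold one model's estimates each, for example the 3PL and 2PL difficulty estimates for the same items.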

Journal Article

From item writing to item completion: investigating multiple-choice reading test items through item writer’s intentions and test-takers’ reported processes

by Pham, Ngoc Bao Tram , Zeng, Yijing , Mohd-Said, Nur-Ehsan in Assessment , Cognition , Convergence

2026

Writing multiple-choice (MC) test items that accurately target specific reading constructs remains challenging and time-consuming. Despite careful item development, what test developers intend an item to measure may not correspond to the processes test-takers actually use when answering it. This exploratory study documented the original item writer’s option-level intentions when constructing MC items and examined the extent to which these intentions were corroborated by test-takers’ retrospective verbal reports of their test-taking processes. The documentation revealed that the relevant text portions and cognitive activities intended for each option within an MC item may vary. Triangulation of item writer intentions and reported test-taking processes showed stronger convergence for relevant text portions than for cognitive activities. Divergences between the two data sources were largely associated with test-taking strategies employed by participants. Importantly, documenting the item writer’s option-level intentions provided grounded explanations for why test-takers appeared to engage in different processes when selecting different options. These findings offer a new methodological direction for construct validation of MC items and suggest implications for the future development and evaluation of multiple-choice items in reading assessment.

Journal Article

The Grading Multiple Choice Tests System via Mobile Phone using Image Processing Technique

by Ketcham, Mahasak , Yimyam, Worawut in Addition , Algorithms , Answer Sheets

2018

Grading devices are expensive, which wastes budgets, and some are also difficult to use. An objective-test grading system for Android mobile phones was therefore developed to save cost and time in grading. The system uses an image-processing technique implemented in Java. The phone's camera captures the edges of the answer region. An equation that geometrically models the digital camera sensor is then applied to identify the selected answers by calculating pixel intensity in real time. The system works effectively, with an accuracy of more than 95%.
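
The abstract gives only the outline of the grading step: locate the answer region, then decide which option is marked from pixel intensity. The sketch below shows that decision under two simplifying assumptions. First, each bubble has already been cropped into a small grayscale region (0 = black, 255 = white). Second, a fixed threshold of 128, a hypothetical value, separates filled from empty bubbles. It does not reproduce the authors' Java implementation or their geometric camera-sensor equation.

```python
from statistics import fmean

def mean_intensity(region):
    """Average grayscale value (0 = black, 255 = white) of a 2D pixel region."""
    return fmean(p for row in region for p in row)

def detect_answer(bubbles, threshold=128):
    """Return the index of the single filled bubble, or None if blank or multiply marked."""
    filled = [i for i, region in enumerate(bubbles) if mean_intensity(region) < threshold]
    return filled[0] if len(filled) == 1 else None

def grade(sheet, key, threshold=128):
    """sheet: one list of bubble regions per question; key: index of each correct option."""
    return sum(detect_answer(q, threshold) == k for q, k in zip(sheet, key))
```

Treating a question with zero or several dark bubbles as unanswered is a design choice made for this sketch. Real systems differ in how they handle erasures and partial marks.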

Journal Article

5 vs 4: A Quantitative Investigation into the Quality Metrics of Different Multiple-Choice Test Formats

by Sukkaew, Sarhistthep , Chumkaew, Supamas

2023

This study used quantitative methods to address two objectives: 1) to compare the quality of 5-choice and 4-choice multiple-choice tests, and 2) to evaluate the discriminatory power of these formats using item response theory with kernel smoothing. Data were collected from 1,966 students at Sukhothai Thammathirat Open University who took a 120-question multiple-choice exam during the second semester of 2019. Four test configurations were analyzed:

- the Initial Case used the original 5-choice format;
- Case 1 randomly omitted one option from the 5-choice test, never the correct answer;
- Case 2 randomly omitted one option, with the correct answer also eligible for removal;
- Case 3 adapted the options to the test-taker's proficiency level.

Cronbach's alpha (reported as raw_alpha) served as the reliability metric. Reliability varied across the four cases and was highest in Case 3 (raw_alpha = 0.87). Difficulty values and discriminatory power did not differ across the cases. Mean scores indicated that students generally performed better on the 4-choice tests in Cases 1-3 than on the original 5-choice format (the Initial Case). These findings have significant implications for test design, suggesting that 4-choice tests can achieve reliability and discriminatory power comparable to traditional 5-choice tests.
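
Cronbach's alpha, the reliability metric used to compare the four cases, is computed from the item variances and the variance of examinees' total scores:

alpha = k / (k - 1) × (1 - Σ item variances / total-score variance)

A minimal sketch follows. The input layout, rows as examinees and columns as 0/1 item scores, is an assumption. Population and sample variances give the same alpha because the n versus n - 1 factor cancels in the ratio.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Raw Cronbach's alpha; scores has one row per examinee, one column per item."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Re-running this on the item-score matrix of each configuration, such as the Initial Case and Cases 1-3, gives one alpha per case for comparison.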

Journal Article

Answering Multiple-Choice Questions in Which Examinees Doubt What the True Answer Is among Different Options

by Peñalver San Cristóbal, Carmen , Villacampa, Tomás , Sánchez Lasheras, Fernando in Analysis , Candidates , Combinatorial analysis

2022

This research explores the results an examinee would obtain on a multiple-choice test when unsure which of several options is the correct answer. The problem is analyzed using combinatorics together with analytical and sampling methods. The Spanish exam through which doctors become medical specialists (the MIR exam) is used as an example. Although it is hard to imagine candidates answering every question of such an exam at random, it is common for them to be unsure of the correct answer to some questions. The exam consists of 210 multiple-choice questions, each with 4 answer options. The cut-off mark is calculated as one-third of the average of the 10 best marks in the exam. According to the results, an examinee may be unsure between two or three of the four possible answers on a certain group of questions; in that case, answering all of them will in most cases lead to a positive result. Moreover, an examinee who is unsure between two answer options on every question of the MIR test could still exceed the cut-off mark.
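
The combinatorial reasoning can be illustrated with a binomial model of guessing uniformly among k equally plausible options on n questions. The abstract does not state the scoring rule. This sketch assumes +1 for a correct answer and -1/3 for a wrong one, the usual correction for four-option items. It also reads a "positive result" as a net score above zero. Both are our assumptions, not the authors' model.

```python
from fractions import Fraction
from math import comb

CORRECT = Fraction(1)
WRONG = Fraction(-1, 3)  # assumed penalty for a wrong answer (not stated in the abstract)

def expected_gain(k):
    """Expected net score per question when guessing uniformly among k options."""
    return Fraction(1, k) * CORRECT + Fraction(k - 1, k) * WRONG

def prob_positive(n, k):
    """Exact P(net score > 0) when guessing among k options on each of n questions."""
    p = Fraction(1, k)
    total = Fraction(0)
    for c in range(n + 1):  # c = number of correct guesses
        if c * CORRECT + (n - c) * WRONG > 0:
            total += comb(n, c) * p**c * (1 - p) ** (n - c)
    return total
```

Under this assumed rule, guessing between two or three options has a positive expected gain (1/3 and 1/9 per question). A blind guess among all four has an expected gain of exactly zero. `float(prob_positive(n, k))` gives the probability of ending up ahead on a block of n doubtful questions.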

Journal Article

Gender difference in willingness to guess after a failure

by Cipriani, Giam Pietro in Academic Failure , College Students , Foreign Countries

2018

A considerable literature in economics and psychology observes substantial gender differences in risk aversion, confidence, and responses to high pressure. In the educational measurement literature, it has been argued that these differences could disadvantage female students when taking multiple-choice tests, especially if there is a penalty for wrong answers. Using a dataset of multiple-choice exams, the author investigates this issue by analyzing the number of unanswered questions. Because most individuals take the exam repeatedly, differences after a failure can also be observed. The results show significant differences between men and women: in the second and third attempts, women omit more questions than men. The same holds in the first attempt once the best students are excluded.

Journal Article

Analyzing Test-Taking Behavior: Decision Theory Meets Psychometric Theory

by Budescu, David V. , Bo, Yuanchao in Assessment , Behavioral Science and Psychology , Decision Theory

2015

We investigate the implications of penalizing incorrect answers to multiple-choice tests, from the perspective of both test-takers and test-makers. To do so, we use a model that combines a well-known item response theory model with prospect theory (Kahneman and Tversky, Prospect theory: An analysis of decision under risk, Econometrica 47:263–91, 1979). Our results reveal that when test-takers are fully informed of the scoring rule, the use of any penalty has detrimental effects for both test-takers (they are always penalized in excess, particularly those who are risk averse and loss averse) and test-makers (the bias of the estimated scores, as well as the variance and skewness of their distribution, increase as a function of the severity of the penalty).
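
The mechanism behind the loss-aversion result can be illustrated with a prospect-theory value function. A test-taker answers only if the value of guessing beats the value of omitting, which is 0. This is a simplified sketch, not the authors' combined IRT and prospect-theory model:

- It ignores probability weighting.
- It takes the item's success probability `p_correct` as given.
- It uses illustrative parameters α = β = 0.88 and λ = 2.25, the median estimates reported by Tversky and Kahneman (1992), not values from this study.

```python
# Illustrative parameters: median estimates from Tversky and Kahneman (1992).
ALPHA, BETA, LAMBDA = 0.88, 0.88, 2.25

def value(x, alpha=ALPHA, beta=BETA, lam=LAMBDA):
    """Prospect-theory value of a gain or loss x relative to the reference point 0."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def should_answer(p_correct, penalty, **params):
    """Answer iff guessing has a higher prospect value than omitting (value 0).

    Simplification: raw probabilities are used, with no probability weighting.
    """
    v = p_correct * value(1, **params) + (1 - p_correct) * value(-penalty, **params)
    return v > 0

def risk_neutral_threshold(penalty):
    """Break-even p for an expected-score maximizer: p - (1 - p) * penalty = 0."""
    return penalty / (1 + penalty)
```

With a penalty of 1/3, a risk-neutral examinee answers whenever p_correct exceeds 0.25. The loss-averse examinee in this sketch still omits at p_correct = 0.3, so the penalty costs them items they would more often than not get right.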

Journal Article

Assessing the Validity and Reliability of Science Multiple Choice Test Using Rasch Dichotomous Measurement Model

by Lay, Yoon Fah , Mohd Dzin, Najah Hazirah in Education , Measurement Techniques , Multiple choice

2021

Multiple-choice tests are widely used to assess students' knowledge in science education. This study aimed to assess the validity and reliability of a Science Multiple-choice Test in Malaysia. The test items were developed by the researcher together with a panel of science teachers and the head of the science department, closely following the Secondary School Standard Curriculum (KSSM) syllabus. The test consists of 50 multiple-choice items with four options each. The Rasch measurement model was used to evaluate the quality of the test through four analyses: reliability analysis, item polarity analysis (PTMEA-CORR), item fit analysis, and Principal Component Analysis of Residuals (PCAR). Reliability was assessed using Cronbach's alpha, and both the reliability and the separation indices indicated a good level of reliability for the test items. To improve the validity of the test, two negatively worded items (Q39 and Q40) were removed. Lastly, the PCAR showed that the unexplained variance in the first contrast (5.4%) was well controlled, falling below the ceiling of one-third of the variance explained by the items (18.7%). In addition, the positive disattenuated correlations indicate no evidence of a secondary dimension.
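
Two formulas underlie this analysis. The dichotomous Rasch model gives the probability of a correct response from the difference between person ability θ and item difficulty b. The disattenuated correlation mentioned at the end corrects an observed correlation for unreliability (Spearman's correction). A minimal sketch of both follows. It is not the authors' software, since dedicated Rasch programs compute these statistics internally, and any input values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    """Expected raw score of a person with ability theta on a set of items."""
    return sum(rasch_p(theta, b) for b in difficulties)

def disattenuated_r(r_xy, rel_x, rel_y):
    """Spearman's correction: observed correlation divided by sqrt of the two reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)
```

When θ equals b, the model gives a 0.5 probability of success. Each logit of ability above the item's difficulty raises the odds of a correct answer by a factor of e.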

Journal Article
