Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
34,118 result(s) for "Test Construction"
Adapting educational and psychological tests for cross-cultural assessment
by Charles D. Spielberger, Peter F. Merenda, Ronald K. Hambleton
in Assessment & Testing, Essay collection (Aufsatzsammlung), Cross-Cultural
2005, 2004, 2012
Adapting Educational and Psychological Tests for Cross-Cultural Assessment critically examines and advances new methods and practices for adapting tests for cross-cultural assessment and research. The International Test Commission (ITC) guidelines for test adaptation and conceptual and methodological issues in test adaptation are described in detail, and questions of ethics and concern for validity of test scores in cross-cultural contexts are carefully examined. Advances in test translation and adaptation methodology, including statistical identification of flawed test items, establishing equivalence of different language versions of a test, and methodologies for comparing tests in multiple languages, are reviewed and evaluated. The book also focuses on adapting ability, achievement, and personality tests for cross-cultural assessment in educational, industrial, and clinical settings.
This book furthers the ITC's mission of stimulating research on timely topics associated with assessment. It provides an excellent resource for courses in psychometric methods, test construction, and educational and/or psychological assessment, testing, and measurement. Written by internationally known scholars in psychometric methods and cross-cultural psychology, the collection of chapters should also provide essential information for educators and psychologists involved in cross-cultural assessment, as well as students aspiring to such careers.
Contents:
Preface.
Part I: Cross-Cultural Adaptation of Educational and Psychological Tests: Theoretical and Methodological Issues.
R.K. Hambleton, Issues, Designs, and Technical Guidelines for Adapting Tests Into Multiple Languages and Cultures.
F.J.R. van de Vijver, Y.H. Poortinga, Conceptual and Methodological Issues in Adapting Tests.
T. Oakland, Selected Ethical Issues Relevant to Test Adaptations.
S.G. Sireci, L. Patsula, R.K. Hambleton, Statistical Methods for Identifying Flaws in the Test Adaptation Process.
S.G. Sireci, Using Bilinguals to Evaluate the Comparability of Different Language Versions of a Test.
L.L. Cook, A.P. Schmitt-Cascallar, Establishing Score Comparability for Tests Given in Different Languages.
L.L. Cook, A.P. Schmitt-Cascallar, C. Brown, Adapting Achievement and Aptitude Tests: A Review of Methodological Issues.
Part II: Cross-Cultural Adaptation of Educational and Psychological Tests: Applications to Achievement, Aptitude, and Personality Tests.
C.T. Fitzgerald, Test Adaptation in a Large-Scale Certification Program.
C.Y. Maldonado, K.F. Geisinger, Conversion of the Wechsler Adult Intelligence Scale Into Spanish: An Early Test Adaptation Effort of Considerable Consequence.
N.K. Tanzer, Developing Tests for Use in Multiple Languages and Cultures: A Plea for Simultaneous Development.
F. Drasgow, T.M. Probst, The Psychometrics of Adaptation: Evaluating Measurement Equivalence Across Languages and Cultures.
M. Beller, N. Gafni, P. Hanani, Constructing, Adapting, and Validating Admissions Tests in Multiple Languages: The Israeli Case.
P.F. Merenda, Cross-Cultural Adaptation of Educational and Psychological Testing.
C.D. Spielberger, M.S. Moscoso, T.M. Brunner, Cross-Cultural Assessment of Emotional States and Personality Traits.
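The "statistical identification of flawed test items" that the abstract mentions is commonly operationalized as differential item functioning (DIF) analysis. As a rough illustration of one classic method in this family, here is a minimal sketch of the Mantel-Haenszel procedure for a single dichotomous item, with invented inputs; the book surveys several such designs, and this sketch is not reproduced from it:

```python
import numpy as np

def mantel_haenszel_odds_ratio(item, group, matching_score):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item           : 0/1 responses to the studied item
    group          : 0 = reference language version, 1 = focal version
    matching_score : matching variable, e.g. rest score on the test
    """
    num, den = 0.0, 0.0
    for s in np.unique(matching_score):          # one 2x2 table per stratum
        m = matching_score == s
        a = np.sum((item == 1) & (group == 0) & m)   # reference, correct
        b = np.sum((item == 0) & (group == 0) & m)   # reference, incorrect
        c = np.sum((item == 1) & (group == 1) & m)   # focal, correct
        d = np.sum((item == 0) & (group == 1) & m)   # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    # Values far from 1.0 flag items that behave differently across the
    # two language versions after matching on overall ability.
    return num / den if den > 0 else np.nan
```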
Measurement versus prediction in the construction of patient-reported outcome questionnaires: can we have our cake and eat it?
by Smits, Niels; Conijn, Judith M.; van der Ark, L. Andries
in Clinical outcomes, Construction, Medicine
2018
Background: Two important goals when using questionnaires are (a) measurement: the questionnaire is constructed to assign numerical values that accurately represent the test taker's attribute, and (b) prediction: the questionnaire is constructed to give an accurate forecast of an external criterion. Construction methods aimed at measurement prescribe that items should be reliable. In practice, this leads to questionnaires with high inter-item correlations. By contrast, construction methods aimed at prediction typically prescribe that items have a high correlation with the criterion and low inter-item correlations. The latter approach has often been said to produce a paradox concerning the relation between reliability and validity [1-3], because it is often assumed that good measurement is a prerequisite for good prediction.
Objective: To answer four questions: (1) Why are measurement-based methods suboptimal for questionnaires that are used for prediction? (2) How should one construct a questionnaire that is used for prediction? (3) Do questionnaire-construction methods that optimize measurement and prediction lead to the selection of different items in the questionnaire? (4) Is it possible to construct a questionnaire that can be used for both measurement and prediction?
Illustrative example: An empirical data set consisting of scores of 242 respondents on questionnaire items measuring mental health is used to select items by means of two methods: a method that optimizes the predictive value of the scale (i.e., forecasting a clinical diagnosis) and a method that optimizes the reliability of the scale. We show that the two methods select different sets of items and that a scale constructed to meet one goal does not perform optimally with respect to the other goal.
Discussion: The answers are as follows: (1) Because measurement-based methods tend to maximize inter-item correlations, which reduces predictive validity. (2) By selecting items that correlate highly with the criterion and weakly with the remaining items. (3) Yes, these methods may lead to different item selections. (4) For a single questionnaire: yes, but it is problematic because reliability cannot be estimated accurately. For a test battery: yes, but it is very costly. Implications for the construction of patient-reported outcome questionnaires are discussed.
Journal Article
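The contrast the abstract draws can be made concrete as two greedy item-selection rules; a minimal, hypothetical sketch (invented names and a toy scoring rule, not the authors' actual procedure):

```python
import numpy as np

def select_for_prediction(items, criterion, k=5):
    """Greedily pick k items that correlate highly with the criterion
    and weakly with already-selected items (prediction-oriented)."""
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in range(items.shape[1]):
            if j in selected:
                continue
            r_crit = abs(np.corrcoef(items[:, j], criterion)[0, 1])
            r_items = max((abs(np.corrcoef(items[:, j], items[:, s])[0, 1])
                           for s in selected), default=0.0)
            score = r_crit - r_items        # reward validity, punish overlap
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

def select_for_measurement(items, k=5):
    """Pick the k items with the highest corrected item-total correlations
    (reliability-oriented); tends to yield high inter-item correlations."""
    total = items.sum(axis=1)
    r = [np.corrcoef(items[:, j], total - items[:, j])[0, 1]
         for j in range(items.shape[1])]
    return list(np.argsort(r)[-k:])
```

Running both on the same item pool will typically return different item sets, which is exactly the divergence the paper demonstrates empirically.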
The use of latent variable mixture models to identify invariant items in test construction
by Lix, Lisa M.; Sajobi, Tolulope T.; Zumbo, Bruno D.
in Construction, Ethnicity, Health sciences
2018
Purpose: Patient-reported outcome measures (PROMs) are frequently used in heterogeneous patient populations. PROM scores may lead to biased inferences when sources of heterogeneity (e.g., gender, ethnicity, and social factors) are ignored. Latent variable mixture models (LVMMs) can be used to examine measurement invariance (MI) when sources of heterogeneity in the population are not known a priori. The goal of this article is to discuss the use of LVMMs to identify invariant items within the context of test construction.
Methods: The Draper-Lindley-de Finetti (DLD) framework for the measurement of latent variables provides a theoretical context for the use of LVMMs to identify the most invariant items in test construction. In an expository analysis using 39 items measuring daily activities, LVMMs were conducted to compare 1- and 2-class item response theory (IRT) models. If the 2-class model had better fit, item-level logistic regression differential item functioning (DIF) analyses were conducted to identify items that were not invariant. These items were removed, and the LVMMs and DIF testing were repeated until all remaining items showed MI.
Results: The 39 items had an essentially unidimensional measurement structure. However, a 1-class IRT model resulted in many statistically significant bivariate residuals, indicating suboptimal fit due to remaining local dependence. A 2-class LVMM had better fit. Through subsequent rounds of LVMMs and DIF testing, nine items were identified as being most invariant.
Conclusions: The DLD framework and the use of LVMMs have significant potential for advancing theoretical developments and research on item selection and the development of PROMs for heterogeneous populations.
Journal Article
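The iterative procedure in the Methods can be sketched as a purification loop. The sketch below implements only the logistic-regression DIF step (with statsmodels) and treats the latent-class memberships as given; in the article they come from the fitted 2-class LVMM, which requires dedicated mixture-IRT software and is not shown here:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def uniform_dif_pvalue(item, rest_score, latent_class):
    """Likelihood-ratio test for uniform DIF on one dichotomous item:
    does class membership predict the response beyond the rest score?"""
    base = sm.Logit(item, sm.add_constant(rest_score)).fit(disp=0)
    X = sm.add_constant(np.column_stack([rest_score, latent_class]))
    full = sm.Logit(item, X).fit(disp=0)
    lr = 2 * (full.llf - base.llf)              # chi-square with 1 df
    return chi2.sf(lr, df=1)

def purify(items, latent_class, alpha=0.05):
    """Drop the worst DIF item and retest until all items look invariant.
    `items` is an (n_persons, n_items) 0/1 array; `latent_class` would come
    from the fitted 2-class LVMM."""
    keep = list(range(items.shape[1]))
    while True:
        total = items[:, keep].sum(axis=1)
        pvals = {}
        for j in keep:
            rest = total - items[:, j]          # rest score as matching variable
            pvals[j] = uniform_dif_pvalue(items[:, j], rest, latent_class)
        worst = min(pvals, key=pvals.get)
        if pvals[worst] >= alpha:               # nothing flagged: done
            return keep
        keep.remove(worst)                      # remove non-invariant item
```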
Practices of EFL teachers in test construction
by Tefera, Ebabu; Tewachew, Abebe; Shiferie, Kassie
in Bloom's taxonomy, Critical thinking, EFL teachers' practices
2024
This study aimed to examine the practices of EFL teachers in test construction. It was conducted in six purposively selected secondary schools and addressed the following question: What are the practices of EFL teachers in test construction? The theoretical basis of the study was Bloom's taxonomy. The design was a convergent parallel mixed design within a pragmatism paradigm. A questionnaire, semi-structured interviews, non-participant observation, and document analysis were used to collect the data. The findings revealed that EFL teachers did not receive test development training and, as a result, skipped crucial steps of test construction: the planning, designing, and tryout stages. Additionally, the results indicated that grammar and reading were the focus areas of the tests, and the constructed exams contained no tasks involving critical thinking or problem solving. Furthermore, the document analysis showed that EFL teachers did not follow Bloom's taxonomy as a framework when designing tests. In light of these findings, the study suggests that education professionals at the zone and woreda levels should provide test development training.
Teachers' pedagogical and content knowledge dictate how teacher-made tests are constructed. The purpose of this study was to examine the practices of EFL teachers in test construction at Debark secondary schools. Teachers appeared to be losing sight of the guiding principles of language teaching and testing, and they did not use Bloom's taxonomy framework in their actual practices. Consequently, lacking test development training, they neglected to apply the necessary test development phases. For this reason, teachers at the zone and woreda levels should have access to test construction training.
Journal Article
An introduction to the use of evidence-centered design in test development
2014
The purpose of this article is to describe what Evidence-Centered Design (ECD) is and to explain why and how ECD is used in the design and development of tests. The article will be most useful for readers who have some knowledge of traditional test development practices but who are unfamiliar with ECD. The article begins with descriptions of the major characteristics of ECD, adds a brief note on the origins of ECD, and discusses the relationship of ECD to traditional test development. Next, the article lists the important advantages of using ECD, with an emphasis on the validity of the inferences made about test takers on the basis of their scores. The article explains the nature and purpose of the "layers" or stages of the ECD test design and development process: 1) domain analysis; 2) domain modeling; 3) conceptual assessment framework; 4) assessment implementation; and 5) assessment delivery. The article ends with some observations about my experience with early applications of ECD for those who plan to begin using it, a brief conclusion, and some recommendations for further reading.
Journal Article
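For readers who think in code, the five ECD layers listed in the abstract can be restated as an ordered pipeline; this is only a compact restatement of that list, not anything taken from the article itself:

```python
from enum import Enum

class ECDLayer(Enum):
    """The five layers of the ECD design and development process,
    in the order the abstract presents them."""
    DOMAIN_ANALYSIS = 1                   # gather knowledge about the domain
    DOMAIN_MODELING = 2                   # organize it into claims and evidence
    CONCEPTUAL_ASSESSMENT_FRAMEWORK = 3   # student, evidence, and task models
    ASSESSMENT_IMPLEMENTATION = 4         # author tasks, build scoring
    ASSESSMENT_DELIVERY = 5               # administer, score, report

for layer in ECDLayer:                    # print the pipeline in order
    print(layer.value, layer.name.replace("_", " ").title())
```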
The Development and Validation of the Short Form of the Foreign Language Enjoyment Scale
by Dewaele, Jean-Marc; Greiff, Samuel; Botes, Elouise
in Appreciation, Convergent validity, Discriminant validity
2021
We used a data set with n = 1,603 learners of foreign languages (FL) to develop and validate the short form of the Foreign Language Enjoyment Scale (S-FLES). The data was split into 2 groups, and we used the first sample to develop the short-form measure. A 3-factor hierarchical model of foreign language enjoyment (FLE) was uncovered, with FLE as a higher-order factor and with teacher appreciation, personal enjoyment, and social enjoyment as 3 lower-order factors. We selected 3 items for each of the 3 lower-order factors of the S-FLES. The proposed 9-item S-FLES was validated in the second sample, and the fit statistics for the factor structure indicated close fit. Further evidence was found to support the internal consistency, convergent validity, and discriminant validity of the S-FLES. The S-FLES provides a valid and reliable short-form measure of FLE, which can easily be included in any battery of assessments examining individual differences in FL learning.
Journal Article
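A higher-order factor model of the kind described can be specified in a few lines with an SEM package. A minimal sketch using semopy's lavaan-style model syntax, with invented item names (ta1..se3) and a hypothetical data file; this is not the authors' actual analysis:

```python
import pandas as pd
import semopy

# Three lower-order factors with 3 items each, plus FLE as a
# higher-order factor, mirroring the structure described above.
model_desc = """
TeacherAppreciation =~ ta1 + ta2 + ta3
PersonalEnjoyment =~ pe1 + pe2 + pe3
SocialEnjoyment =~ se1 + se2 + se3
FLE =~ TeacherAppreciation + PersonalEnjoyment + SocialEnjoyment
"""

df = pd.read_csv("sfles_validation_sample.csv")  # hypothetical file name
model = semopy.Model(model_desc)
model.fit(df)
print(semopy.calc_stats(model))   # fit indices (CFI, RMSEA, ...) to judge fit
```

Validating on a second, held-out sample, as the authors did, simply means fitting this same model to the other half of the split data.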
Developing, Analyzing, and Using Distractors for Multiple-Choice Tests in Education: A Comprehensive Review
2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment that remain in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education, focused specifically on the development, analysis, and use of the incorrect options, also called the distractors. Despite a vast body of literature on multiple-choice testing, the task of creating distractors has received much less attention. In this study, we provide an overview of what is known about developing distractors for multiple-choice items and evaluating their quality. Next, we synthesize the existing guidelines on how to use distractors and summarize earlier research on the optimal number of distractors and the optimal ordering of distractors. Finally, we use this comprehensive review to provide the most up-to-date recommendations regarding distractor development, analysis, and use, and in the process, we highlight important areas where further research is needed.
Journal Article
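A standard first step in the kind of distractor analysis this review surveys is the option-level point-biserial correlation; a minimal sketch with invented data (variable names are illustrative):

```python
import numpy as np

def option_point_biserials(responses, total_scores):
    """For one multiple-choice item, correlate choosing each option (0/1)
    with the total test score. The keyed answer should correlate positively;
    a distractor with a positive point-biserial is a red flag."""
    return {opt: np.corrcoef((responses == opt).astype(float),
                             total_scores)[0, 1]
            for opt in np.unique(responses)}

# Toy example: option 'B' is keyed; 'A', 'C', 'D' are distractors.
resp = np.array(list("ABCBBADBCB"))
totals = np.array([12, 25, 8, 27, 24, 10, 9, 26, 7, 23])
print(option_point_biserials(resp, totals))
```

Distractors that nobody chooses, or that attract high-scoring examinees, are the usual candidates for revision, which is the practical use of such statistics.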
Early Detection of Dyslexia Risk
by Foorman, Barbara R.; Fletcher, Jack M.; Schatschneider, Christopher
in At-risk populations, At-risk students, Barriers
2021
Many states now mandate early screening for dyslexia but vary in how they address these mandates. There is confusion about the nature of screening versus diagnostic assessments, risk versus diagnosis, and concurrent versus predictive validity, as well as inattention to indices of classification accuracy as the basis for determining risk. To help define what constitutes a screening assessment, we summarize efforts to develop short (3–5 min), teacher-administered screens that used multivariate strategies for variable selection, item response theory to select the items that are most discriminating at a threshold for predicting risk, and statistical decision theory. These methods optimize prediction and lower the burden on teachers by reducing the number of items needed to evaluate risk. A specific goal of these efforts was to minimize decision errors that would result in the failure to identify a child as at risk of dyslexia/reading problems (false negatives), despite the inevitable increase in identifications of children who eventually perform in the typical range (false positives). Five screens, developed for different periods during kindergarten, Grade 1, and Grade 2, predicted outcomes measured later in the same school year (Grade 2) or in the subsequent year (Grade 1). The results of this approach to development are applicable to other screening methods, especially those that attempt to identify children at risk of dyslexia prior to the onset of reading instruction. Without reliable and valid early predictive screening measures that reduce the burden on teachers, early intervention and prevention of dyslexia and related reading problems will be difficult.
Journal Article
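The trade-off the abstract describes, tolerating more false positives to keep false negatives rare, amounts to choosing a cutoff subject to a sensitivity floor. A minimal sketch using scikit-learn, with invented inputs; this is not the authors' actual decision-theoretic procedure:

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_cutoff(screen_scores, later_outcome, min_sensitivity=0.90):
    """Choose a screening cutoff that keeps false negatives rare.

    screen_scores : scores from the brief teacher-administered screen
                    (flip the sign first if lower scores indicate risk)
    later_outcome : 1 = poor reading outcome later on, else 0
    """
    fpr, tpr, thresholds = roc_curve(later_outcome, screen_scores)
    ok = tpr >= min_sensitivity          # cutoffs meeting the sensitivity floor
    # among those, take the cutoff with the fewest false positives
    best = np.argmin(np.where(ok, fpr, np.inf))
    return thresholds[best]
```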
Development and Validation of the Camouflaging Autistic Traits Questionnaire (CAT-Q)
2019
There currently exist no self-report measures of social camouflaging behaviours (strategies used to compensate for or mask autistic characteristics during social interactions). The Camouflaging Autistic Traits Questionnaire (CAT-Q) was developed from autistic adults' experiences of camouflaging and was administered online to 354 autistic and 478 non-autistic adults. Exploratory factor analysis suggested three factors, comprising 25 items in total. Good model fit was demonstrated through confirmatory factor analysis, with measurement invariance analyses demonstrating equivalent factor structures across gender and diagnostic group. Internal consistency (α = 0.94) and preliminary test–retest reliability (r = 0.77) were acceptable. Convergent validity was demonstrated through comparison with measures of autistic traits, wellbeing, anxiety, and depression. The present study provides robust psychometric support for the CAT-Q.
Journal Article
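The two reliability figures reported here (α = 0.94, test–retest r = 0.77) are straightforward to compute; a minimal sketch, assuming an (n_persons, n_items) score matrix and total scores from two sessions:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

def test_retest_r(time1_totals, time2_totals):
    """Test-retest reliability: Pearson r between totals at two sessions."""
    return np.corrcoef(time1_totals, time2_totals)[0, 1]
```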