Abstract (EN):
Multiple-choice questions (MCQs) are regularly used in examinations to assess students in the health science disciplines. Despite this, MCQ items often contain item-writing flaws, and few educators have formal instruction in writing MCQs. The major purpose of our study was to estimate the inter-rater agreement on the classification of items as either standard or flawed. To achieve this goal, four judges (two teachers and two students), blinded to all item performance data, independently classified each of 920 test items from 10 examinations as either standard or flawed. If an item was flawed, the exact type of flaw or flaws present in the question stem and the respective options was recorded. In this study, a standard item was operationally defined as any item that did not violate one or more of the 31 principles noted in a review article summarizing current educational measurement recommendations on item writing. Fleiss' kappa was used to evaluate the inter-rater agreement among the four judges prior to the consensus process. Agreement on the classification of items as either standard or flawed was fair (kappa = 0.3). Although agreement was substantial for the more prevalent principles, the results generally showed many disagreements among the judges about item classification prior to the consensus process. In a future investigation it will be important to assess whether the presence of one or more flaws in an MCQ item affects its quality, namely, whether flaws interfere with the item's difficulty and discrimination indices.
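The agreement statistic reported above can be computed from a subjects-by-categories count matrix. The sketch below is illustrative only: the rating matrix is hypothetical and does not reproduce the study's 920-item data, but it shows how Fleiss' kappa is obtained for four raters classifying items into two categories (standard vs. flawed).

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to
    category j; every row must sum to the same number of raters n.
    """
    N = len(ratings)          # number of rated items
    n = sum(ratings[0])       # raters per item
    k = len(ratings[0])       # number of categories

    # Observed per-item agreement P_i, averaged over items
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N

    # Expected chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 judges classify 5 items as
# standard (column 0) or flawed (column 1).
ratings = [
    [4, 0],
    [3, 1],
    [2, 2],
    [0, 4],
    [1, 3],
]
print(round(fleiss_kappa(ratings), 3))  # fair agreement, comparable to 0.3
```

A kappa around 0.2-0.4 is conventionally interpreted as "fair" agreement, which matches the level reported in the abstract.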
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
4