Saturday, April 11, 2020

Selected-response Tests

The selection of item formats is dictated by the instructional outcomes to be assessed. Some formats are appropriate for measuring knowledge and simple understanding, while others fit the measurement of deep understanding. Selected-response formats entail choosing the best or most appropriate option to answer a problem. The greatest challenge in this item format is the construction of plausible options or distracters, so that no single option stands out as obviously correct.

Binary Choice or Alternate Form

This type of test generally provides only two options. The table below shows varieties of structure using the alternate form as suggested by Nitko (2001).

    Table 1: Varieties of Binary Choice

All varieties of binary-choice items use propositions as the item stimulus except for the Yes-No type, which uses direct questions. Students judge the veracity of propositional statements by indicating whether they are true or false, correct or incorrect, or whether they agree or disagree with the thought or idea expressed. Requiring students to correct statements they judge to be false draws on the learners' ability to reason and raises the level of outcome that can be assessed. The ease of constructing binary-choice items makes this a common option when writing items, particularly for knowledge-level outcomes. Many of the propositions are content-based in nature, and students can interpret the items quickly and correctly. The challenge often lies not only in writing the propositions but also in preparing the key for correction.

Guidelines for constructing good binary-choice items, as suggested by McMillan (2007) and Musial et al. (2009):


  1. Write the item so that the answer options are consistent with the logic in the sentence. (Align your options with the logic of your proposition; e.g., if the item asks about truth or falsehood, do not use yes-no or agree-disagree options).
    • Example: 
      • Poor: Four and 6 are factors of 24.   Yes   No
      • Good: Four and 6 are factors of 24.   Correct   Incorrect
  2. Focus on a single fact or idea in the item. (Including more than one idea in the statement can make the item ambiguous: one idea may be correct and the other incorrect).
    • Example:
      • Poor: T   F   Right to suffrage is given to citizens in a democratic country in order to enjoy economic gains.
      • Good: T   F   Citizens in a democratic society have the right of suffrage.
  3. Avoid long sentences. (Unnecessarily long and wordy statements obscure the significant idea).
    • Example:
      • Poor: T   F   Criterion-referenced tests are interpreted based on a standard that determines whether students have reached an acceptable level or not.
      • Better: T   F   Standards are used to interpret criterion-referenced tests.
  4. Avoid insignificant or trivial facts or words. (Students commit errors not because they do not know the content but because of unnecessary facts).
    • Example:
      • Poor: T   F   Legumes, beans and nuts should be avoided by people who are suffering from gout whether inherited or not from their parents.
      • Better: T   F   Legumes, beans and nuts should be avoided by people with gout.
  5. Avoid negative statements. (Statements with not or no are confusing to young readers).
    • Example:
      • Poor: T   F   All European nations are not in favor of joining the European Union.
      • Better: T   F   All European nations are in favor of joining the European Union.
  6. Avoid inadvertent clues to the answer. (Items using such words as never, always, all the time, all, etc. are most of the time false and are recognized by test-wise students).
    • Example:
      • Poor: T   F   Essay tests are never easy to score.
      • Better: T   F   Essay tests are difficult to score.
  7. Avoid using vague adjectives and adverbs. (Students interpret adjectives and adverbs such as typically, usually, occasionally, quite, etc. differently. When such words are used, the item often becomes a test of vocabulary).
    • Example:
      • Poor: T   F   People from cold countries typically drink wine every day.
      • Better: T   F   People from cold countries are fond of drinking wine.

Multiple-Choice Items

This format is widely used in classroom testing because of its versatility in assessing various levels of understanding, from knowledge and simple understanding to deep understanding. McMillan (2007) believes that multiple-choice tests can determine whether students can use reasoning as a skill, similar to binary-choice items, and can tap students' skills in problem-solving, decision-making, or other reasoning tasks. Cognitively demanding outcomes involving analysis and evaluation lend themselves to the use of multiple-choice items. Constructing this test format may not be as easy as constructing binary-choice items; however, its advantages exceed what binary choice can offer. Aside from being able to assess various outcome levels, multiple-choice items are easy to score, less susceptible to guessing than alternate-choice items, and more familiar to students, who often encounter them in different testing events (Musial et al., 2009). The MC item stimulus consists of a stem, containing the problem in the form of a direct question or an incomplete statement, and the options, which offer the alternatives from which to select the correct answer.

Guidelines for writing good MC items (McMillan, 2007; Miller, Linn & Gronlund, 2009; Popham, 2011)

  1. All the words of the stem should be relevant to the task. This means stating the problem succinctly and clearly so students understand what they are expected to answer.
  2. The stem should be meaningful by itself and should fully contain the problem. This should especially be observed when the stem uses an incomplete-statement format.
    • Example: The constitution is ___________.
      • A stem worded this way does not make definite the conceptual knowledge being assessed. One does not know what is being tested: is it asking for the definition of the term, its significance, or its history? A test of whether a stem is effectively worded is whether it can be answered without the distracters. This stem can be improved by changing its format to a direct question or by adding more information to the incomplete statement (see samples below). This way the test writer determines what knowledge competence to focus on and what appropriate distracters to use:
        • What does the constitution of an organization provide? (Direct-question format)
        • The constitution of an organization provides ______. (Incomplete-statement format)
  3. The stem should pose a question with only one correct or clearly best answer. Ambiguity sets in when the stem allows for more than one best answer, since students will likely base their answers on personal experience instead of on facts. Consider the following example, where there could be more than one best answer.
    • Example:
      • Poor: Which product of Thailand makes it economically stable? a. rice   b. dried fruits   c. dairy products   d. ready-to-wear
      • Improved: Which agricultural product of Thailand is most productive for export? a. rice   b. fish   c. fruits   d. vegetables
  4. The stem must express a complete thought.
    • Example:
      • Poor: The poem “The Raven”
        • a) was written by Edgar Allan Poe
        • b) was written by Elizabeth Browning
        • c) was written by Omar Khayyam
        • d) was written by Jose Garcia Villa
      • Better: The poem “The Raven” was written by
        • a) Edgar Allan Poe
        • b) Elizabeth Browning
        • c) Omar Khayyam
        • d) Jose Garcia Villa
    • The second example is better than the first since the stem contains a complete thought.
  5. Keep options short while putting most of the concepts in the stem.
  6. The stem must contain only the information necessary to make the problem clear. Do not provide unnecessary details or worse provide a clue to the answer.
    • Example 1: The revolution in the Philippines to oust President Marcos took place after a snap election in 1986. It happened at the dawn of Feb. 23. When did the revolution take place?
      • a) Before Valentine’s day
      • b) After Valentine’s day
      • c) On Valentine’s day
    • Example 2: When did the People Power Revolution in the Philippines take place?
      • a) Feb., 1985
      • b) Feb., 1986
      • c) Feb., 1987
    • The first example does not measure knowledge of Philippine history; instead, it focuses on knowledge of Valentine's Day. Moreover, the stem provides the clue to the answer: Feb. 23 is after Feb. 14. The second example is a better choice than the first.
  7. Avoid negative statements or double-negative statements in the stem. This may confuse the test taker.
    • Example 1: It is not untrue that Magellan discovered the Philippines.
    • Example 2: It is true that Magellan discovered the Philippines.
    • The second example is better than the first since Example 1 contains a double negative.
  8. Make options equally attractive. This means that the distracters should be as plausible as the correct answer; otherwise, the answer will stand out like a sore thumb.
    • Example: The author of “The Raven” is
      • a) Jose Garcia Villa
      • b) Edgar Allan Poe
      • c) Genoveva Matute
      • d) Francisco Balagtas
    • In the example, all authors except (b) are Filipino. Since the poem's title sounds foreign to the students, the author must be the foreigner, and the answer gives itself away.
  9. Use the option "none of these" or "none of the above" only when there is only one correct answer.
  10. Ensure that items do not dwell too much on "knowledge" or rote learning. MC items, when properly constructed, can elicit higher-order responses. The example below shows an item that measures both comprehension and analysis.
    • Example: The volume of a sphere is given by V = (4/3)πr³, where r is the radius of the sphere. If the radius is doubled, the volume will be:
      • a) Multiplied by a factor of 2
      • b) Multiplied by a factor of 3
      • c) Multiplied by a factor of 4
      • d) Multiplied by a factor of 8
  11. As much as possible avoid using "all of the above" as an option.
  12. All distracters should appear plausible to uninformed test takers. This is the key to making the item discriminating and therefore valid. The validity of the item suffers when a distracter is obviously correct, like option (d), or obviously wrong, like option (b), in the following item.
    • Poor: What is matter?
      • a. Everything that surrounds us
      • b. All things bright and beautiful
      • c. Things we see and hear
      • d. Anything that occupies space and has mass
  13. Randomly assign correct answers to alternative positions. Item writers have a tendency to assign the correct answer to the third alternative as they run short of incorrect alternatives. Students who are used to taking multiple-choice tests then choose option C when guessing, for a greater chance of being correct. No deliberate order should be followed in assigning the correct answers (e.g., ABCDABCD or AACCBBDD) for ease in scoring. As much as possible, have an equal number of correct answers distributed randomly across the alternative positions.
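The sphere-volume item in guideline 10 rewards reasoning rather than recall: if the radius is doubled, the factor follows directly from the formula, and the distracters (2, 3, 4) correspond to common student errors such as doubling or squaring instead of cubing. A quick derivation confirms the keyed answer:

```latex
V(2r) = \frac{4}{3}\pi (2r)^{3}
      = \frac{4}{3}\pi \cdot 8r^{3}
      = 8\left(\frac{4}{3}\pi r^{3}\right)
      = 8\,V(r)
```

So doubling the radius multiplies the volume by a factor of 8.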
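The randomization described in guideline 13 can be done mechanically rather than by hand. The helper below is a hypothetical illustration (not from any of the cited sources): it shuffles the options of one item and reports which letter the correct answer lands on, so no deliberate pattern creeps in.

```python
import random

def shuffle_options(options, correct, rng=random):
    """Return the options in random order plus the letter of the correct answer.

    options -- list of four answer strings; correct -- the correct answer string.
    """
    shuffled = options[:]      # copy so the caller's list is untouched
    rng.shuffle(shuffled)      # random placement, no deliberate order
    key = "abcd"[shuffled.index(correct)]
    return shuffled, key

# Example: randomize the options of the "The Raven" item
options, key = shuffle_options(
    ["Edgar Allan Poe", "Elizabeth Browning", "Omar Khayyam", "Jose Garcia Villa"],
    "Edgar Allan Poe",
)
```

Running this over a whole test distributes the keyed positions roughly evenly across a, b, c, and d, which is exactly what the guideline asks for.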

Ways to make distracters plausible, as given by Miller, Linn & Gronlund (2009):

  1. Use the students' most common errors
  2. Use important-sounding-words (e.g. significant, accurate) that are relevant to the item stem. But do not overdo it.
  3. Use words that have verbal associations with the item stem (e.g., politician, political)
  4. Use textbook language or other phraseology that has the appearance of truth
  5. Use incorrect answers that are likely to result from student misunderstanding or carelessness (e.g. forgets to convert feet to yards)
  6. Use distracters that are homogenous and similar in content to the correct option (e.g. all are inventors)
  7. Use distracters that are parallel in form and grammatically consistent with the item stem
  8. Make the distracters similar to the correct answer in length, vocabulary, sentence structure, and complexity of thought.
Caution: Distracters should distract the uninformed, but they should not result in trick questions that mislead knowledgeable students (e.g., do not insert "not" into a correct answer to make a distracter).

Varieties:
  1. Single Correct Answer
  2. Best Answer
  3. Negative Stem
  4. Multiple Response
  5. Combined Resources
  6. Stimulus-Material-Stem-Alternatives

Matching Type Test

Matching-type items may be considered modified multiple-choice items. This format consists of two parallel lists of words or phrases that the students are tasked to pair. The first list, which is to be matched, is referred to as the premises, while the other list, from which a match is chosen based on some kind of association, is the responses. The PREMISES are a list of words or phrases to be matched or associated with an appropriate word, while the RESPONSES are a list of homogeneous alternatives or options from which to select what will match the premise.

Illustrative Item 1
The first column describes events associated with Philippine presidents while the second column gives their names. In the space provided, write the letter of the president that matches the description.

source: de Guzman & Adamos, 2015
Illustrative Item 2 (for advanced level)
Column A contains theoretical propositions about how the universe came about. Match each one with the name of the theory given in Column B. Write the appropriate letter to the left of the number in Column A.

source: de Guzman & Adamos, 2015

Guidelines in constructing matching items (Kubiszyn and Borich, 2010)

  1. Keep the list of premises and the list of responses homogeneous, or belonging to one category. In Illustrative Item 1, the premises are events associated with Philippine presidents, while the responses are all names of presidents. In Illustrative Item 2, Column A lists some theories in astronomy about how the universe has evolved and Column B lists the names of the theories. Homogeneity is a basic principle in matching items.
  2. Keep the premises always in the first column and the responses in the second column. Since the premises are oftentimes descriptions of events, illustrations of principles, functions, or characteristics, they appear longer than the responses, which are most often names, categories, objects, and parts. Ordering the two columns this way saves reading time for the students, since they will usually read one long premise once and select the appropriate match from a list of short words. If ordered the opposite way, the students will read short words as the premise and then read through long descriptions to look for the correct answer.
  3. Keep the lists in the two columns unequal in number. The basic reason for this is to avoid guessing. The options in Column B are usually more numerous than the premises in Column A. If the two lists are equal in number (a perfect match), students can strategically resort to elimination in finding the rest of the pairs. There are matching items, however, in which the options are much fewer than the premises. This is recommended when the ability being tested is classification. For example, Column A may list 10 animals to be classified, and Column B could give just 4 categories of animals. With this format, it is important to mention in the test directions that an option can be used more than once.
  4. Test directions should always describe the basis for matching. "Match Column A with Column B" is a no-no in matching type. Describe clearly what is to be found in the two columns, how they are associated, and how matching will be done. Invalid scores could be due to extraneous factors like misinterpretation of how matching is to be done, misunderstanding of how the given options may be used (e.g., using an option only once when the teacher allows it to be used more than once), and limiting the number of items answered when there are few options given.
  5. Keep the number of premises to not more than eight (8), as shown in the two sample items. Fatigue sets in when there are too many items in a set, and again, test validity suffers. If an item writer feels that there are many concepts to be tested, dividing them into sets is a better strategy. It is also suggested that a set of matching items appear on a single page and not be carried over to the next page.
  6. Ambiguous lists should be avoided. This is especially true in the preparation of options for the second column. There should be only one option appropriately associated with a premise, unless it is unequivocally mentioned that an option can be used more than once, as mentioned in #4. Ambiguity often occurs when matching events and places, events and names, or descriptions and characters.

Examples of Parallel Concepts:

  1. terms and definitions
  2. objects/pictures and labels
  3. symbols and proper names
  4. causes and effects
  5. scenarios and responses
  6. principles and scenarios to which they apply
Some rules of thumb exist for how long it takes most students to answer various types of questions:
  • A true-false test item takes 15 seconds to answer unless the student is asked to provide the correct answer for false questions. Then the time increases to 30-45 seconds.
  • A seven-item matching exercise takes 60-90 seconds.
  • A four response multiple-choice test item that asks for an answer regarding a term, fact, definition, rule or principle (knowledge level item) takes 30 seconds. The same type of test item that is at the application level may take 60 seconds.
  • Any test item format that requires solving a problem, analyzing, synthesizing information or evaluating examples adds 30-60 seconds to a question.
  • Short-answer test items take 30-45 seconds.
  • An essay test takes 60 seconds for each point to be compared and contrasted.
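These rules of thumb can be turned into a rough test-length estimate. The sketch below is a hypothetical illustration, not part of any cited source; where the text gives a range, a midpoint is assumed, and the category names are made up for the example.

```python
# Per-item time estimates in seconds, taken from the rules of thumb above
# (midpoints assumed where the text gives a range).
SECONDS_PER_ITEM = {
    "true_false": 15,                  # plain true-false item
    "true_false_with_correction": 38,  # student must correct false items (30-45 s)
    "matching_7_item_set": 75,         # one seven-item matching exercise (60-90 s)
    "mc_knowledge": 30,                # knowledge-level multiple choice
    "mc_application": 60,              # application-level multiple choice
    "short_answer": 38,                # short-answer item (30-45 s)
}

def estimated_minutes(counts):
    """Sum the per-item estimates for a test and convert seconds to minutes."""
    total_seconds = sum(SECONDS_PER_ITEM[kind] * n for kind, n in counts.items())
    return total_seconds / 60

# Example: 20 true-false, 10 knowledge-level MC, and 5 application-level MC
print(estimated_minutes({"true_false": 20, "mc_knowledge": 10, "mc_application": 5}))
# -> 15.0 minutes
```

An estimate like this helps a teacher check that a planned test fits the class period before writing all the items.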
References:
  1. De Guzman, E. & Adamos, J. (2015). Assessment of learning 1. Quezon City: Adriana Publishing Co., Inc.
  2. McMillan, J. (2007). Classroom assessment: Principles and practice for effective standards-based instruction (4th ed.). USA: Pearson Education, Inc.
  3. Miller, M., Linn, R., & Gronlund, N. (2009). Measurement and assessment in teaching (10th ed.). New Jersey: Pearson Education, Inc.
  4. Nitko, A. (2001). Educational assessment of students (3rd ed.). Upper Saddle River, New Jersey: Prentice-Hall, Inc.
  5. Padua, R.N. & Santos, R.G. (1997). Educational evaluation and measurement: Theory, practice, and application. Quezon City: Katha Publishing.
  6. Santos, R.G. (2007). Assessment of learning 1. Quezon City: Lorimar.