THE TEST DEVELOPMENT PROCESS
The test construction process for classroom use follows similar steps regardless of the instrument, as described by various authorities (Crocker & Algina, 1986; Miller et al., 2009; Russell & Airasian, 2012). The process is illustrated in Figure 1 below:
Figure 1. Test Development Process for Classroom Tests
A. Planning Phase - In this phase, the purpose of the test is identified, learning outcomes are clearly specified, and the table of specifications is prepared to guide the test item construction.
B. Item Construction Phase - This is where items in the test are constructed following the guidelines and item format for the specified learning outcomes of instruction.
C. Review Phase - Prior to the administration of the test, the teacher examines the test items for alignment between the content and the behavior components of the instructional competencies. After the administration of the test, the teacher examines the test based on an analysis of students' performance on each item.
WHAT IS A TABLE OF SPECIFICATIONS (TOS)?
A TOS, sometimes called a test blueprint, is a matrix in which the rows consist of the specific topics or competencies and the columns are the cognitive levels of Bloom's Taxonomy at which the objectives are cast. It is a table that helps teachers align objectives, instruction, and assessment (e.g., Notar, Zuelke, Wilson, & Yunker, 2004). This strategy can be used for a variety of assessment methods but is most commonly associated with constructing traditional summative tests. The TOS helps teachers map the amount of class time spent on each objective to the cognitive level at which each objective was taught, thereby helping them identify the types of items they need to include on their tests. There are different versions of these tables or blueprints (e.g., Linn & Gronlund, 2000; Mehrens & Lehmann, 1973; Notar et al., 2004), and the one presented here is the one we have found most useful in our own teaching. This tool can be modified to best meet your needs in developing classroom tests.
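To make the matrix structure concrete, here is a minimal Python sketch of a TOS as a nested dictionary. The topics, cognitive levels, and item counts are hypothetical placeholders, not values taken from Table 2:

```python
# A TOS as a nested dictionary: rows are topics/competencies,
# columns are cognitive levels, cells are planned item counts.
# All topics and counts below are hypothetical placeholders.
tos = {
    "Parts of the digestive system": {"remembering": 2, "applying": 1, "evaluating": 0},
    "Functions of digestive organs": {"remembering": 1, "applying": 2, "evaluating": 1},
}

# Row totals: items planned per topic.
row_totals = {topic: sum(levels.values()) for topic, levels in tos.items()}

# Column totals: items planned per cognitive level.
col_totals = {}
for levels in tos.values():
    for level, count in levels.items():
        col_totals[level] = col_totals.get(level, 0) + count

print(row_totals)  # {'Parts of the digestive system': 3, 'Functions of digestive organs': 4}
print(col_totals)  # {'remembering': 3, 'applying': 3, 'evaluating': 1}
```

Checking the row and column totals against the planned test length is exactly the cross-check the blueprint is meant to support.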
WHAT IS THE PURPOSE OF A TABLE OF SPECIFICATIONS?
To understand how best to modify a TOS to meet your needs, it is important to understand the goal of this strategy: improving the validity of the evaluations a teacher makes based on a given assessment.
Validity is the degree to which the evaluations or judgments we make as teachers about our students can be trusted based on the quality of the evidence we gathered (Wolming & Wikstrom, 2010). It is important to understand that validity is not a property of the test constructed but of the inferences we make based on the information gathered from a test. When we consider whether or not the grades we assign to students are accurate, we are questioning the validity of our judgments. When we ask these questions, we can look to the kinds of evidence endorsed by researchers and theorists in educational measurement to support the claims we make about our students (AERA, APA, & NCME, 1999). For classroom assessments, two sources of validity evidence are essential: evidence based on test content and evidence based on response processes (AERA, APA, & NCME, 1999).
Evidence based on test content underscores the degree to which a test measures what it is designed to measure (Wolming & Wikstrom, 2010). This means that your classroom tests must be aligned with the content (subject matter) taught in order for any of your judgments about student understanding and learning to be meaningful. Essentially, with test-content evidence, we are interested in knowing whether the tested objectives reflect what you claim to have measured.
Response process evidence is the second source of validity evidence that is essential to classroom teachers. Response process evidence is concerned with the alignment of the kinds of thinking required of students during instruction and during assessment (testing) activities.
Sometimes the tests teachers administer have evidence for test content but not for response processes. That is, while the content is aligned with instruction, the test does not address the content at the same depth or level of meaning that was experienced in class. When students feel that they are being tricked, or that the test is overly specific (nit-picky), there is probably an issue related to response processes at play. As test constructors, we need to concern ourselves with response process evidence. One way to do this is to consider whether the same kind of thinking is used during class activities and summative assessments. If a class activity focused on memorization, then the final test should also focus on memorization and not on a more advanced thinking activity.
Table 1 provides two possible test items to assess understanding of the digestive process. In Table 1, Item 1 assesses whether or not students can identify an organ in the digestive process. Item 2 assesses whether or not students can apply the concepts learned about digestion to the scenario described. Thus, these two items require different levels of thinking about the same content (i.e., recognizing/identifying vs. evaluating/applying). Response process evidence ensures that classroom tests assess the level of thinking that was required of students during their instructional experiences.
Table 1: Examples of items assessing different cognitive levels
There are six levels of thinking, as identified by Bloom in the 1950s; these levels were revised by a group of researchers in 2001 (Anderson et al., 2001). Thinking that emphasizes recall, memorization, identification, and comprehension is typically considered lower level. Higher levels of thinking include processes that require learners to apply, analyze, evaluate, and create.
When considering test items, people frequently confuse the type of item (e.g., multiple-choice, true-false, essay) with the type of thinking needed to respond to it. All item formats can be used to assess thinking at both high and low levels, depending on the context of the question. For example, an essay question might ask students to “Describe four causes of colon cancer.” On the surface, this looks like a higher-level question, and it could be. However, if students were taught “the four causes of colon cancer were…” verbatim from a text, then this item is really just a low-level recall task. Thus, the thinking level of each item needs to be considered in conjunction with the learning experience involved. For teachers to make valid judgments about their students’ thinking and understanding, the thinking level of the items needs to match the thinking level of instruction. The Table of Specifications provides a strategy for teachers to improve the validity of the judgments they make about their students from test responses by providing both content and response process evidence.
EVIDENCE FOR TEST CONTENT
One approach to gathering evidence of test content for your classroom tests is to consider the amount of actual class time spent on each objective. Topics that were discussed longer or in greater detail should appear in greater proportion on your test. This approach is particularly important for subject areas that cover a range of topics across a range of cognitive levels. In a given unit of study, there should be a direct relationship between the amount of class time spent on an objective and the portion of the final assessment testing that objective. If you spent only 10% of the instructional time on an objective, then that objective should count for only 10% of the assessment. A TOS provides a framework for making these decisions.
Table 2 shows a two-way TOS (with columns labeled A-E) that spells out WHAT is to be tested (target learning outcomes) and HOW it will be tested (test format) to obtain the information needed. The information in Column A is taken directly from the teacher’s lesson plans and curriculum guides. Using a TOS helps teachers be accountable for the content they teach and the time they allocate to each objective (Notar et al., 2004). The values in Column B refer to the number of minutes, hours, or days spent on each objective, while the numbers in Column D represent the total number of test items to be constructed for each objective. Column C presents the different cognitive levels based on Bloom’s taxonomy: easy-level objectives (remembering & understanding), average-level objectives (applying & analyzing), and difficult-level objectives (evaluating & creating). The percentage allotted to each level is arbitrary, though the distribution shown is the one recommended in basic education. Column E refers to the test format recommended for use.
Table 2: Sample Two-Way Table of Specifications for 4th Grade Summative Test in Science: Digestive System
Legend: NOI = number of items; IP = item placement
Basic Steps in Constructing a TOS
For example, suppose a teacher would like to give a test on a unit that was taught for 10 days. The following steps are recommended:
- Step 1: Determine the objectives to be included in the test (see Table 2, column A)
- Step 2: Determine the number of minutes/hours/days allotted to or spent on each objective. For example, out of 10 days, the teacher spent 1 day teaching objective 1 (see Table 2, column B).
- Step 3: Determine the total number of test items. This can be the teacher's prerogative or an institutional requirement. In this example, the teacher decides to make a 20-item test.
- Step 4: Determine the number of items for each objective per level of difficulty using this formula (a worked sketch in code follows this list):
  number of items per objective = (total number of items ÷ total number of minutes/hours/days) × time allotted to the objective
  For example (see Table 2, column C-NOI):
  - Objective 1: (20 ÷ 10) × 1 day = 2 items
  - Objective 3: (20 ÷ 10) × 2 days = 4 items
- Step 5: Make sure the vertical and horizontal sums of the items are correct (both should total the planned number of items).
- Step 6: Write the item placement (see "IP" in Table 2) for each set of items. Item placement is the location of your test items in the test. Hence, for objective 1, the 2 test items allocated to it are found at numbers 1-2 in the test. Item placement can be sequential or random, and numbering can be done horizontally or vertically. For instance, you may randomly distribute an objective's 3 items across locations 1, 3, and 5.
- Step 7: Determine the type of assessment method to be used (see Column E).
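The arithmetic in Steps 2 through 6 can be sketched in a few lines of Python. This is a minimal illustration assuming the 20-item, 10-day example above; the day counts for objectives 2 and 4 are hypothetical filler values, and rounded allocations may need manual adjustment to hit the planned total:

```python
# Step 2: time spent per objective (days). Objectives 1 and 3 match the
# worked example; objectives 2 and 4 are hypothetical filler values.
days_per_objective = {
    "Objective 1": 1,
    "Objective 2": 3,
    "Objective 3": 2,
    "Objective 4": 4,
}

# Step 3: total number of test items (the teacher's decision).
total_items = 20
total_days = sum(days_per_objective.values())  # 10

# Step 4: items per objective, proportional to instructional time.
items_per_objective = {
    obj: round(total_items / total_days * days)
    for obj, days in days_per_objective.items()
}

# Step 5: the allocations must sum back to the planned test length.
assert sum(items_per_objective.values()) == total_items

# Step 6: sequential item placement, e.g. Objective 1 -> numbers 1-2.
start = 1
for obj, n in items_per_objective.items():
    print(f"{obj}: {n} items, placed at numbers {start}-{start + n - 1}")
    start += n
```

Running the sketch reproduces the worked example: objective 1 gets 2 items at numbers 1-2, and objective 3 gets 4 items.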
From the total of Column D in Table 2, you should note that for this test the teacher has decided to use 20 items. The number of items to include on any given test is a professional decision made by the teacher based on the number of objectives in the unit, his/her understanding of the students, the class time allocated for testing, and the importance of the assessment. Shorter assessments can be valid, provided that the assessment includes ample evidence on which the teacher can base inferences about students' scores.
Typically, because longer tests can include a more representative sample of the instructional objectives and of student performance, they allow for more valid inferences. However, this is only true when the test items are of good quality. Furthermore, students are more likely to become fatigued on longer tests and to perform less well as they move through them. Therefore, we believe the ideal test is one that students can complete in the time allotted, with enough time to brainstorm any writing portions and to check their answers before turning in the completed assessment. McMillan (2007) suggests some rules of thumb for determining how many items make a sufficient sample: a minimum of 10 items is needed to assess each knowledge learning target in a unit, and these items should represent a good cross-section of difficulty. However, if there are more specific learning targets to be tested, at least 5 items for each one are enough to allow for a criterion-referenced interpretation of mastery.
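McMillan's rules of thumb can be expressed as a quick lower-bound check on test length. The helper below is a hypothetical illustration of that arithmetic, not an implementation from the source:

```python
def minimum_items(knowledge_targets: int, specific_targets: int) -> int:
    """Lower bound on test length per McMillan's (2007) rules of thumb:
    at least 10 items per knowledge learning target and at least 5 items
    per more specific learning target (for criterion-referenced mastery)."""
    return 10 * knowledge_targets + 5 * specific_targets

# A unit with 1 broad knowledge target and 3 specific targets
# would call for at least 10 + 15 = 25 items.
print(minimum_items(1, 3))  # 25
```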
THE TOS IS A TOOL FOR EVERY TEACHER
References
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. New York: Longman.
- Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
- Linn, R., & Gronlund, N. (2000). Measurement and assessment in teaching (8th ed.). Upper Saddle River, NJ: Prentice Hall.
- McMillan, J. (2007). Classroom assessment: Principles and practice for effective standards-based instruction (4th ed.). USA: Pearson Education.
- Mehrens, W. A., & Lehmann, I. J. (1973). Measurement and evaluation in education and psychology. New York: Holt, Rinehart and Winston.
- Miller, M., Linn, R., & Gronlund, N. (2009). Measurement and assessment in teaching (10th ed.). Upper Saddle River, NJ: Pearson Education.
- Notar, C., Zuelke, D., Wilson, J., & Yunker, B. (2004). The table of specifications: Insuring accountability in teacher-made tests. Journal of Instructional Psychology, 31(2), 115-129.
- Russell, M., & Airasian, P. (2012). Classroom assessment: Concepts and applications. Dubuque, IA: McGraw-Hill.
- Wolming, S., & Wikstrom, C. (2010). The concept of validity in theory and practice. Assessment in Education: Principles, Policy & Practice, 17, 117-132.