Thursday, March 21, 2024

Designing Performance Task using the GRASPS model

The students' attainment of the content standard may be evaluated through performance tasks. One model that may be used for this is GRASPS by McTighe and Wiggins (2005). GRASPS is an acronym that provides a structured approach for creating authentic performance tasks that engage students in meaningful learning experiences. Each letter in the acronym represents a key component of the task scenario.
G → real world Goal 
  • it defines the challenge, issue, or problem that students need to address
  • it sets the purpose for the task and guides students toward a specific outcome
R → real world Role 
  • it assigns students a specific position or perspective within a real-life context.
  • it helps students understand their purpose and responsibilities in the task.
  • give students a role that they might be taking in a familiar real-life situation
A → real world Audience 
  • identify the target audience whom students are solving the problem for or creating the product for
  • audience identifies the intended recipients or stakeholders of the student’s work.
  • it ensures that students consider the needs and expectations of a specific audience
S → real world Situation 
  • the situation describes the context or scenario in which the task takes place
  • it provides the context of the situation (contextualizes the challenge), relevant background information, and any additional factors that could impede the resolution of the problem
P → real world Product or Performances 
  • it specifies/explains the product or performance that students need to create or produce and its larger purpose
  • the purpose clarifies why this product or performance matters and how it aligns with real-world applications
S → Standards 
  • the standards and criteria for success outline the expectations and quality indicators for the task
  • they help students understand what constitutes successful completion of the task
  • it also informs students how their work will be assessed by the assumed audience
Example:
  1. Goal: Your goal is to convince the LGU to support your science investigatory project addressing an environmental problem in the community
  2. Role: You are an environmentalist advocating LGU support for a science investigatory project on an environmental problem in the community
  3. Audience: LGU
  4. Situation: The LGU is conducting a search for science investigatory projects that can best address any of the following problems in the community:
    • Fish kill in the river
    • Snail infestation in rice fields
    • Increasing incidence of pulmonary infections among children
  5. Product: A proposed science investigatory project applying the scientific method and addressing environmental problems in the community
  6. Standard: Your proposal will be judged based on the following:
    • Deep understanding of the problem
    • Application of the scientific method
    • Cost-effectiveness of the solution to the problem
In summary, GRASPS encourages educators to design performance tasks that mirror real-world situations, engage students, and connect learning to authentic contexts. By considering these elements, teachers can create meaningful and relevant learning experiences for their students.

Friday, May 22, 2020

Basic Concepts: Assessment, Measurement & Evaluation

Introduction

As shown in Figure 1, assessment is a cyclic process. Program-level learning outcomes are developed from research and input from stakeholders. These are aligned with the institutional outcomes and mapped to the courses within the program through curriculum mapping. Course learning outcomes are assessed using appropriate tools and criteria. Assessment data are gathered, analyzed, and interpreted. Gaps are identified between desired learning outcomes and actual results. Data-driven action plans are then developed for program improvement. Changes in assessment tools, course materials, instructional methods, course prerequisites, or learning outcomes are effected. Goals and objectives are then reviewed and refined in light of the evaluation findings. This is called the feedback loop, and the cycle begins again.

To understand the assessment process, we must know the answers to these questions: What are measurement, assessment, and evaluation? Why do we need to assess? 

What is Measurement? 
    In Science, measurement is a comparison of an unknown quantity to a standard. There are appropriate measuring tools for gathering numerical data on variables such as height, mass, time, and temperature, among others. In the field of education, measurement is the process of determining the quantity of achievement of learners by means of appropriate measuring instruments. It answers the question of how much or how many of the learning targets have been achieved. For example, Nico’s score of 16 out of 20 items in a completion-type quiz in Araling Panlipunan is a measure of his cognitive knowledge on a particular topic. It indicates that he got 80% of the items correct. This is an objective way of measuring a student’s knowledge of the subject matter. A quantitative measure like a score of 30 out of 50 in a written test does not hold meaning unless interpreted. However, measurement stops once a numerical value is ascribed; making a value judgment belongs to evaluation. The two types of measurement are objective and subjective. Objective measurement does not depend on the judgment of the person doing the scoring; an objective test like multiple choice is a good example. In subjective measurement, on the other hand, the result relies on the perception of the one scoring, as in an essay test.

What is Assessment?
    According to Miller, Linn & Gronlund (2009), assessment is a method utilized to gather information about student performance. Black and William (1998) gave a lengthier definition emphasizing the importance of feedback and signifying its purpose. They stated that assessment pertains to all “activities undertaken by teachers - and by their students in assessing themselves - that provide information to be used to modify the teaching and learning activities in which they are engaged”. It is a process by which information is obtained relative to some known objective or goal. This means that assessment data direct teaching in order to meet the needs of the students. It is an ongoing process aimed at understanding and improving student learning. It should be pointed out, however, that assessment is not just about collecting data. These data are processed, interpreted, and acted upon. 
    Tests are special forms of assessment. However, the term “testing” appears to have a negative connotation among educators and seems somewhat threatening to learners. Thus, the term “assessment” is preferably used. While a test gives a snapshot of a student’s learning, assessment provides a bigger and more comprehensive picture. It should now be clear that all tests are assessments but not all assessments are tests.
    There are three interrelated purposes of assessment. Knowledge of these purposes and how they fit in the learning process can result in a more effective classroom assessment.
  • Assessment for Learning (AfL). This pertains to diagnostic and formative assessment tasks that are used to determine learning needs, monitor the academic progress of students during a unit or block of instruction, and guide instruction. Students are given on-going and immediate descriptive feedback concerning their performance. Based on assessment results, teachers can make adjustments when necessary in their teaching methods and strategies to support learning.
  • Assessment as Learning (AaL). This employs tasks that provide students with an opportunity to monitor and further their own learning – to think about their personal learning habits and how they can adjust their learning strategies to achieve their goals. It involves metacognitive processes like reflection and self-regulation to allow students to utilize their strengths and work on their weaknesses by directing and regulating their learning.
  • Assessment of Learning (AoL). This pertains to summative assessment done at the end of a unit, task, process, or period. Its purpose is to provide evidence of a student’s level of achievement in relation to curricular outcomes.
What is Evaluation? 
    According to Russell and Airasian (2012), evaluation is the process of determining the quality of achievement in terms of certain standards. It is also a process designed to provide information that will help teachers make judgments about a given situation. This means that assessment data gathered by the teacher have to be interpreted in order to make sound decisions about students and the teaching-learning process. 
Relationship between Evaluation, Test, & Measurement

The illustration displays a graphical relationship among the concepts of measurement, test, and evaluation (Bachman, 1990). It shows that while tests provide quantitative measures, test results may or may not be used for evaluation. Likewise, there are non-tests that yield quantitative measures that can be used for evaluative purposes or research. It is clear in the diagram that tests are considered measurements simply because they yield numerical scores. They are forms of assessment because they provide information about the learner and his/her achievement. However, tests comprise only a subset of assessment tools. There are qualitative procedures like observations and interviews that are used in classroom assessment. They add another dimension to evaluation.

Area 1 is evaluation that does not involve measurements or tests. An example is the use of qualitative descriptions to describe student performance. Observations are non-test procedures that can be used to diagnose learning problems among students. Area 2 refers to non-test measures used for evaluation. The ranking used by teachers in assigning grades is an example of a non-test measure for evaluation. Area 3 is where all three converge; teacher-made tests fall in this region. Area 4 pertains to non-evaluative test measures. Test scores used in correlational studies are examples of these; research has been conducted on the relationship between test scores and motivation, test scores and family income, and so on. Finally, Area 5 pertains to non-evaluative, non-test measures, like assigning numerical codes to responses in a research study. An example would be nominal scales used in labeling educational attainment.

The relationship can be further explained as follows: measurement focuses mainly on quantifying a variable, assessment brings in qualitative descriptions, and when value judgment is added to these, it becomes evaluation.
For example, a student's half-yearly examination result might report a score of 65 marks together with the teacher's descriptive remarks. In this example, the score of 65 marks is the measurement; describing it as an above-average performance and identifying an area for improvement is assessment; and the judgment of his performance in relation to the half-yearly examination is evaluation. 

References:
  1. Anderson, H., Moore, D., Anaya, G., & Bird, E. (2005). Student learning outcomes assessment: A component of program assessment. American Journal of Pharmaceutical Education, 69(2), 256-268.
  2. De Guzman, E. S. & Adamos, J. L. (2015). Assessment of learning 1. QC: Adriana Publishing Co., Inc. 
  3. Krathwohl, D. (2002). A revision of Bloom’s taxonomy: An overview. Theory into Practice, 41(4), 212-218.
  4. McMillan, J. (2007). Classroom assessment: Principles and practice for effective standards-based instruction, 4th ed. USA: Pearson Education, Inc.

Saturday, April 11, 2020

Selected-response Tests

The selection of item formats is dictated by the instructional outcomes intended to be assessed. There are formats appropriate for measuring knowledge and simple understanding, while there are those fit for measuring deep understanding. Selected-response formats entail choosing the best or most correct option to answer a problem. The greatest challenge for this item format is the construction of plausible options or distracters, so that the correct answer does not stand out as obviously the most attractive.

Binary Choice or Alternate Form

This type of test generally provides only two options. The table below shows varieties of structure using the alternate form as suggested by Nitko (2001).

    Table 1: Varieties of Binary Choice

All other varieties of binary choice use propositions as the item stimulus, except for the Yes-No type, which uses direct questions. The veracity of the propositional statements is judged by the students, who indicate whether they are true or false, correct or incorrect, or whether they agree or disagree with the thought or idea expressed. Requiring students to correct the statements they mark as incorrect draws on the learners' ability to reason and raises the level of outcome that can be assessed. The ease of constructing binary-choice items makes this a common option when writing items, particularly for knowledge-level outcomes. Many of the propositions are content-based in nature, and teachers can quickly interpret the items correctly. The challenge often lies not only in writing the propositions but also in preparing the key for correction.

Guidelines for constructing a good binary-choice item, as suggested by McMillan (2007) and Musial et al. (2009):


  1. Write the item so that the answer options are consistent with the logic in the sentence. (Align your options with the logic of your proposition; e.g., if the item asks about truth or falsehood, do not use yes-no or agree-disagree options.)
    • Example: 
      • Poor: Four and 6 are factors of 24.   Yes   No
      • Good: Four and 6 are factors of 24.   Correct   Incorrect
  2. Focus on a single fact or idea in the item. (Adding more than one idea in the statement can make the item ambiguous. One idea may be correct and the other one incorrect).
    • Example:
      • Poor: T   F   Right to suffrage is given to citizens in a democratic country in order to enjoy economic gains.
      • Good: T   F   Citizens in a democratic society have the right of suffrage.
  3. Avoid long sentences. (Unnecessary long and wordy statements obscure the significant idea).
    • Example:
      • Poor: T   F   Criterion-referenced tests are interpreted based on a standard that determines whether students have reached an acceptable level or not.
      • Better: T   F   Standards are used to interpret criterion-referenced tests.
  4. Avoid insignificant or trivial facts or words. (Students commit errors not because they do not know but due to unnecessary facts).
    • Example:
      • Poor: T   F   Legumes, beans and nuts should be avoided by people who are suffering from gout, whether inherited or not from their parents.
      • Better: T   F   Legumes, beans and nuts should be avoided by people with gout.
  5. Avoid negative statements. (Statements with not or no are confusing to young readers).
    • Example:
      • Poor: T   F   All European nations are not in favor of joining the European Union.
      • Better: T   F   All European nations are in favor of joining the European Union
  6. Avoid inadvertent clues to the answer. (Items using such words as never, always, all the time, all, etc. are most of the time false and are recognized by test-wise students).
    • Example:
      • Poor: T   F   Essay tests are never easy to score.
      • Better: T   F   Essay tests are difficult to score.
  7. Avoid using vague adjectives and adverbs. (Students interpret differently such adjectives and adverbs as typically, usually, occasionally, quite, etc. It often becomes a test of vocabulary when done).
    • Example:
      • Poor: T   F   People from cold countries typically drink wine every day.
      • Better: T   F   People from cold countries are fond of drinking wine.

Multiple-Choice Items

This format is widely used in classroom testing because of its versatility in assessing various levels of understanding, from knowledge and simple understanding to deep understanding. McMillan (2007) believes that multiple-choice tests can determine whether students can use reasoning, much like binary-choice items, and can tap students' skills in problem-solving, decision-making, and other reasoning tasks. Cognitively demanding outcomes involving analysis and evaluation lend themselves to the use of multiple-choice items. Constructing this test format may not be as easy as binary choice; however, its advantages exceed what binary choice can offer. Aside from being able to assess various outcome levels, multiple-choice items are easy to score, less susceptible to guessing than alternate-choice items, and more familiar to students, who often encounter them in different testing events (Musial et al., 2009). The MC item stimulus consists of a stem containing the problem, in the form of a direct question or an incomplete statement, and the options, which offer the alternatives from which to select the correct answer.

Guidelines for writing good MC items (McMillan, 2007; Miller, Linn & Gronlund, 2009; Popham, 2011)

  1. All the words of the stem should be relevant to the task. This means stating the problem succinctly and clearly so students understand what they are expected to answer.
  2. The stem should be meaningful by itself and should fully contain the problem. This should especially be observed when the stem uses an incomplete statement format. 
    • Example: The constitution is ___________.
      • A stem worded this way does not make definite the conceptual knowledge being assessed. One does not know what is being tested: is it the definition of the term, its significance, or its history? One test of whether a stem is effectively worded is whether it can be answered without the distracters. This stem can be improved by changing its format to a direct question or by adding more information to the incomplete statement (see the samples below). This way the test writer determines what knowledge competence to focus on and what appropriate distracters to use:
        • What does the constitution of an organization provide? (Direct-question format)
        • The constitution of an organization provides ______. (Incomplete-statement format)
  3. The stem should use a question with only one correct or clearly best answer. Ambiguity sets in when the stem allows for more than one best answer. Students will likely base their answers on personal experience instead of on facts. Consider this example. There could be more than one best answer here.
    • Example:
      • Poor: Which product of Thailand makes it economically stable? a. rice   b. dried fruits   c. dairy products   d. ready-to-wear
      • Improved: Which agricultural product of Thailand is most productive for export? a. rice   b. fish   c. fruits   d. vegetables
  4. The stem must express a complete thought.
    • Example:
      • Poor: The poem “The Raven”
        • a) was written by Edgar Alan Poe
        • b) was written by Elizabeth Browning
        • c) Was written by Omar Khayyam
        • d) Was written by Jose Garcia Villa
      • Better: The poem “The Raven” was written by
        • a) Edgar Alan Poe
        • b) Elizabeth Browning
        • c) Omar Khayyam
        • d) Jose Garcia Villa
    • The second example is better than the first since the stem contains a complete thought.
  5. Keep options short while putting most of the concepts in the stem
  6. The stem must contain only the information necessary to make the problem clear. Do not provide unnecessary details or worse provide a clue to the answer.
    • Example 1: The revolution in Phil. to oust President Marcos took place after a snap election in 1986. It happened at the dawn of Feb. 23. When did the revolution take place?
      • a) Before Valentine’s day
      • b) After Valentine’s day
      • c) On Valentine’s day
    • Example 2: When did the People’s Power Revolution in Phil. take place?
      • a) Feb., 1985
      • b) Feb., 1986
      • c) Feb., 1987
    • The first example does not measure knowledge of Philippine history; instead, it focuses on knowledge of Valentine's day. Moreover, the stem provides a clue to the answer: Feb. 23 is after Feb. 14. The second example is a better choice than the first.
  7. Avoid negative statements or double-negative statements in the stem. This may confuse the test taker.
    • Example 1: It is not untrue that Magellan discovered the Phil.
    • Example 2: It is true that Magellan discovered the Phil.
    • The second example is better than the previous since example 1 contains a double-negative statement. 
  8. Make options equally attractive. This means that the distracters should be made as plausible as the correct answer; otherwise the answer will stand out like a sore thumb.
    • Example: The author of “The Raven” is
      • a) Jose Garcia Villa
      • b) Edgar Alan Poe
      • c) Genoveva Matute
      • d) Francisco Balagtas
    • In the example, all except letter (b) are Filipino authors. Since the poem sounds very foreign to the students, the author must be a foreigner.
  9. Use the option "none of these" or "none of the above" only when there is only one correct answer.
  10. Ensure that items do not dwell too much on "knowledge" or rote learning. MC items, when properly constructed, can elicit higher-order responses. The example below shows an item that measures both comprehension and analysis (a worked check of this item appears after this list).
    • Example: The volume of a sphere is given by V = (4/3)(pi)r^3, where r is the radius of the sphere. If the radius is doubled, the volume will be:
      • a) Multiplied by a factor of 2
      • b) Multiplied by a factor of 3
      • c) Multiplied by a factor of 4
      • d) Multiplied by a factor of 8
  11. As much as possible avoid using "all of the above" as an option.
  12. All distracters should appear plausible to uninformed test takers. This is the key to making the item discriminating and therefore valid. The validity of the item suffers when an option is obviously correct (like option d in the following item) or obviously wrong (like option b).
    • Poor: What is matter?
      • a. Everything that surrounds us
      • b. All things bright and beautiful
      • c. Things we see and hear
      • d. Anything that occupies space and has mass.
  13. Randomly assign correct answers to the alternative positions. Item writers have a tendency to assign the correct answer to the third alternative as they run short of incorrect alternatives. Students who are used to taking multiple-choice tests then choose option C when guessing, for a greater chance of being correct. No deliberate order should be followed in assigning the correct answers (e.g. ABCDABCD or AACCBBDD) for ease in scoring. As much as possible, have an equal number of correct answers distributed randomly across the answer positions.
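To see why the item in guideline 10 demands analysis rather than recall (assuming the stem, as reconstructed above, asks what happens when the radius is doubled), the expected answer can be checked directly. This short sketch is illustrative only:

```python
# A quick numerical check of the reconstructed item: since V = (4/3) * pi * r**3,
# the volume scales with the cube of the radius.
from math import pi

def sphere_volume(r):
    return 4 / 3 * pi * r ** 3

r = 1.0
print(sphere_volume(2 * r) / sphere_volume(r))   # 8.0 -> doubling r multiplies V by 8
```

Doubling the radius multiplies the volume by 2^3 = 8, so answering the item requires students to reason from the formula rather than recall a memorized fact.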

Ways to make distracters plausible, as given by Miller, Linn & Gronlund (2009):

  1. Use the students' most common errors.
  2. Use important-sounding words (e.g. significant, accurate) that are relevant to the item stem, but do not overdo it.
  3. Use words that have verbal associations with the item stem (e.g. politician, political).
  4. Use textbook language or other phraseology that has the appearance of truth.
  5. Use incorrect answers that are likely to result from student misunderstanding or carelessness (e.g. forgetting to convert feet to yards).
  6. Use distracters that are homogeneous and similar in content to the correct option (e.g. all are inventors).
  7. Use distracters that are parallel in form and grammatically consistent with the item stem.
  8. Make the distracters similar to the correct answer in length, vocabulary, sentence structure, and complexity of thought.
Caution: Distracters should distract the uninformed, but they should not result in trick questions that mislead knowledgeable students (do not insert "not" into a correct answer just to make a distracter).

Varieties:
  1. Single Correct Answer
  2. Best Answer
  3. Negative Stem
  4. Multiple Response
  5. Combined Resources
  6. Stimulus-Material-Stem-Alternatives

Matching Type Test

Matching-type items may be considered modified multiple-choice items. This format consists of two parallel lists of words or phrases that students are tasked to pair. The first list, which is to be matched, is referred to as the premises, while the other list, from which a match is chosen based on some kind of association, is the responses. The PREMISES are a list of words or phrases to be matched or associated with an appropriate word, while the RESPONSES are a list of homogeneous alternatives or options from which to select what will match the premise.

Illustrative Item 1
The first column describes events associated with Philippine presidents while the second column gives their names. In the space provided, write the letter of the president that matches the description.

source: de Guzman & Adamos, 2015
Illustrative Item 2 (for advanced level)
Column A contains theoretical propositions about how the universe came about. Match each one with the name of the theory given in Column B. Indicate the appropriate letter to the left of the number in Column A.

source: de Guzman & Adamos, 2015

Guidelines in constructing matching items (Kubiszyn and Borich, 2010)

  1. Keep the list of premises and the list of responses homogeneous, that is, belonging to one category. In illustration 1, the premises are events associated with Philippine presidents while the responses are all names of presidents. In illustration 2, Column A lists some theories in astronomy about how the universe has evolved and Column B lists the names of the theories. Homogeneity is a basic principle in matching items.
  2. Keep the premises always in the first column and the responses in the second column. Since the premises are oftentimes descriptions of events, illustrations of principles, functions, or characteristics, they appear longer than the responses, which are most of the time names, categories, objects, and parts. Ordering the two columns this way saves reading time for the students, since they will usually read one long premise once and select the appropriate match from a list of short words. If ordered the opposite way, the students will read short words as the premise and then read through long descriptions to look for the correct answer. 
  3. Keep the lists in the two columns unequal in number. The basic reason for this is to avoid guessing. The options in Column B are usually more numerous than the premises in Column A. If the two lists are equal in number (a perfect match), students can strategically resort to elimination in finding the rest of the pairs. There are matching items, however, where the options are far fewer than the premises. This is recommended when the ability being tested is classification. For example, Column A may list 10 animals to be classified while Column B gives just 4 categories of mammals. With this format, it is important to mention in the test directions that an option can be used more than once.
  4. Test directions always describe the basis for matching. "Match Column A with Column B" is a no-no in matching type. Describe clearly what is to be found in the two columns, how they are associated and how matching will be done. Invalid scores of students could be due to extraneous factors like the misinterpretation of how matching is to be done, misunderstanding in using given options (e.g. using an option only once when the teacher allows the use of an option more than once) and limiting the number of items to be answered when there are few options given.
  5. Keep the number of premises to not more than eight (8), as shown in the two sample items. Fatigue sets in when there are too many items in a set and, again, test validity suffers. If an item writer feels that there are many concepts to be tested, dividing them into sets is a better strategy. It is also suggested that a set of matching items should appear on a single page and not be carried over to the next page.
  6. Ambiguous lists should be avoided. This is especially true in the preparation of options for the second column. There should only be one option appropriately associated with a premise unless it is unequivocally mentioned that an option could be used more than once as mentioned in #4. This often occurs when matching events and places or events and names, descriptions and characters. 

Example of Parallel Concepts:

  1. terms and definitions
  2. objects/pictures and labels
  3. symbols and proper names
  4. causes and effects
  5. scenarios and responses
  6. principles and scenarios to which they apply
Some rules of thumb exist for how long it takes most students to answer various types of questions:
  • A true-false test item takes 15 seconds to answer unless the student is asked to provide the correct answer for false questions. Then the time increases to 30-45 seconds.
  • A seven-item matching exercise takes 60-90 seconds.
  • A four response multiple-choice test item that asks for an answer regarding a term, fact, definition, rule or principle (knowledge level item) takes 30 seconds. The same type of test item that is at the application level may take 60 seconds.
  • Any test item format that requires solving a problem, analyzing, synthesizing information or evaluating examples adds 30-60 seconds to a question.
  • Short-answer test items take 30-45 seconds.
  • An essay test takes 60 seconds for each point to be compared and contrasted.
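These rules of thumb can be combined into a rough estimate of total testing time. The sketch below is illustrative only; the per-item times come from the list above (midpoints where a range is given), while the helper name and the example item counts are assumptions for the example.

```python
# A rough planning aid built from the rules of thumb above.
SECONDS_PER_ITEM = {
    "true_false": 15,          # 30-45 s if a correction of false statements is required
    "matching_7_items": 75,    # midpoint of 60-90 s for a seven-item exercise
    "mc_knowledge": 30,        # four-option MC at the knowledge level
    "mc_application": 60,      # the same item type at the application level
    "short_answer": 40,        # midpoint of 30-45 s
}

def estimated_minutes(item_counts):
    """item_counts maps an item type (a key above) to how many appear on the test."""
    total_seconds = sum(SECONDS_PER_ITEM[kind] * n for kind, n in item_counts.items())
    return total_seconds / 60

# Example: 10 true-false, 20 knowledge-level MC, and 5 application-level MC items.
print(estimated_minutes({"true_false": 10, "mc_knowledge": 20, "mc_application": 5}))
# -> 17.5 minutes, before any essay or problem-solving items are added
```

Treat such an estimate as a lower bound; directions, essay planning, and answer checking add time on top of it.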
References:
  1. De Guzman, E. & Adamos, J. (2015). Assessment of learning 1. QC: Adriana Publishing Co., Inc.
  2. McMillan, J. (2007). Classroom assessment: Principles and practice for effective standards-based instruction, 4th ed. USA: Pearson Education, Inc.
  3. Miller, M., Linn, R., & Gronlund, N. (2009). Measurement & assessment in teaching, 10th ed. New Jersey: Pearson Education, Inc.
  4. Nitko, A. (2001). Educational assessment of students, 3rd ed. Upper Saddle River, New Jersey: Prentice-Hall, Inc.
  5. Padua, R. N. & Santos, R. G. (1997). Educational evaluation & measurement: Theory, practice, and application. QC: Katha Publishing.
  6. Santos, R. G. (2007). Assessment of learning 1. QC: Lorimar.
Source: ReadabilityFormulas.com 

Monday, March 23, 2020

Types of Rubrics

What is a Rubric?

A rubric is not a form of assessment in itself but a set of criteria covering the essence of the performance to be judged. It defines the expected performance, standards of quality, and levels of accomplishment. It also articulates the gradations of quality for each criterion. It is a guide for assigning scores to alternative assessment products. 

What questions do rubrics answer?

  1. By what criteria should the performance be judged? 
  2. Where should you look and what should you look for to judge a  successful performance? 
  3. What does the range in quality performance look like? 
  4. How do you determine validly, reliably, and fairly what score should be given to a student and what that score means? 
  5. How should the different levels of quality be described and distinguished from one another? 

Types of Rubric

1. General Rubric. This rubric contains criteria that can be used across similar performances or tasks.
Advantage: can use the same rubric across different tasks
Disadvantage: feedback may not be specific enough.
Example: Writing Task in Content Areas

2. Task-Specific Rubric. This type of rubric generally provides specific feedback along several dimensions. It can only be used for a single task
Example: Boat Designing

3. Holistic Rubric. This rubric does not list separate levels of performance for each criterion. It generally provides a single score based on an overall impression of a student’s performance on a task.
Example: Oral Recitation

4. Analytic Rubric. This rubric articulates levels of performance for each criterion. A separate score is given for each criterion considered important for the assessed performance.
Example: Oral Recitation

Steps in creating rubrics




Step 1: show examples of good and not so good work;
Step 2: identify qualities that define good work from the model;
Step 3: describe the best and worst levels of quality; 
Step 4: using the agreed criteria and levels of quality, evaluate the models presented in step 1 together with the students; 
Step 5: give the students their task and occasionally stop them for self- and peer assessment; 
Step 6: always give students time to revise their work based on the feedback they get in step 5; and
Step 7: assess the students' work using the same rubric they used in step 5.

Tips in designing a rubric

  • use clear, precise and concise language
  • specify the levels of quality through the responses such as: “yes,” “yes but,” “no but,” and “no”
Example: Rubric for Evaluating a Scrapbook (Andrade, 2007)

A well-designed rubric includes:

  • performance dimensions that are critical to successful task completion
  • criteria that reflect all the important outcomes of the performance task
  • a rating scale that provides a usable, easily interpreted score
  • criteria that reflect concrete references, in clear language understandable to students, parents, and other teachers

Wednesday, March 4, 2020

Affective Learning Competencies

Overview

Affect describes a number of non-cognitive variables such as a person's attitudes, interests, and values. Student affect is important, and teachers can help their students acquire positive attitudes. According to Popham (2003), the reasons why it is important to assess affect are:
  1. educators should be interested in assessing affective variables because these variables are excellent predictors of students' future behavior;
  2. teachers should assess affect to remind themselves that there's more to being a successful teacher than helping students obtain high scores on achievement tests; and
  3. information regarding students' affect can help teachers teach more effectively on a day-to-day basis.
Tanner (2001) posits that aptitudes and attitudes are related to the academic achievement of learners. Information about learners' experiences with a subject or an activity is only part of what is needed as input in order to explain their performance. 

Importance of Affective Targets

Research has established a clear link between affect and cognitive learning (Ormrod, 2004). Students are more proficient in problem-solving if they enjoy what they do. Students who are in a good mood and emotionally involved are more likely to pay attention to information, remember it meaningfully, and apply it.

Though the linkage of affect and student learning has been well established, there remains very little systematic assessment of affect applied in classroom instruction. Motivation and involvement of students in learning activities are affected by students' attitudes toward learning, respect for others, and concern for others. Though these factors are known to teachers, most teachers do not utilize any kind of formal affective assessment. Possible reasons are:
  • school routines are organized based on subject areas, and
  • assessment of affective targets is fraught with difficulties
Cognitive subject matter targets are agreed on as desirable for all students. This places affect in a position of importance but still secondary to cognitive learning. It also makes it difficult to determine which affective targets are appropriate for all students. It is simply not easy to define attitudes, values, and interests.

Affective Traits & Learning Targets

Positive affective traits and skills are essential for:
  • effective learning;
  • being an involved and productive member of our society;
  • preparing for occupational & vocational satisfaction and productivity (for example work habits, willingness to learn, interpersonal skills);
  • maximizing the motivation to learn at present and in the future;
  • preventing students from dropping out of school
The word affective refers to a variety of traits and dispositions that are different from our knowledge, reasoning, and skills (Hohn, 1995). Technically, this term means the emotions or feelings that one has toward someone or something. Nevertheless, attitudes, values, self-concept, citizenship, and other traits are usually considered to be non-cognitive. Most kinds of student affect involve both emotion and cognitive beliefs. Shown in the list below are different affective traits and their corresponding descriptions:

  • Attitude: Predisposition to respond favorably or unfavorably to specified situations, concepts, objects, institutions, or persons
  • Interests: Personal preference for certain kinds of activities
  • Values: Importance, worth, or usefulness of modes of conduct and end states of existence
  • Opinions: Beliefs about specific occurrences and situations
  • Preferences: Desire to select one object over another
  • Motivation: Desire and willingness to be engaged in behavior, including the intensity of involvement
  • Academic Self-Concept: Self-perception of competence in school and learning
  • Self-Esteem: Attitudes toward oneself; a degree of self-respect, worthiness, or desirability of self-concept
  • Locus of Control: Self-perception of whether success and failure are controlled by the student or by external influences
  • Emotional Development: Growth, change, and awareness of emotions and the ability to regulate emotional expression
  • Social Relationships: Nature of interpersonal interactions and functioning in a group setting
  • Altruism: Willingness and propensity to help others
  • Moral Development: Attainment of ethical principles that guide decision-making and behavior
  • Classroom Environment: Nature of feeling tones and interpersonal relationships in a class

Attitude Targets

McMillan (1980) defines attitude as an internal state that influences what students are likely to do. The internal state can to some degree determine positive or negative, or favorable or unfavorable, reactions toward an object, situation, person, group of objects, the general environment, or group of persons. It does not refer to behaviors, to what a student knows, to right or wrong in a moral or ethical sense, or to characteristics such as race, age, or socioeconomic status.

Forsyth (1999) found that attitudes consist of the following components:
  • an affective component of positive or negative feelings
  • a cognitive component describing worth or value (thoughts)
  • a behavioral component indicating a willingness or desire to engage in particular actions
The affective component consists of the emotion or feeling associated with an object or a person. A strong and stable attitude is manifested when all three components are consistent. This means that if a student likes Science (affective component), thinks it is valuable (cognitive component), and reads Science-related materials at home (behavioral component), then the student has a very strong positive attitude. On the other hand, it is likely that for many students these components will contradict one another. For example, a certain student may not like English very much but thinks that English is important. The question is, what would be her attitude towards English? That would depend on which component of the attitude is being measured. If it is only the affective component, then the attitude would be negative; but if it is the cognitive component, it would translate to a positive attitude.

Tuesday, October 8, 2019

Planning the Test

THE TEST DEVELOPMENT PROCESS

The test construction process for classroom use follows steps similar to those for any instrument, as shared by various authorities (Crocker and Algina, 1986; Miller et al., 2009; Russel and Airasian, 2012). This is illustrated in Figure 1 below:
Figure 1. Test Development Process for Classroom Tests

A. Planning Phase - In this phase, the purpose of the test is identified, learning outcomes are clearly specified, and the table of specifications is prepared to guide the test item construction.

B. Item Construction Phase - This is where items in the test are constructed following the guidelines and item format for the specified learning outcomes of instruction.

C. Review Phase - Prior to the administration of the test, the teacher examines the test items based on the alignment of the content and the behavior component of the instructional competencies. After the administration of the test, the teacher examines the test based on an analysis of students' performance on each item.

WHAT IS A TABLE OF SPECIFICATIONS (TOS)? 

A TOS, sometimes called a test blueprint, is a matrix whose rows consist of the specific topics or competencies and whose columns are the objectives cast in terms of Bloom's taxonomy. It is a table that helps teachers align objectives, instruction, and assessment (e.g., Notar, Zuelke, Wilson, & Yunker, 2004). This strategy can be used for a variety of assessment methods but is most commonly associated with constructing traditional summative tests. The TOS can help teachers map the amount of class time spent on each objective against the cognitive level at which each objective was taught, thereby helping teachers to identify the types of items they need to include on their tests. There are different versions of these tables or blueprints (e.g., Linn & Gronlund, 2000; Mehrens & Lehman, 1973; Notar et al., 2004), and the one presented here is the one we have found most useful in our own teaching. This tool can be modified to best meet your needs in developing classroom tests.

WHAT IS THE PURPOSE OF A TABLE OF SPECIFICATIONS? 

In order to understand how to best modify a TOS to meet your needs, it is important to understand the goal of this strategy: improving the validity of a teacher’s evaluations based on a given assessment. 
Validity is the degree to which the evaluations or judgments we make as teachers about our students can be trusted based on the quality of the evidence we gathered (Wolming & Wikstrom, 2010). It is important to understand that validity is not a property of the test constructed but of the inferences we make based on the information gathered from a test. When we consider whether or not the grades we assign to students are accurate, we are questioning the validity of our judgment. When we ask these questions, we can look to the kinds of evidence endorsed by researchers and theorists in educational measurement to support the claims we make about our students (AERA, APA, & NCME, 1999). For classroom assessments, two sources of validity evidence are essential: evidence based on test content and evidence based on the response process (AERA, APA, & NCME, 1999).
Evidence based on test content underscores the degree to which a test measures what it is designed to measure (Wolming & Wikstrom, 2010). This means that your classroom tests must be aligned with the content (subject matter) taught in order for any of your judgments about student understanding and learning to be meaningful. Essentially, with test-content evidence, we are interested in knowing whether the measured (tested/assessed) objectives reflect what you claim to have measured.
Response process evidence is the second source of validity evidence that is essential to classroom teachers. Response process evidence is concerned with the alignment of the kinds of thinking required of students during instruction and during assessment (testing) activities.
Sometimes the tests teachers administer have evidence for test content but not for the response process. That is, while the content is aligned with instruction, the test does not address the content at the same depth or level of meaning that was experienced in class. When students feel that they are being tricked or that the test is overly specific (nit-picky), there is probably an issue related to the response process at play. As test constructors, we need to concern ourselves with evidence of the response process. One way to do this is to consider whether the same kind of thinking is used during class activities and summative assessments. If the class activity focused on memorization, then the final test should also focus on memorization and not on a thinking activity that is more advanced.
Table 1 provides two possible test items to assess the understanding of the digestion process. In Table 1, Item 1 assesses whether or not students can identify the organ in the digestion process. Item 2 assesses whether or not students can apply the concepts learned in the digestion process described in the scenario. Thus, these two items require different levels of thinking and understanding of the same content (i.e., recognizing/identifying vs. evaluating/applying). Evidence of the response process ensures that classroom tests assess the level of thinking that was required for students during their instructional experiences.

Table 1: Examples of items assessing different cognitive levels 

LEVELS OF THINKING

There are six levels of thinking as identified by Bloom in the 1950s, and these levels were revised by a group of researchers in 2001 (Anderson et al.). Thinking that emphasizes recall, memorization, identification, and comprehension is typically considered to be at a lower level. Higher levels of thinking include processes that require learners to apply, analyze, evaluate, and synthesize.
When considering test items people frequently confuse the type of item (e.g., multiple-choice, true-false, essay, etc.) with the type of thinking that is needed to respond to it. All types of item formats can be used to assess thinking at both high and low levels depending on the context of the question. For example, an essay question might ask students to “Describe four causes of colon cancer.” On the surface, this looks like a higher-level question, and it could be. However, if students were taught “The four causes of the Colon Cancer were…” verbatim from a text, then this item is really just a low-level recall task. Thus, the thinking level of each item needs to be considered in conjunction with the learning experience involved. In order for teachers to make valid judgments about their students’ thinking and understanding then the thinking level of items needs to match the thinking level of instruction. The Table of Specifications provides a strategy for teachers to improve the validity of the judgments they make about their students from test responses by providing content and response process evidence.

EVIDENCE FOR TEST CONTENT

One approach to gathering evidence of test content for your classroom tests is to consider the amount of actual class time spent on each objective. Things that were discussed longer or in greater detail should appear in greater proportion on your test. This approach is particularly important for subject areas that teach a range of topics across a range of cognitive levels. In a given unit of study, there should be a direct relationship between the amount of class time spent on the objective and the portion of the final assessment testing that objective. If you only spent 10% of the instructional time on an objective, then the objective should only count for 10% of the assessment. A TOS provides a framework for making these decisions.
Table 2 presents a two-way TOS (with columns labeled A-E) that basically spells out WHAT is to be tested (target learning outcomes) and HOW it will be tested (test format) to obtain the information needed. The information in Column A is taken directly from the teacher’s lesson plans and curriculum guides. Using a TOS helps teachers be accountable for the content they teach and the time they allocate to each objective (Notar et al., 2004). The values in Column B refer to the number of minutes, hours, or days spent on each objective, while the numbers in Column D represent the total number of test items to be constructed for each objective. Column C lists the different cognitive levels based on Bloom's taxonomy: easy-level objectives (remembering & understanding), average-level objectives (applying & analyzing), and difficult-level objectives (evaluating & creating). The percentage allotted to each level is arbitrary, although recommended proportions are used in basic education. Column E refers to the test format recommended for use.

Table 2: Sample Two-Way Table of Specifications for 4th Grade Summative Test in Science: Digestive System
Legend: NOI = number of items; IP = item placement

Basic Steps in Constructing TOS

For example, a teacher would like to give a test covering a unit taught for 10 days. The following steps are recommended:
  • Step 1: Determine the objectives to be included in the test  (see Table 2, column A)
  • Step 2: Determine the number of minutes/hours/days allotted to or spent on each objective. For example, out of 10 days, the teacher spent 1 day teaching objective 1 (see Table 2, column B).
  • Step 3: Determine the total number of test items. This can be the prerogative of the teacher or an institutional requirement. In the example below, the teacher decides to make a 20-item test.
  • Step 4: Determine the number of items for each objective per level of difficulty using this formula (a short sketch of this computation appears after this list):
(total number of items / total number of minutes/hours/days) x time allotted to the objective
For example: (see Table 2, column C-NOI)
  • Objective 1: (20 / 10) = 2 x 1 = 2
  • Objective 3: (20 / 10) = 2 x 2 = 4 
  • Step 5: Make sure the vertical and horizontal sums of the items are correct.
  • Step 6: Write the item placement (see "IP" in Table 2) for each set of items. Item placement is the location of your test items within the test. Hence, for objective 1, the 2 test items allocated to it are found in numbers 1-2 of the test. Item placement can be arranged sequentially or at random, and numbering can be done horizontally or vertically. For instance, an objective with 3 items may have them randomly distributed to locations 1, 3, and 5.
  • Step 7: Determine the type of assessment method to be used (see Column E).
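The Step 4 computation can be sketched in a few lines of code. This is only an illustration: the 20-item total, the 10-day unit, and the allotments for objectives 1 and 3 come from the example above, while the remaining objectives and their day counts are hypothetical.

```python
# Sketch of Step 4: items per objective = (total items / total time) * time on the objective.
# Only objectives 1 (1 day -> 2 items) and 3 (2 days -> 4 items) come from the example;
# the other objectives and day counts below are made up for illustration.

def allocate_items(total_items, days_per_objective):
    """Distribute test items across objectives in proportion to instructional time."""
    total_days = sum(days_per_objective.values())
    return {objective: round(total_items / total_days * days)
            for objective, days in days_per_objective.items()}

days_per_objective = {"Objective 1": 1, "Objective 2": 2, "Objective 3": 2,
                      "Objective 4": 3, "Objective 5": 2}   # 10 days in total
allocation = allocate_items(20, days_per_objective)
print(allocation)                 # Objective 1 -> 2 items, Objective 3 -> 4 items, ...
print(sum(allocation.values()))   # Step 5 check: the totals should add up to 20
```

Because rounding can push the total slightly above or below the target, the Step 5 check matters before moving on to item placement in Step 6.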
HOW MANY ITEMS SHOULD BE ON YOUR SUMMATIVE TEST?
In the totals row of Table 2, you should note that for this test the teacher has decided to use 20 items. The number of items to include on any given test is a professional decision made by the teacher based on the number of objectives in the unit, his/her understanding of the students, the class time allocated for testing, and the importance of the assessment. Shorter assessments can be valid, provided that the assessment includes ample evidence on which the teacher can base inferences about students’ scores.
Typically, because longer tests can include a more representative sample of the instructional objectives and student performance, they generally allow for more valid inferences. However, this is only true when the test items are of good quality. Furthermore, students are more likely to get fatigued on longer tests and to perform less well as they move through the test. Therefore, we believe that the ideal test is one that students can complete in the time allotted, with enough time to brainstorm any writing portions and to check their answers before turning in their completed assessment. McMillan (2007) suggests some rules of thumb for determining how many items are sufficient for a good sampling: a minimum of 10 items is needed to assess each general knowledge learning target in a unit, and these should represent a good cross-section of item difficulty. However, if there are more specific learning targets to be tested, at least 5 items for each one would be enough to allow for a criterion-referenced interpretation of mastery.

THE TOS IS A TOOL FOR EVERY TEACHER

The cornerstone of classroom assessment practices is the validity of the judgments about students’ learning and knowledge (Wolming & Wikstrom, 2010). A TOS is one tool that teachers can use to support their professional judgment when creating or selecting a test for use with their students. The TOS can be used in conjunction with lesson and unit planning to help the teacher make clear the connections between planning, instruction, and assessment.

References

  1. Crocker, L. and Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt Rinehart and Winston
  2. Miller, M., Linn, R. & Gronlund, N. (2009). Measurement and assessment in teaching, 10th ed. New Jersey: Pearson Education, Inc.
  3. Russel, M. & Airasian, P. (2012). Classroom assessment: Concepts and applications. Dubuque, Iowa: McGraw Hill
  4. Notar, C., Zuelke, D., Wilson, J. & Yunker, B. (2004). The table of specifications: insuring accountability in teacher-made tests. Journal of Instructional Psychology, 31(2), 115-129.
  5. Linn, R. & Gronlund, N. (2000). Measurement and assessment in teaching. 8th ed. Upper Saddle River, New Jersey: Prentice-Hall, Inc.
  6. McMillan, J. (2007). Classroom assessment: Principles and practice for effective standards-based instruction, 4th ed. USA: Pearson Education, Inc.
  7. Wolming, S. & Wikstrom, C. (2010). The concept of validity in theory and practice. Assessment in Education: Principles, Policy & Practice, 17, 117-132.