
MODULE 6 - ORGANISING ASSESSMENT

1.1          Introduction
The first consideration in test planning is to determine the type of test to be prepared. This will help clarify what is to be measured and will aid in stating the test specifications in such precise terms that test items can be constructed to call forth the desired performance. If the test planning is carefully done, constructing relevant test items is greatly simplified.


1.2          Determining The Purpose Of The Test
Tests can be used in an instructional program to assess entry behavior (placement test), monitor learning progress (formative test), diagnose learning difficulties (diagnostic test), and measure performance at the end of instruction (summative test). Each type of test typically requires some modification in test design. Although the specific makeup of any test depends on the particular situation in which it is to be used, it is possible to identify common characteristics of the various test types.

The material in Table 1 provides a good general description of the four basic types:

Placement
   Function: Measure prerequisite entry skills, or determine entry performance on course objectives
   Sampling: Include each prerequisite entry behavior, or select a representative sample of course objectives
   Items: Typically easy for prerequisite skills; a wide range of difficulty for course objectives

Formative
   Function: Provide feedback to students and teacher on learning progress
   Sampling: Include all unit objectives if possible (or those most essential)
   Items: Match the difficulty of the unit objectives

Diagnostic
   Function: Determine causes of recurring learning difficulties
   Sampling: Include a sample of tasks based on common sources of learning error
   Items: Typically easy; used to pinpoint specific causes of error

Summative
   Function: Assign grades, or certify mastery, at the end of instruction
   Sampling: Select a representative sample of course objectives
   Items: Typically have a wide range of difficulty
TABLE 1: Characteristics of Four Types of Knowledge Assessment




Adapted from P. W. Airasian and G. F. Madaus, “Functional Types of Student Evaluation,” Measurement and Evaluation in Guidance, 4 (1972): 221-233.

Although we have discussed the test types separately, it must be recognized that the categories overlap to some degree. In some instances, a particular test may be designed to serve more than one function. For example, an end-of-unit formative test may be used to provide feedback to students, to pinpoint sources of learning error, and to certify mastery of unit objectives. Similarly, sampling considerations and item characteristics may need to be modified to fit a particular test use or a specific type of instruction. Despite the lack of discrete categories, however, the table highlights the variety of functions that achievement tests can serve and provides a basic framework for planning knowledge tests that are designed to be of maximum usefulness.


1.3          Identifying And Defining The Intended Learning Outcomes
The learning outcomes measured by a test should faithfully reflect the objectives of instruction. Thus, the first order of business is to identify those instructional objectives that are to be measured by the test and then make certain that they are stated in a manner that is useful for testing. This is easier said than done. It is especially difficult if a clearly defined set of instructional objectives is not available to begin with, as is usually the case. One useful guide for approaching this task is the Taxonomy of Educational Objectives (see Bloom et al., 1956). This is a comprehensive system that classifies objectives within each of three domains: (1) cognitive, (2) affective, and (3) psychomotor. The cognitive domain of the taxonomy is concerned with intellectual outcomes, the affective domain with interests and attitudes, and the psychomotor domain with motor skills (see Gronlund & Linn, 1990, for a summary of each). Since our concern here is with knowledge testing, we shall focus primarily on the cognitive domain.








 

Cognitive Domain of the Taxonomy


Intellectual outcomes in the cognitive domain are divided into two major classes: (1) knowledge and (2) intellectual abilities and skills. These are further subdivided into six main areas as follows:

1. KNOWLEDGE


1.00     KNOWLEDGE (Remembering previously learned material)
1.10               Knowledge of specifics
1.11       Knowledge of terms
1.12       Knowledge of specific facts

1.20               Knowledge of ways and means of dealing with specifics
1.21       Knowledge of conventions
1.22       Knowledge of trends and sequences
1.23       Knowledge of classifications and categories
1.24       Knowledge of criteria
1.25       Knowledge of methodology

1.30               Knowledge of the universals and abstractions in a field
1.31       Knowledge of principles and generalizations
1.32       Knowledge of theories and structures



2. INTELLECTUAL ABILITIES AND SKILLS


2.00     COMPREHENSION (Grasping the meaning of material)
2.10               Translation (Converting from one form to another)
2.20               Interpretation (Explaining or summarizing material)
2.30               Extrapolation (Extending the meaning beyond the data)

3.00     APPLICATION (Using information in concrete situations)

4.00     ANALYSIS (Breaking down material into its parts)
4.10               Analysis of elements (Identifying the parts)
4.20               Analysis of relationships (Identifying the relationship)
4.30               Analysis of organizational principles (Identifying the organization)

5.00     SYNTHESIS (Putting parts together into a whole)
5.10               Production of a unique communication
5.20               Production of a plan or proposed set of operations
5.30               Derivation of a set of abstract relations

6.00     EVALUATION (Judging the value of a thing for a given purpose using definite criteria)
6.10               Judgments in terms of internal evidence
6.20               Judgments in terms of external criteria¹

¹Reprinted from Benjamin S. Bloom, ed., and others, Taxonomy of Educational Objectives: Cognitive Domain (New York: David McKay Co., Inc., 1956), pp. 201-207. Reprinted with permission of the publisher.

As can be seen in the outline above, the outcomes are arranged in order of increasing complexity. They begin with the relatively simple recall of factual information, proceed to the lowest level of understanding (comprehension), and then advance through the increasingly complex levels of application, analysis, synthesis, and evaluation. The subdivisions within each area are also listed in order of increasing complexity. This scheme for classifying student behavior is thus hierarchical: that is, the more complex behaviors include the simpler behaviors listed in the lower categories.

The cognitive domain of the taxonomy is especially useful in planning the achievement test. It focuses on a comprehensive and apparently complete list of mental processes to be considered when identifying learning outcomes, it provides a standard vocabulary for describing and classifying learning outcomes, and it serves as a guide for stating learning outcomes in terms of specific student performance.

Although the cognitive domain of the taxonomy provides a valuable guide for identifying learning outcomes, not all of the areas listed under this domain will be covered in a particular test or even in a particular course. Moreover, the classification scheme is neutral concerning the relative importance of the learning outcomes listed. Thus, it is the instructors who must decide which learning outcomes will guide their teaching and testing, and how much emphasis each outcome will receive. The taxonomy serves merely as a convenient checklist of outcomes that prevents relevant areas of student performance from being overlooked during the planning of an achievement test.


1.4          Stating The General Learning Outcomes
The learning outcomes to be measured by a test are most useful in test construction when they are stated as a terminal performance that is observable. That is, they should clearly indicate the student performance to be demonstrated at the end of the learning experience. The following list of learning outcomes for a unit on planning achievement tests illustrates this type of statement. It should be noted that these statements include only objectives that can be tested and that they are stated as general outcomes. Before being used for test construction, each one would need to be further defined in terms of specific learning outcomes.

At the end of this unit on knowledge test planning, the student will demonstrate that he or she:
1.            Knows the meaning of common terms.
2.            Knows specific facts about test planning.
3.            Knows the basic procedures for planning knowledge tests.
4.            Comprehends the relevant principles of testing.
5.            Applies the principles in test planning.

These statements of general learning outcomes have been deliberately kept free of specific course content so that with only slight modification they can be used with various units of study. As we shall see later, the test specifications provide a means of relating intended outcomes to specific subject matter topics.
This list of general outcomes could, of course, be expanded by making the statements more specific, and in some cases it may be desirable to do so. The number of general learning outcomes to use is somewhat arbitrary, but somewhere between 5 and 15 statements provides a list that is both useful and manageable. Typically, a shorter list is satisfactory for a unit of study, while a more comprehensive list is needed for summative testing at the end of a course.

1.5          Defining The General Outcomes In Specific Terms
When a satisfactory list of general learning outcomes has been identified and clearly stated, the next step is to list the specific types of student performance that are to be accepted as evidence that the outcomes have been achieved. For example, what specific types of performance will show that a student ‘knows the meaning of common terms’ or ‘comprehends the relevant principles of testing’? For these two areas, the specific learning outcomes may be listed as follows:

  1. Knows the meaning of common terms.
1.1      Identifies the correct definitions of terms.
1.2      Identifies the meaning of terms when used in context.
1.3      Distinguishes between terms on the basis of meaning.
1.4      Selects the most appropriate terms when describing testing procedures.

  2. Comprehends the relevant principles of testing.
2.1      Describes each principle in his or her own words.
2.2      Matches a specific example to each principle.
2.3      Explains the relevance of each principle to the major steps in test planning.
2.4      Predicts the most probable effect of violating each of the principles.
2.5      Formulates a test plan that is in harmony with the principles.

Note that the terms used to describe the specific learning outcomes indicate student performance that can be demonstrated to an outside observer. That is, they are observable responses that can be called forth by test items. The key terms are listed below to emphasize what is meant by defining learning outcomes in specific performance terms.

Identifies                                              Matches
Distinguishes between                        Explains
Selects                                                Predicts
Describes                                            Formulates

Action verbs such as these indicate precisely what the student is able to do to demonstrate achievement. Such vague and indefinite terms as ‘learns’, ‘sees’, ‘realizes’, and ‘is familiar with’ should be avoided, since they do not clearly indicate the terminal performance to be measured.

TAXONOMY CATEGORY       SAMPLE VERBS FOR STATING SPECIFIC LEARNING OUTCOMES
Knowledge               Identifies, names, defines, describes, lists, matches, selects, outlines
Comprehension           Classifies, explains, summarizes, converts, predicts, distinguishes between
Application             Demonstrates, computes, solves, modifies, arranges, operates, relates
Analysis                Differentiates, diagrams, estimates, separates, infers, orders, subdivides
Synthesis               Combines, creates, formulates, designs, composes, constructs, rearranges, revises
Evaluation              Judges, criticizes, compares, justifies, concludes, discriminates, supports

TABLE 2: Illustrative Action Verbs for Defining Objectives in the Cognitive Domain of the Taxonomy

Sample action verbs for stating specific learning outcomes at each level of the cognitive domain of the taxonomy are presented in Table 2. Although certain action verbs may be used at several different levels (e.g., identifies), the table provides a useful guide for defining intended outcomes in performance terms. For more comprehensive lists of action verbs, see Gronlund and Linn (1990), listed at the end of this chapter.

In defining the general learning outcomes in specific performance terms, it is typically impossible to list all of the relevant types of performance. The proportion that needs to be listed depends to a large extent on the nature of the test. In planning a test that is to be used to describe which learning tasks a student has mastered (criterion-referenced test), we would like as comprehensive a list as possible. For a test that is used to rank students in order of achievement (norm-referenced test), however, it is usually satisfactory to include a sufficient number of specific types of performance to clarify what the typical student who has achieved the intended outcomes is like.
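To make the idea of performance terms concrete, the verb lists in Table 2 can be treated as a simple lookup structure. The following Python sketch is illustrative only; the dictionary, the function name, and the first-word heuristic are assumptions for this example, not part of the taxonomy itself:

```python
# A minimal sketch: using the Table 2 verb lists to flag which taxonomy
# level a stated learning outcome most likely targets. The verb sets are
# copied from Table 2; everything else here is an illustrative assumption.

TAXONOMY_VERBS = {
    "Knowledge": {"identifies", "names", "defines", "describes", "lists",
                  "matches", "selects", "outlines"},
    "Comprehension": {"classifies", "explains", "summarizes", "converts",
                      "predicts", "distinguishes"},
    "Application": {"demonstrates", "computes", "solves", "modifies",
                    "arranges", "operates", "relates"},
    "Analysis": {"differentiates", "diagrams", "estimates", "separates",
                 "infers", "orders", "subdivides"},
    "Synthesis": {"combines", "creates", "formulates", "designs",
                  "composes", "constructs", "rearranges", "revises"},
    "Evaluation": {"judges", "criticizes", "compares", "justifies",
                   "concludes", "discriminates", "supports"},
}

def likely_levels(outcome: str) -> list[str]:
    """Return taxonomy levels whose sample verbs open the outcome statement.

    As the text notes, some verbs (e.g. 'identifies') occur at several
    levels, so the result is a hint for the test planner, not a verdict.
    """
    first_word = outcome.lower().split()[0]
    return [level for level, verbs in TAXONOMY_VERBS.items()
            if first_word in verbs]

print(likely_levels("Formulates a test plan in harmony with the principles"))
# ['Synthesis']
```

Such a lookup is only a planning aid: it catches vague openings like "learns" or "realizes" (which match no level), which is exactly the check the section recommends making by eye.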



1.6          Building A Table Of Specifications

(a)          Preparing Table Of Specifications

Preparing a table of specifications involves: (1) selecting the learning outcomes to be tested, (2) outlining the subject matter, and (3) making a two-way chart. The two-way chart describes the sample of items to be included in a test.

i.      Selecting the Learning Outcomes to Be Tested. The learning outcomes for a particular course will depend on the specific nature of the course, the objectives attained in previous courses, the philosophy of the school, the special needs of the students, and a host of other local factors that have a bearing on the instructional program. Despite the variation from course to course, most lists of instructional objectives will include learning outcomes in the following areas: (1) knowledge, (2) intellectual abilities and skills, (3) general skills (laboratory, performance, communications, work-study), and (4) attitudes, interests, and appreciations. It is in the first two areas covered by the cognitive domain of the taxonomy that achievement testing is most useful. Learning outcomes in the other areas are typically evaluated by rating scales, checklists, anecdotal records, inventories, and similar non-test evaluation procedures. Thus, the first step is to separate from the list of learning outcomes those that are testable by paper-and-pencil tests. The selected list of learning outcomes should, of course, be defined in specific terms, as described in the previous section. Clarifying the specific types of performance to be called forth by the test will aid in constructing test items that are most relevant to the intended learning outcomes.

ii.     Outlining the Subject Matter. The stated learning outcomes specify how students are expected to react to the subject matter of a course. Although it is possible to include both the student performance and the specific subject matter the student is to react to in the same statement, it is usually desirable to list them separately. The reason for this is that the student can react in the same way to many different areas of subject matter, and he or she can react in many different ways to the same area of subject matter. For example, when we state that a student can “define a term in his or her own words”, “recall a specific fact”, or “identify an example of a principle”, these types of performance can be applied to almost any area of subject matter. Similarly, in studying the taxonomy of educational objectives, we may expect students merely to recall the categories in it, or we could require them to explain the principles on which it is organized, to summarize its usefulness in test planning, to classify a given set of learning outcomes with it, or to use it in the actual construction of a test. Since particular types of student performance can overlap a variety of subject matter areas, and vice versa, it is more convenient to list each aspect of performance and subject matter separately and then to relate them in the table of specifications. The following content outline, based on the topics of the first two chapters of this book, illustrates an appropriate amount of detail:

A.           Role of testing in the instructional process
1.    Instructional decisions and test types
2.    Influence of tests on learning and instruction

B.           Principles of knowledge testing
1.    Relation to instructional objectives.
2.    Representative sampling
3.    Relevance of items to outcomes
4.    Relevance of test to use of results
5.    Reliability of results
6.    Improvement of learning

C.           Planning the test
1.    Determining the purpose of the test
2.    Identifying the intended learning outcomes
3.    Preparing the test specifications
4.    Constructing relevant test items

In using the topics in this book for illustrative purposes, there is no implication that the content outline should be limited to the material in a particular book. An achievement test is typically designed to measure all of the course content, including that covered in class discussion, outside reading, and other special assignments. Our example here is meant to illustrate the approximate amount of detail and not the source of the topics to be included.

iii.    Making the Two-Way Chart. When the learning outcomes have been selected and clearly defined and the course content outlined, the two-way chart should be prepared. This is called a table of specifications. It relates outcomes to content and indicates the relative weight to be given to each of the various areas. As noted earlier, the purpose of the table is to provide assurance that the test will measure a representative sample of the learning outcomes and the subject matter topics to be measured.

An example of a table of specifications for a norm-referenced summative test on the first two chapters of this book is given in Table 3. Note that only the general learning outcomes relevant to these chapters and only the major subject matter categories have been included. A more detailed table may be desirable for test purposes, but this is sufficient for illustration.

The numbers in each cell of the table indicate the number of test items to be devoted to that area. For example, 15 items in the test will measure knowledge of ‘terms’; 4 of them pertain to the ‘role of tests in instruction’, 4 to ‘principles of testing’, 4 to ‘norm referenced versus criterion referenced’, and 3 to ‘planning the test’. The number of items assigned to each cell is determined by the weight given to each learning outcome and each subject matter area.
A number of factors will enter into assigning relative weights to each learning outcome and each content area. How important is each area in the total learning experience? How much time was devoted to each area during instruction? Which outcomes have the greater retention and transfer value? What relative importance do curriculum specialists assign to each area? These and similar criteria must be considered. In the final analysis, however, the weights assigned in the table should faithfully reflect the emphasis given during instruction. In Table 3, for example, twice as much emphasis is given to ‘planning the test’ (20 items) as to ‘norm referenced versus criterion referenced’ (10 items). Similarly, it is assumed that knowledge outcomes were given approximately two-thirds of the emphasis during instruction (40 items) and that comprehension and application outcomes were each given approximately one-sixth of the total emphasis (10 items each).








CONTENT                                        KNOWS   KNOWS   KNOWS        COMPREHENDS   APPLIES      TOTAL NUMBER
                                               TERMS   FACTS   PROCEDURES   PRINCIPLES    PRINCIPLES   OF ITEMS
Role of Tests in Instruction                     4       4        -              2             -           10
Principles of Testing                            4       3        2              6             5           20
Norm Referenced versus Criterion Referenced      4       3        3              -             -           10
Planning the Test                                3       5        5              2             5           20
Total Number of Items                           15      15       10             10            10           60

TABLE 3: Table of Specifications for a Knowledge Test

In summary, preparing a table of specifications includes the following steps:

1.    Identify the learning outcomes and content areas to be measured by the test
2.    Weigh the learning outcomes and content areas in terms of their relative importance.
3.    Build the table in accordance with these relative weights by distributing the test items proportionately among the relevant cells of the table.

The resulting two-way table indicates the type of test needed to measure the learning outcomes and course content in a balanced manner. Thus, the table of specifications serves the test maker like a blueprint. It specifies the number and the nature of the items in the test, and it thereby provides a guide for item writing.
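Step 3 above is simple proportional arithmetic: each cell receives roughly total items × (content weight ÷ total content weight) × (outcome weight ÷ total outcome weight). The following Python sketch illustrates this; the function and variable names are assumptions for this example, and, as the text notes, the planner would still adjust individual cells by hand so that row and column totals reflect the intended emphasis:

```python
# A minimal sketch of distributing a fixed number of test items across the
# cells of a two-way table in proportion to the relative weights assigned
# to each content area and learning outcome. Names and the rounding policy
# are illustrative assumptions, not prescribed by the text.

def allocate_items(content_weights, outcome_weights, total_items):
    """Return {content: {outcome: item_count}} proportional to the weights."""
    content_total = sum(content_weights.values())
    outcome_total = sum(outcome_weights.values())
    table = {}
    for content, cw in content_weights.items():
        table[content] = {}
        for outcome, ow in outcome_weights.items():
            share = (cw / content_total) * (ow / outcome_total)
            # Simple rounding; in practice the planner adjusts cells by hand
            # so that row and column totals come out exactly as intended.
            table[content][outcome] = round(share * total_items)
    return table

# Weights mirroring Table 3: 'planning the test' gets twice the weight of
# 'norm referenced versus criterion referenced', and knowledge outcomes get
# about two-thirds of the total emphasis.
content = {"Role of Tests": 10, "Principles of Testing": 20,
           "Norm vs Criterion Referenced": 10, "Planning the Test": 20}
outcomes = {"Knows Terms": 15, "Knows Facts": 15, "Knows Procedures": 10,
            "Comprehends Principles": 10, "Applies Principles": 10}
spec = allocate_items(content, outcomes, total_items=60)
print(spec["Planning the Test"]["Knows Terms"])
# 5 by strict proportion; Table 3 assigns 3 here after hand adjustment
```

The deliberate mismatch in the last line makes the point of the section: the arithmetic gives a starting allocation, and the test planner's judgment produces the final cell counts.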
(b)          Making Performance Assessments

There are several types of student achievement that cannot be adequately measured with the typical selection-type or supply-type test items. Intended learning outcomes that stress actual performance require that we either judge the effectiveness of the procedure used (e.g., speech making, laboratory skills, motor or physical skills) or judge the product resulting from the performance (e.g., theme, graph, drawing, wood product). In some cases, we may need to observe and judge both the procedure (e.g., correct form in typing) and the product (e.g., typed letter) to obtain a complete assessment of the performance skill. Typically, correct procedure receives greater emphasis at the beginning of instruction, and the quality of the product is stressed after the correct procedure has been sufficiently mastered.

Performance assessments provide a systematic way of evaluating those skill outcomes that cannot be adequately measured by the typical objective or essay test. Skill outcomes are important in many different types of courses. For example, science courses are typically concerned with laboratory skills, mathematics courses are concerned with various types of practical problem-solving skills, English and foreign-language courses are concerned with communication skills, and social studies courses are concerned with such skills as map and graph construction and operating effectively in a group. In addition, skill outcomes are emphasized heavily in art and music courses, industrial education, business education, agricultural education, home economics courses, and physical education. Thus, in most instructional areas performance assessment provides a useful adjunct to the more commonly used paper-and-pencil measures of knowledge. Although measures of knowledge can tell us whether students know what to do in a particular situation, performance assessments are needed to evaluate their actual performance skills.







Specifying the performance outcomes

If the intended learning outcomes have been pre-specified for the instruction, it is simply a matter of selecting those that require the use of performance assessment. If performance outcomes are not available, they should be identified and defined for the areas of performance to be assessed. Performance outcomes commonly use verbs such as ‘identify’, ‘construct’, and ‘demonstrate’ (and their synonyms). A brief description of these verbs and some illustrative objectives for performance outcomes are shown in Table 4.

The specification of performance outcomes typically includes a job or task analysis to identify the specific factors that are most critical in the performance.


















IDENTIFY: selects the correct object, part of the object, procedure, or property (typical verbs: identify, locate, select, touch, pick up, mark, describe)
   Select the proper tool.
   Identify the parts of a typewriter.
   Choose correct laboratory equipment.
   Select the most relevant statistical procedure.
   Locate an automobile malfunction.
   Identify a musical selection.
   Identify the experimental equipment needed.
   Identify a specimen under the microscope.

CONSTRUCT: makes a product to fit a given set of specifications (typical verbs: construct, assemble, build, design, draw, make, prepare)
   Draw a diagram for an electrical circuit.
   Design a pattern for making a dress.
   Assemble equipment for an experimental study.
   Prepare a circle graph.
   Construct a weather map.
   Prepare an experimental design.
   Build a coffee table.

DEMONSTRATE: performs a set of operations or procedures (typical verbs: demonstrate, drive, measure, operate, perform, repair, set up)
   Drive an automobile.
   Measure the volume of a liquid.
   Operate a filmstrip projector.
   Perform a modern dance step.
   Repair a malfunctioning TV set.
   Set up laboratory equipment.
   Demonstrate taking a patient’s temperature.
   Demonstrate the procedure for tuning an automobile.
TABLE 4: Typical Action Verbs and Illustrative Instructional Objectives for Performance Outcomes.



PREPARING KNOWLEDGE AND PERFORMANCE ASSESSMENT

2.1       Prepare Knowledge Assessment Document

i.          Multiple-Choice Items

The multiple-choice item can be used to measure knowledge outcomes and various types of complex learning outcomes. The single-item format is probably most widely used for measuring knowledge, comprehension, and application outcomes. The interpretive exercise, consisting of a series of multiple-choice items based on introductory material (e.g., a paragraph, picture, or graph), is especially useful for measuring analysis, interpretation, and other complex learning outcomes. The interpretive exercise is described later in this module. Here, we confine the discussion to the use of single, independent multiple-choice items.

a.    Knowledge Items

Knowledge items typically measure the degree to which previously learned material has been remembered. The items focus on the simple recall of information and can be concerned with the measurement of terms, facts, or other specific aspects of knowledge.

            Examples

Outcome: Identifies the meaning of a term.
            Reliability means the same as:
*A.       consistency.
  B.       relevancy.
  C.       representativeness.
  D.       usefulness.

Outcome: Identifies the order of events.
            What is the first step in constructing an achievement test?
  A        Decide on test length.
*B        Identify the intended learning outcomes.
  C       Prepare a table of specifications.
  D       Select the item types to use.

The wide variety of knowledge outcomes that can be measured with multiple-choice items is best shown by illustrating some of the types of questions that can be asked in various knowledge categories. Sample questions stated as incomplete multiple-choice stems are presented in the accompanying box.

The series of questions shown in the box, of course, provides only a sample of the many possible questions that could be asked. Also, the questions are stated in rather general terms. The stems for multiple-choice items need to be more closely related to the specific learning outcome being measured.



1.11        Knowledge of Terminology
What word means the same as _________?
Which statement best defines the term ______?
In this sentence, what is the meaning of the word_______?

1.12        Knowledge of Specific Facts
Where would you find _______?
Who first discovered_______?
What is the name of ________?

1.21           Knowledge of Conventions
What is the correct form for ________?
Which statement indicates correct usage of ______?
Which of the following rules applies to _______?

1.22           Knowledge of Trends and Sequences
Which of the following best describes the trend of _________?
What is the most important cause of ________?
Which of the following indicates the proper order of _______?

1.23           Knowledge of Classifications and Categories
What are the main types of ______?
What are the major classifications of _______?
What are the characteristics of _________?

1.24           Knowledge of Criteria
Which of the following is a criterion for judging _________?
What is the most important criterion for selecting ________?
What criteria are used to classify _________?

1.25           Knowledge of Methodology
What method is used for _________?
What is the best way to ________?
What would be the first step in making _______?

1.31           Knowledge of Principles and Generalizations
Which statement best expresses the principle of _______?
Which statement best summarizes the belief that _______?
Which of the following principles best explains _________?

1.32           Knowledge of Theories and Structures
Which statement is most consistent with the theory of ________?
Which of the following best describes the structure of _______?
What evidence best supports the theory of ________?

TABLE 5: Illustrative Knowledge Questions*

*Based on the Taxonomy of Educational Objectives




b.    Comprehension Items

Comprehension items typically measure at the lowest level of understanding. They determine whether the students have grasped the meaning of the material without requiring them to apply it. Comprehension can be measured by requiring students to respond in various ways, but it is important that the items contain some novelty. The following test items illustrate the measurement of common types of learning outcomes at the comprehension level.

Examples

Outcome: Identifies an example of a term.
               Which one of the following statements contains a specific determiner?
  A     America is a continent.
  B     America was discovered in 1492.
*C     America has some big industries.
  D    America’s population is increasing.

Outcome: Interprets the meaning of an idea.
The statement that ‘test reliability is a necessary but not a sufficient condition of test validity’ means that:
  A     a reliable test will have a certain degree of validity.
*B     a valid test will have a certain degree of reliability.
  C    a reliable test may be completely invalid and a valid test completely unreliable.

c.    Application Items

Application items also measure understanding, but typically at a higher level than that of comprehension. Here, the students must demonstrate that they not only grasp the meaning of information but can also apply it to concrete situations that are new to them. Thus, application items determine the extent to which students can transfer their learning and use it effectively in solving new problems. Such items may call for the application of various aspects of knowledge, such as facts, concepts, principles, rules, methods, and theories. Both comprehension and application items are adaptable to practically all areas of subject matter, and they provide the basic means of measuring understanding.

The following examples illustrate the use of multiple-choice items for measuring learning outcomes at the application level.

Examples

Outcome: Distinguishes between properly and improperly stated learning outcomes.
Which of the following learning outcomes is properly stated in terms of student performance?
  A     Develops an appreciation of the importance of testing.
*B     Explains the purpose of test specifications.
  C    Learns how to write good test items.
  D    Realizes the importance of validity.
Outcome: Improves defective test items.
Directions: Read the following test item and then indicate the best change to make to improve the item.


Which one of the following types of learning outcomes is most difficult to evaluate objectively?
1          A concept.
2          An application.
3          An appreciation.
4          None of the above.

The best change to make in the previous item would be to:
  A     change the stem to incomplete statement form.
  B     use letters instead of numbers for each alternative.
  C    remove the indefinite articles ‘a’ and ‘an’ from the alternatives.
*D     replace ‘none of the above’ with ‘an interpretation’.

When writing application items, care must be taken to select problems that the students have not encountered previously and therefore cannot solve on the basis of general knowledge alone.

Some of the many learning outcomes at the comprehension and application levels that can be measured by multiple-choice items are illustrated by the incomplete questions in the accompanying box.

 


Comprehension Questions
Which of the following is an example of _______?
What is the main thought expressed by ________?
What are the main differences between________?
What are the common characteristics of _______?
Which of the following is another form of ______?
Which of the following best explains ________?
Which of the following best summarizes _______?
Which of the following best illustrates _________?
What do you predict would happen if __________?
What trend do you predict in ________?

Application Questions
Which of the following methods is best for ______?
What steps should be followed in applying _______?
Which situation would require the use of ________?
Which principle would be best for solving ________?
What procedure is best for improving ________?
What procedure is best for constructing ________?
What procedure is best for correcting __________?
Which of the following is the best plan for _______?
Which of the following provides the proper sequence for _______?
What is the most probable effect of _______?

TABLE 6: Illustrative Comprehension and Application Questions


Rules for writing multiple-choice items.

1.            The multiple-choice item is the most highly regarded and useful selection-type item.
2.            The multiple-choice item consists of a stem and a set of alternative answers (options or choices).
3.            The multiple-choice item can be designed to measure various intended learning outcomes, ranging from simple to complex.
4.            Knowledge items typically measure the simple remembering of material.
5.            Comprehension items measure the extent to which students have grasped the meaning of material.
6.            Application items measure whether students can use information in concrete situations.
7.            Items designed to measure achievement beyond the knowledge level must contain some novelty.
8.            The stem of a multiple-choice item should present a single clearly formulated problem that is related to an important learning outcome.
9.            The intended answer should be correct or clearly best, as agreed upon by authorities.
10.          The distracters (incorrect alternatives) should be plausible enough to lead the uninformed away from the correct answer.
11.          The items should be written in simple, clear language that is free of nonfunctioning content.
12.          The items should be free of irrelevant sources of difficulty (e.g., ambiguity) that might prevent an informed examinee from answering correctly.
13.          The items should be free of irrelevant clues (e.g., verbal associations) that might enable an uninformed examinee to answer correctly.
14.          The item format should provide for efficient responding and follow the normal rules of grammar.
15.          The rules of item writing provide a framework for preparing effective multiple-choice items, but experience in item writing may result in modifications to fit particular situations.



ii.        TRUE-FALSE Questions

True-false items are typically used to measure the ability to identify whether statements of fact are correct. The basic format is simply a declarative statement that the student must judge as true or false. There are modifications of this basic form in which the student must respond “yes” or “no”, “agree” or “disagree”, “right” or “wrong”, “fact” or “opinion”, and the like. Such variations are usually given the more general name of alternative-response items. In any event, this item type is characterized by the fact that only two responses are possible.

Example
T      *F       True-false items are classified as a supply-type item.

In some cases the student is asked to judge each statement as true or false, and then to change the false statements so that they are true. When this is done, a portion of each statement is underlined to indicate the part that can be changed. In the example given, for instance, the words ‘supply-type’ would be underlined. The key parts of true statements, of course, must also be underlined.

Another variation is the cluster-type true-false format. In this case, a series of items is based on a common stem.

Example
Which of the following terms indicate observable student performance? Circle Y for yes and N for no.
 
*Y          N       1.         Explains
*Y          N       2.         Identifies
  Y        *N        3.         Learns
*Y          N       4.         Predicts
  Y        *N        5.         Realizes

This item format is especially useful for replacing multiple-choice items that have more than one correct answer. Such items are impossible to score satisfactorily. This is avoided with the cluster-type item because it makes each alternative a separate scoring unit of one point. In our example, the student must record whether each term does or does not indicate observable student performance. Thus, this set of items provides an even better measure of the ‘ability to distinguish between performance and non-performance terms’ than would a single-answer multiple-choice item. This is a good illustration of the procedure discussed earlier, that is, starting with multiple-choice items and switching to other item types when more effective measurement will result.
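The scoring rule just described, one point per alternative, is simple enough to state in a few lines of Python. This is a minimal sketch; the function and variable names are illustrative assumptions:

```python
# A minimal sketch of the scoring rule for a cluster-type true-false item:
# each alternative is a separate one-point scoring unit.

def score_cluster(key, responses):
    """Count one point for every alternative answered as keyed."""
    return sum(1 for term in key if responses.get(term) == key[term])

# The key mirrors the example above: Y = indicates observable performance.
key = {"Explains": "Y", "Identifies": "Y", "Learns": "N",
       "Predicts": "Y", "Realizes": "N"}
student = {"Explains": "Y", "Identifies": "Y", "Learns": "Y",
           "Predicts": "Y", "Realizes": "N"}
print(score_cluster(key, student))  # 4 of 5 points
```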

Despite the limitations of the true-false item, there are situations where it should be used. Whenever there are only two possible responses, the true-false item, or some adaptation of it, is likely to provide the most effective measure. Situations of this type include a simple “yes” or “no” response in classifying objects, determining whether a rule does or does not apply, distinguishing fact from opinion, and indicating whether arguments are relevant or irrelevant. As we indicated earlier, the best procedure is to use the true-false, or alternative-response item only when this item type is more appropriate than the multiple-choice form.

iii.       Matching Items

The matching item is simply a variation of the multiple-choice form. A good practice is to switch to the matching format only when it becomes apparent that the same alternatives are being repeated in several multiple-choice items.

Examples

Which test item is least useful for educational diagnosis?
  A     Multiple-choice item
*B     True-false item
  C    Short-answer item

Which test item measures the greatest variety of learning outcomes?
*A     Multiple-choice item
  B     True-false item
  C    Short-answer item

Which test item is most difficult to score objectively?
  A     Multiple-choice item
  B     True-false item
*C     Short-answer item

Which test item provides the highest score by guessing?
  A     Multiple-choice item
*B     True-false item
  C    Short-answer item

By switching to a matching format, we can eliminate the repetition of the alternative answers and present the same items in a more compact form. The matching format consists of a series of stems, called premises, and a series of alternative answers, called responses. These are arranged in columns with directions that set the rules for matching. The following example illustrates how our multiple-choice items can be converted to matching form.

Example
Directions: Column A contains a list of characteristics of test items. On the line to the left of each statement, write the letter of the test item in Column B that best fits the statement. Each response in Column B may be used once, more than once, or not at all.

COLUMN A                                                   COLUMN B

(B)  1.  Least useful for educational diagnosis            A.  Multiple-choice item
(A)  2.  Measures greatest variety of learning outcomes    B.  True-false item
(C)  3.  Most difficult to score objectively               C.  Short-answer item
(B)  4.  Provides the highest score by guessing






The conversion to a matching item illustrated here is probably the most defensible use of this item type. All too frequently, matching items consist of a disparate collection of premises, each of which has only one or two plausible answers. This can be avoided by starting with multiple-choice items and switching to the matching format only when it provides a more compact and efficient means of measuring the same achievement. In our example, we could also have expanded the item by adding other similar premises and responses.
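The trigger for the conversion, noticing that several multiple-choice items repeat the same alternatives, can itself be expressed mechanically. The following Python sketch is an illustration only; the item records and all names are assumptions for this example, echoing the four multiple-choice items above:

```python
# A minimal sketch of the practice recommended above: start with
# multiple-choice items, then group items that share an identical set of
# alternatives into a single matching item. All names are illustrative.

from collections import defaultdict

OPTIONS = ("Multiple-choice item", "True-false item", "Short-answer item")

# Each item: (stem, tuple of alternatives, correct answer).
items = [
    ("Least useful for educational diagnosis", OPTIONS, "True-false item"),
    ("Measures greatest variety of learning outcomes", OPTIONS,
     "Multiple-choice item"),
    ("Most difficult to score objectively", OPTIONS, "Short-answer item"),
    ("Provides the highest score by guessing", OPTIONS, "True-false item"),
]

groups = defaultdict(list)           # alternatives -> premises sharing them
for stem, alternatives, answer in items:
    groups[alternatives].append((stem, answer))

for alternatives, premises in groups.items():
    if len(premises) >= 2:           # repeated alternatives: matching candidate
        letters = {alt: chr(ord("A") + i) for i, alt in enumerate(alternatives)}
        for number, (stem, answer) in enumerate(premises, start=1):
            print(f"({letters[answer]}) {number}. {stem}")
        for alt, letter in letters.items():
            print(f"    {letter}. {alt}")
```

Run on these four items, the sketch reproduces the keyed matching exercise shown above, with the shared alternatives printed once as the response list.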

iv.       The Interpretive Exercise

Complex learning outcomes can frequently be more effectively measured by basing a series of test items on a common selection of introductory material. This may be a paragraph, a table, a chart, a graph, a map, or a picture. The test items that follow the introductory material may be designed to call forth any type of intellectual ability or skill that can be measured objectively. This type of exercise is commonly called an interpretive exercise and both multiple-choice items and alternative response items are widely used to measure interpretation of the introductory material.

The following example illustrates the use of multiple-choice items. Note that this item type makes it possible to measure a variety of learning outcomes with the same selection of introductory material. In this particular case, item 1 measures the ability to recognize unstated assumptions, item 2 the ability to identify the meaning of a term, and item 3 the ability to identify relationships.

Example

Directions: Read the following comments a teacher made about testing. Then answer the question that follows the comments by circling the letter of the best answer.

“Students go to school to learn, not to take tests. In addition, tests cannot be used to indicate a student’s absolute level of learning. All tests can do is rank students in order of achievement, and this relative ranking is influenced by guessing, bluffing, and the subjective opinions of the teacher doing the scoring. The teaching-learning process would benefit if we did away with tests and depended on student self-evaluation”.




1.        Which one of the following unstated assumptions is this teacher making?
  A        Students go to school to learn.
  B        Teachers use essay tests primarily.
*C        Tests make no contribution to learning.
  D       Tests do not indicate a student’s absolute level of learning.

2.        Which one of the following types of tests is this teacher primarily talking about?
  A        Diagnostic test.
  B        Formative test.
  C       Pre-test
*D        Summative test.

3.        Which one of the following propositions is most essential to the final conclusion?
*A        Effective self-evaluation does not require the use of tests.
  B        Tests place students in rank order only.
  C       Tests scores are influenced by factors other than achievement.
  D       Students do not go to school to take tests.

The next example uses a modified version of the alternative-response form. This is frequently called a key-type item because a common set of alternatives is used in responding to each question. Note that the key-type item is devoted entirely to the measurement of one learning outcome. In this example, the item measures the ability to recognize warranted and unwarranted inferences.

Example

Directions: Paragraph A contains a description of the testing practices of Mr. Smith, a high school teacher. Read the description and each of the statements that follow it. Mark each statement to indicate the type of INFERENCE that can be drawn about it from the material in the paragraph. Place the appropriate letter in front of each statement using the following KEY:
T—if the statement may be INFERRED as TRUE.
F—if the statement may be INFERRED as UNTRUE
N—if NO INFERENCE may be drawn about it from the paragraph.

PARAGRAPH A

Approximately one week before a test is to be given, Mr. Smith carefully goes through the textbook and constructs multiple-choice items based on the material in the book. He always uses the exact wording of the textbook for the correct answer so that there will be no question concerning its correctness. He is careful to include some test items from each chapter. After the test is given, he lists the scores from high to low on the blackboard and tells each student his or her score. He does not return the test papers to the students, but he offers to answer any question they might have about the test. He puts the items from each test into a test file, which he is building for future use.

STATEMENTS ON PARAGRAPH A

(T)       1.   Mr. Smith’s tests measure a limited range of learning outcomes.
(F)       2.   Some of Mr. Smith’s test items measure at the understanding level.
(N)       3.   Mr. Smith’s tests measure a balanced sample of subject matter.
(N)       4.   Mr. Smith uses the type of test item that is best for his purpose.
(T)       5.   Students can determine where they rank in the distribution of scores on Mr. Smith’s tests.
(F)       6.   Mr. Smith’s testing practices are likely to motivate students to overcome their weaknesses.




SUMMARY OF POINTS

1.            A good practice is to start with multiple-choice items and switch to other selection-type items when more appropriate.
2.            The true-false, or alternative-response item is appropriate when there are only two possible alternatives.
3.            The true-false item is used primarily to measure knowledge of specific facts, although there are some notable exceptions.
4.            Each true-false statement should contain only one central idea, be concisely stated, be free of clues and irrelevant sources of difficulty, and have an answer on which experts would agree.
5.            Modifications of the true-false item are especially useful for measuring the ability to ‘distinguish between fact and opinion’ and ‘identify cause-effect relations’.
6.            Modifications of the true-false item can be used in interpretive exercises to measure various types of complex learning outcomes.
7.            The matching item is a variation of the multiple-choice form and is appropriate when it provides a more compact and efficient means of measuring the same achievement.
8.            The matching item consists of a list of  premises and a list of the responses to be related to the premises.
9.            A good matching item is based on homogeneous material, contains a brief list of premises and an unequal number of responses (more or fewer) that can be used more than once, and has the brief responses in the right-hand column.
10.          The directions for a matching item should indicate the basis for matching and that each response can be used more than once.
11.          The interpretive exercise consists of a series of selection-type items based on some type of introductory material (e.g. paragraph, table, chart, graph, map, or picture).
12.          The interpretive exercise uses both multiple-choice and alternative-response items to measure a variety of complex learning outcomes.
13.          The introductory material used in an interpretive exercise must be relevant to the outcomes to be measured, at the proper reading level, and as brief as possible.
14.          The test items used in an interpretive exercise should call for the intended type of interpretation, and the answers to the items should be dependent on the introductory material.
15.          The test items used in an interpretive exercise should be in harmony with the rules for constructing that item type.

v.        Short-Answer Items

The short-answer (or completion) item requires the examinee to supply the appropriate words, numbers, or symbols to answer a question or complete a statement.

Example

What are the incorrect responses in a multiple-choice item called? (distracters)
The incorrect responses in a multiple-choice item are called _______. (distracters)

This item type also includes computational problems and any other simple item form that requires supplying the answer rather than selecting it. Except for its use in computational problems, the short-answer item is used primarily to measure simple knowledge outcomes.

The short-answer item appears to be easy to write and use, but there are two major problems in constructing short-answer items. First, it is extremely difficult to phrase the question or incomplete statement so that only one answer is correct. In the example we have noted, for instance, a student might respond with any one of a number of answers that could be defended as appropriate. The student might write “incorrect alternatives”, “wrong answers”, “inappropriate options”, “decoys”, “foils”, or some other equally descriptive response. Second, there is the problem of spelling. If credit is given only when the answer is spelled correctly, poor spellers will be prevented from showing their true level of achievement, and the test scores will become an uninterpretable mixture of knowledge and spelling skill. On the other hand, if attempts are made to ignore spelling during the scoring process, there is still the problem of deciding whether a badly spelled word represents the intended answer. This, of course, introduces an element of subjectivity which tends to make the scores less dependable as measures of achievement.
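Both problems can be illustrated with a small scoring sketch. The acceptable-answer list and the similarity threshold below are assumptions for this example, not recommended values; the point is only that accepting any defensible answer, and tolerating spelling separately, are explicit decisions the scorer must make:

```python
# A minimal sketch of the two scoring problems discussed above: accept any
# defensible answer, and handle spelling separately by tolerating near-misses.

from difflib import get_close_matches

ACCEPTABLE = {"distracters", "distractors", "incorrect alternatives",
              "wrong answers", "foils", "decoys"}

def score_short_answer(response: str) -> bool:
    """Credit exact matches and close misspellings of any acceptable answer."""
    normalized = response.strip().lower()
    if normalized in ACCEPTABLE:
        return True
    # cutoff=0.8 tolerates minor misspellings; the threshold is arbitrary
    # here and is exactly the subjective judgment the text warns about.
    return bool(get_close_matches(normalized, ACCEPTABLE, n=1, cutoff=0.8))

print(score_short_answer("Distracters"))   # True
print(score_short_answer("distracers"))    # True, despite the misspelling
print(score_short_answer("responses"))     # False
```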

vi.       Essay Questions

The most notable characteristic of the essay question is the freedom of response it provides. As with the short-answer item, students must produce their own answers. With the essay question, however, they are free to decide how to approach the problem, what factual information to use, how to organize the answer, and what degree of emphasis to give each aspect of the response. Thus, the essay question is especially useful for measuring the ability to organize, integrate, and express ideas.

Learning Outcomes Measured
   Selection-type items: Good for measuring outcomes at the knowledge, comprehension, and application levels of learning; inadequate for organizing and expressing ideas.
   Essay questions: Inefficient for measuring knowledge outcomes; best for the ability to organize, integrate, and express ideas.

Sampling of Content
   Selection-type items: The use of a large number of items results in broad coverage, which makes representative sampling of content feasible.
   Essay questions: The use of a small number of items limits coverage, which makes representative sampling of content infeasible.

Preparation of Items
   Selection-type items: Preparation of good items is difficult and time consuming.
   Essay questions: Preparation of good items is difficult, but easier than preparing selection-type items.

Scoring
   Selection-type items: Objective, simple, and highly reliable.
   Essay questions: Subjective, difficult, and less reliable.

Factors Distorting Scores
   Selection-type items: Reading ability and guessing.
   Essay questions: Writing ability and bluffing.

Probable Effect on Learning
   Selection-type items: Encourages students to remember, interpret, and use the ideas of others.
   Essay questions: Encourages students to organize, integrate, and express their own ideas.
TABLE 7: Summary of Comparison between Selection-Type Items and Essay Questions


SUMMARY OF POINTS

1.            Use supply-type items whenever producing the answer is an essential element in the learning outcome (e.g., defines terms, instead of identifies meaning of terms).
2.            Supply-type items include short-answer items, restricted-response essay, and extended-response essay.
3.            The short-answer item can be answered by a word, number, symbol, or brief phrase.
4.            The short-answer item is limited primarily to measuring simple knowledge outcomes.
5.            Each short-answer item should be so carefully written that there is only one possible answer, the entire item can be read before coming to the answer space, and there are no extraneous clues to the answer.
6.            In scoring short-answer items, give credit for all correct answers and score for spelling separately.
7.            Essay questions are most useful for measuring the ability to organize, integrate, and express ideas.
8.            Essay questions are inefficient for measuring knowledge outcomes because they provide limited sampling, are influenced by extraneous factors (e.g., writing skills, bluffing, grammar, spelling, handwriting), and scoring is subjective and unreliable.
9.            Restricted-response essay questions can be more easily written and scored, but due to limitations on the responses they are less useful for measuring the higher-level outcomes (e.g., integration of diverse material).
10.          Extended-response essay questions provide the freedom to select, organize, and express ideas in the manner that seems most appropriate; therefore, they are especially useful for measuring such outcomes.
11.          Essay questions should be written to measure complex learning outcomes, to present a clear task, and to contain only those restrictions needed to call forth the intended response and provide for adequate scoring.
12.          Essay answers should be scored by focusing on the intended response, by using a model answer or set of criteria as a guide, by scoring question by question, and by ignoring the writer’s identity. If an important decision is to be based on the result, two or more competent scorers should be used (see the sketch below).
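The scoring advice in points 11 and 12 can be summarized in a short sketch. The criteria, their point caps, and all names are assumptions invented for this illustration; an actual rubric would come from the model answer for each question:

```python
# A minimal sketch of criterion-based essay scoring with two independent
# scorers, as recommended above for important decisions. The criteria and
# weights are illustrative assumptions, not a prescribed rubric.

CRITERIA = {"organization": 4, "integration": 4, "expression": 2}  # max points

def score_essay(ratings: dict) -> int:
    """Sum one scorer's ratings, capped at each criterion's maximum."""
    return sum(min(ratings.get(c, 0), cap) for c, cap in CRITERIA.items())

# Two scorers rate the same answer to the same question, identity unknown.
scorer_1 = {"organization": 3, "integration": 4, "expression": 2}
scorer_2 = {"organization": 4, "integration": 3, "expression": 2}
final = (score_essay(scorer_1) + score_essay(scorer_2)) / 2
print(final)  # 9.0 out of a possible 10
```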


2.2          Prepare Performance Assessment Documents

Performance assessments can be classified by the type of situation or setting used. The following classification system closely approximates the degree of realism present in the situation and includes the following types: (1) paper-and-pencil performance, (2) identification test, (3) structured performance test, (4) simulated performance, and (5) work sample. Although these categories overlap to some degree, they are useful in describing and illustrating the various approaches used in performance assessment.


i.          Paper and Pencil Performance

Paper-and-pencil performance differs from the more traditional paper-and-pencil test by placing greater emphasis on the application of knowledge and skill in a simulated setting. These paper-and-pencil applications might result in desired terminal learning outcomes, or they might serve as an intermediate step toward performance that involves a higher degree of realism (for example, the actual use of equipment).

In a number of instances, paper-and-pencil performance can provide a product of educational significance. A course in test construction, for example, might require students to perform activities such as the following:

Construct a set of test specifications for a unit of instruction.
Construct test items that fit a given set of specifications.
Construct a checklist for evaluating an achievement test.

The action verb ‘construct’ is frequently used in paper-and-pencil performance testing. For instance, students might be asked to construct a weather map, bar graph, diagram of an electrical circuit, floor plan, design for an article of clothing, poem, short story, or plan for an experiment. In such cases, the paper-and-pencil product is a result of both knowledge and skill, and it provides a performance measure that is valued in its own right.

In other cases, paper-and-pencil performance might simply provide a first step toward hands-on performance. For example, before using a particular measuring instrument, such as a micrometer, it might be desirable to have students read various settings from pictures of the scale. Although the ability to read the scale is not a sufficient condition for accurate measurement, it is a necessary one. In this instance, paper-and-pencil performance would be favored because it is a more convenient method of testing a group of students. Using paper-and-pencil performance as a precursor to hands-on performance might be favored for other reasons as well. For example, if the performance is complicated and the equipment is expensive, demonstrating competence in paper-and-pencil situations could prevent subsequent accidents or damage to equipment. Similarly, in the health sciences, skill in diagnosing and prescribing for hypothetical patients could avoid later harm to real patients.


ii.        Identification Test

The identification test includes a wide variety of test situations representing various degrees of realism. In some cases, a student may be asked simply to identify a tool or piece of equipment and to indicate its function. A more complex test situation might present the student with a particular performance task (e.g., locating a short in an electrical circuit) and ask him or her to identify the tools, equipment, and procedures needed to perform the task. An even more complex type of identification test might involve listening to the operation of a malfunctioning machine and identifying the most probable cause of the malfunction.

Although identification tests are widely used in industrial education, they are by no means limited to that area. The biology teacher might have students identify specimens that are placed at various stations around the room, or identify the equipment and procedures needed to conduct a particular experiment. Similarly, chemistry students might be asked to identify ‘unknown’ substances, foreign-language students to identify correct pronunciation, mathematics students to identify correct problem-solving procedures, English students to identify the ‘best expression’ to be used in writing, and social studies students to identify various leadership roles as they are acted out in a group. Identifying correct procedures is also important, of course, in art, music, physical education, and such vocational areas as agriculture, business, education, and home economics.

The identification test is sometimes used as an indirect measure of performance skill. The experienced plumber, for example, is expected to have a broader knowledge of the tools and equipment used in plumbing than the inexperienced plumber. Thus, a tool identification test might be used to eliminate the least skilled in a group of applicants for a position as plumber. More commonly, the identification test is used as an instructional device to prepare students for actual performance in real or simulated situations.



iii.       Structured Performance Test

A structured performance test provides for an assessment under standard, controlled conditions. It might involve such things as making prescribed measurements, adjusting a microscope, following safety procedures in starting a machine, or locating a malfunction in electronic equipment. The performance situation is structured and presented in a manner that requires all individuals to respond to the same set of tasks.

The construction of a structured performance test follows much the same pattern used in constructing other types of achievement tests, but there are added complexities. The test situation can seldom be fully controlled and standardized, such tests typically take more time to prepare and administer, and they are frequently more difficult to score. To increase the likelihood that the test situation will be standard for all individuals, instructions should be used that describe the test situation, the required performance, and the conditions under which the performance is to be demonstrated. Instructions for locating a malfunction in electronic equipment, for example, would typically include the following (a structured rendering is sketched after the list):


1.            Nature and purpose of the test
2.            Equipment and tools provided
3.            Testing procedure:
a.            Type and condition of equipment
b.            Description of required performance
c.            Time limits and other conditions
4.            Method of judging performance
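
As an illustration only, instructions of this kind lend themselves to a structured record. The following is a minimal sketch in Python; the field names and sample values are hypothetical and not drawn from any standard format.

# A minimal sketch, with hypothetical field names, of how the test
# instructions outlined above might be recorded in structured form.
from dataclasses import dataclass
from typing import List

@dataclass
class PerformanceTestSpec:
    purpose: str                   # nature and purpose of the test
    equipment_provided: List[str]  # equipment and tools provided
    equipment_condition: str       # type and condition of equipment
    required_performance: str      # description of required performance
    time_limit_minutes: int        # time limits and other conditions
    judging_method: str            # method of judging performance

spec = PerformanceTestSpec(
    purpose="Locate a malfunction in electronic equipment",
    equipment_provided=["multimeter", "schematic diagram", "hand tools"],
    equipment_condition="Amplifier with a single seeded fault",
    required_performance="Identify the faulty component and state the cause",
    time_limit_minutes=3,
    judging_method="Checklist of procedural steps plus time taken",
)
print(spec.purpose)

Writing the instructions out in this way makes it harder to omit a condition (such as the time limit) that must be identical for every individual tested.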

When using performance tests, it may be desirable to set performance standards that indicate the minimum level of acceptable performance. These might be concerned with accuracy (e.g., measure temperature to the nearest two tenths of a degree), the proper sequencing of steps (e.g., adjust a microscope following the proper sequence of steps), total compliance with rules (e.g., check all safety guards before starting a machine), or speed of performance (e.g., locate a malfunction in electronic equipment in three minutes). Some common standards for judging performance are shown in the accompanying box.

SOME COMMON STANDARDS FOR JUDGING PERFORMANCE

Type                    Examples

Rate                    Solve ten ‘addition’ problems in two minutes.
                        Type 40 words per minute.

Error                   No more than two errors per typed page.
                        Count to 20 in Spanish without error.

Time                    Set up laboratory equipment in five minutes.
                        Locate an equipment malfunction in three minutes.

Precision               Measure a line within one eighth of an inch.
                        Read a thermometer within two tenths of a degree.

Quantity                Complete 20 laboratory experiments.
                        Locate 15 relevant references.

Quality (rating)        Write a neat, well-spaced business letter.
                        Demonstrate correct form in diving.

Percentage correct      Solve 85 percent of the math problems.
                        Spell correctly 90 percent of the words in the word list.

Steps required          Diagnose a motor malfunction in five steps.
                        Locate a computer error using proper sequence of steps.

Use of material         Build a bookcase with less than 10 percent waste.
                        Cut out a dress pattern with less than 10 percent waste.

Safety                  Check all safety guards before operating machine.
                        Drive automobile without breaking any safety rules.


Performance standards are, of course, frequently used in combination. A particular performance may require correct form, accuracy, and speed. How much weight to give to each depends on the stage of instruction as well as the nature of the performance. In assessing laboratory measurement skills, for example, correct procedure and accuracy might be stressed early in the instruction, and speed of performance delayed until the later stages of instruction. The particular situation might also influence the importance of each dimension. In evaluating typing skill, for example, speed might be stressed in typing routine business letters, whereas accuracy would be emphasized in typing statistical tables for economic reports.
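
Standards of this kind lend themselves to explicit checks. The following is a minimal sketch in Python; the function names are hypothetical, and the threshold values are taken from the examples in the box above.

# A minimal sketch, using hypothetical values from the box above,
# of how performance standards of different types might be checked.

def meets_rate(problems_solved: int, minutes: float) -> bool:
    # Rate: solve ten 'addition' problems in two minutes
    return problems_solved >= 10 and minutes <= 2.0

def meets_precision(measured_cm: float, true_cm: float) -> bool:
    # Precision: measure a line within one eighth of an inch (~0.3175 cm)
    return abs(measured_cm - true_cm) <= 2.54 / 8

def meets_percentage(correct: int, total: int) -> bool:
    # Percentage correct: solve 85 percent of the math problems
    return total > 0 and correct / total >= 0.85

print(meets_rate(10, 1.5))           # True
print(meets_precision(10.2, 10.0))   # True (0.2 cm is within 1/8 inch)
print(meets_percentage(17, 20))      # True (17/20 = 85 percent)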


iv.       Simulated Performance

Simulated performance is an attempt to match the performance in a real situation – either in whole or in part. In physical education, for example, swinging a bat at an imaginary ball, shadow boxing, and demonstrating various swimming or tennis strokes are simulated performances. In science, vocational, and business courses, skill activities are frequently designed to simulate portions of actual job performance. In mathematics, the use of calculators in solving lifelike problems represents simulated performance. Similarly, in social studies, student role playing of a jury trial, a city council meeting, or a job interview provides the instructor with opportunities to evaluate the simulated performance of an assigned task. In some cases, specially designed equipment is used for instructional and evaluative purposes. In both driver training and flight training, for example, students are frequently trained and tested on simulators. Such simulators may prevent personal injury or damage to expensive equipment during the early stages of skill development. Simulators are also used in various types of vocational training programs.

In some situations, simulated performance testing might be used as the final assessment of a performance skill. This would be the case in assessing students’ laboratory performance in chemistry, for example. In many situations, however, skill in a simulated setting simply indicates readiness to attempt actual performance. The student in driver training who has demonstrated driving skill in the simulator, for example, is now ready to apply his or her skill in the actual operation of an automobile.


v.        Work Sample

Of the various types of performance assessments, the work sample incorporates the highest degree of realism. It requires the student to perform actual tasks that are representative of the total performance to be measured. The sample tasks typically include the most crucial elements of the total performance, and are performed under controlled conditions. In being tested for automobile driving skill, for example, the student is required to drive over a standard course that includes the most common problem situations likely to be encountered in normal driving. The performance on the standard course is then used as evidence of the ability to drive an automobile under typical operating conditions.

Performance assessments in business education and industrial education are frequently of the work-sample type. When students are required to take and transcribe shorthand notes from dictation, type a business letter, or operate a computer in analyzing business data, a work-sample assessment is being employed. Similarly, in industrial education, a work-sample approach is being used when students are required to complete a metal-working or woodworking project that includes all of the steps likely to be encountered in an actual job situation (steps such as designing, ordering materials, and constructing). Still other examples are the operation of machinery, the repair of equipment, and the performance of job-oriented laboratory tasks. The work-sample approach to assessing performance is widely used in occupations involving performance skills, and many of these situations can be duplicated in the school setting.


vi.       Portfolios

To obtain a broader sample of student performance and one that represents more typical behavior, a portfolio of work may be assembled for performance assessment. For example, a portfolio of drawings may be used to evaluate artistic skill. In some cases, a portfolio of all classroom performance products may be assembled and evaluated as a whole.

Questions concerning what should be put in the portfolio and how it should be evaluated depend on the intended learning outcomes and the use to be made of the results. A portfolio of writing samples, for example, may be used to measure particular writing skills in order to evaluate progress and diagnose areas needing improvement. Or, a portfolio of writing samples may be used to provide a comprehensive measure of different types of writing (e.g., letter, essay, fiction) to determine the extent to which writing skills can be applied to various situations. In any event, the objectives must be clear so that plans can be made by using prescribed exercises or by accumulating collections of students’ regular class work over time.

Evaluation of portfolio products typically is based on holistic scoring, analytic scoring, or a combination of the two. Holistic scoring is based on an overall impression of the product rather than a consideration of the individual elements. The global judgment is made by assigning a numerical score to each product. Typically, a scale of between 4 and 8 points is used, and an even number of points is favored to avoid a ‘middle dumping ground’. Evaluation consists of quickly examining the product and assigning the number that matches the general impression of the product. In the case of a writing assessment, for example, the reader will read each writing sample quickly for overall impression and place it in one of four piles, ranging from 4 (high) to 1 (low). It is assumed that good writing is more than a sum of the individual elements that go into writing and that holistic scoring will capture this total impression of the work.

Analytic scoring requires a judgment for each significant characteristic of the product. In evaluating writing skills, for example, such things as organization, vocabulary, style, ideas, and mechanics might be judged separately. Typically, a checklist or rating scale is used to focus attention on each characteristic and to provide a place for recording judgments.

For most instructional purposes, both holistic and analytic scoring are useful. One gives the global judgment of the product and the other provides diagnostic information useful for improving performance. Where both are used, the global judgment should be made first to keep the specific elements from distorting the general impression of the product.
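
As a minimal sketch of this two-part approach, the following Python fragment records a holistic judgment alongside analytic ratings. The 4-point scales and trait names follow the writing example above, but the function itself is hypothetical.

# A minimal sketch, with hypothetical scale values, of combined holistic
# and analytic scoring for a portfolio product. The holistic judgment is
# recorded first so the analytic ratings do not distort it.

ANALYTIC_TRAITS = ["organization", "vocabulary", "style", "ideas", "mechanics"]

def score_product(holistic: int, analytic: dict) -> dict:
    # Holistic: a single global impression on a 4-point scale (4 high, 1 low)
    assert 1 <= holistic <= 4, "holistic score must be 1-4"
    # Analytic: one rating per significant characteristic, also 1-4 here
    assert set(analytic) == set(ANALYTIC_TRAITS), "rate every trait"
    assert all(1 <= v <= 4 for v in analytic.values())
    return {"holistic": holistic,
            "analytic": analytic,
            # diagnostic information: traits needing improvement
            "needs_work": [t for t, v in analytic.items() if v <= 2]}

result = score_product(
    holistic=3,
    analytic={"organization": 4, "vocabulary": 3, "style": 3,
              "ideas": 4, "mechanics": 2},
)
print(result["needs_work"])   # ['mechanics']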

The use of portfolios for performance assessment includes the following steps:

1.        Decide on the learning outcomes to be assessed and the use to be made of the results.
2.        Determine the nature of the samples of work (e.g., writing, drawing, tapes) and the method of collecting (e.g., prescribed exercises, routine class work).
3.        Prepare exercises that define the performance tasks, or describe the nature of the routine class work to be collected.
4.        Select the method of scoring and prepare the scoring instruments for judging performance.

Example

Performance Question: Changing a wheel on a motor car

Marking Scheme

Serial   Sub task                                                                    Yes   No
  1*     Stop car on hard, level surface
  2*     Apply hand brake
  3*     Switch off engine
  4      Position warning triangle approximately 20 metres to rear of car
  5      Remove spare wheel and tools
  6      Place jack in position at nearest jacking point to wheel to be changed
  7      Remove hub cap
  8      Loosen wheel nuts
  9      Jack up wheel to approx 3 cm above the ground
 10      Remove wheel nuts—top one last (and place in hub cap)
 11      Remove wheel
 12      Place spare wheel in position
 13      Replace wheel nuts—top one first
 14      Tighten all nuts diagonally
 15*     Lower jack so that wheel rests on ground
 16*     Tighten all nuts fully
 17      Replace hub cap
 18*     Return tools, warning triangle and replaced wheel to storage compartment

NB: Items indicated * are critical and all must be performed correctly to pass the test.
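
Scoring such a checklist is mechanical, which makes the pass rule easy to state precisely. The following is a minimal sketch in Python of the NB rule above; the data structure is hypothetical and only a few of the subtasks are shown.

# A minimal sketch of scoring the wheel-changing checklist above.
# Each subtask is (description, is_critical); results map serial -> True/False.
# Following the NB rule, every critical subtask must be performed correctly.

SUBTASKS = {
    1: ("Stop car on hard, level surface", True),
    2: ("Apply hand brake", True),
    3: ("Switch off engine", True),
    14: ("Tighten all nuts diagonally", False),
    16: ("Tighten all nuts fully", True),
    # remaining subtasks omitted for brevity
}

def passes(results: dict) -> bool:
    # Fail if any critical subtask was missed or performed incorrectly
    return all(results.get(serial, False)
               for serial, (_desc, critical) in SUBTASKS.items()
               if critical)

print(passes({1: True, 2: True, 3: True, 14: False, 16: True}))  # True
print(passes({1: True, 2: True, 3: True, 14: True, 16: False}))  # False

Note that a missed non-critical subtask (such as 14 in the first example) lowers the checklist score without failing the test, whereas a missed critical subtask fails it outright.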





Plan Assessment Session

1.        Test planning should be guided by the purpose of the test and the nature of the learning tasks to be measured.
2.        Test planning should include stating the instructional objectives as intended learning outcomes and defining them in terms of student performance.
3.        Test specifications that describe the set of tasks to be measured should be prepared before writing or selecting test items.
4.        Test specifications typically consist of a twofold table of specifications, but a more limited set of specifications may be useful for formative testing.
5.        The types of test items used in a test should be determined by how directly they measure the intended learning outcomes and how effective they are as measuring instruments.
6.        Each test item should provide a task that matches the student performance described in a specific learning outcome.
7.        The functioning content of test items can be improved by eliminating irrelevant barriers and unintended clues during item writing.
8.        For mastery testing and criterion-referenced interpretation, the difficulty of a test item should match the difficulty of the learning task to be measured.
9.        For survey testing and norm-referenced interpretation, item difficulty may be altered to provide a larger spread of scores, but care must be taken not to introduce irrelevant difficulty (e.g., by using obscure material); a simple difficulty index is sketched after this list.
10.      An achievement test should be short enough to permit all students to attempt all items during the testing time available.
11.      A test should contain a sufficient number of test items for each type of interpretation to be made. Interpretations based on fewer than 10 items should be considered highly tentative.
12.      Validity and reliability are the two most important characteristics of achievement testing and should be ‘built in’ during test construction.
13.      An achievement test will provide valid and reliable results if it measures a representative sample of instructionally relevant tasks and provides scores that are relatively free of measurement errors.
14.      Following a general set of guidelines during item writing will result in higher quality items that contribute to the validity and reliability of the test results.
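
As noted in point 9 above, item difficulty can be checked empirically. The classical difficulty index is simply the proportion of students answering an item correctly; the sketch below is illustrative only.

# A minimal sketch computing the classical item difficulty index:
# p = proportion of students answering the item correctly.
# Higher p means an easier item; mastery tests expect high p values,
# while survey tests aim for a wider spread around moderate p.

def item_difficulty(responses: list) -> float:
    # responses: one True/False entry per student for a single item
    return sum(responses) / len(responses)

answers = [True, True, False, True, False, True, True, True]
p = item_difficulty(answers)
print(f"p = {p:.2f}")   # p = 0.75, a relatively easy item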



QUESTIONS

  1.    Which one of the following is an example of a performance (action) verb?

A.           Construct
B.           Fear
C.           Realize
D.           Think

2.           Specific Learning Outcomes: Identify procedural steps in planning for a test.
Which one of the following steps should be completed first in planning for knowledge assessment?

A.           Select the types of test items to use.
B.           Decide on the length of the test.
C.           Define the intended learning outcomes.
D.           Prepare the test specifications.

3.           Specific Learning Outcomes: Identify examples of properly stated learning outcomes.
Which one of the following learning outcomes is properly stated in performance terms?

A.           Student realizes the importance of tests in teaching.
B.           Student has acquired the basic principles of knowledge testing.
C.           Student demonstrates a desire for more experience in test construction.
D.           Student predicts the most probable effect of violating a test construction principle.

REFERENCES
1.            Calhoun Robinson, Managing the Learning Process in Business Education, Colonial Press, USA, 1992.
2.            Roger Buckley & Jim Caple, The Theory & Practice of Training, Kogan Page, London, 2000.
3.            Norman E. Gronlund, How to Make Achievement Tests and Assessments, Allyn & Bacon, 1993.
