1.1 Introduction
The first consideration in test planning is to
determine the type of test to be prepared. This will help clarify what is to be
measured and will aid in stating the test specifications in such precise terms
that test items can be constructed to call forth the desired performance. If
the test planning is carefully done, constructing relevant test items is
greatly simplified.
1.2 Determining The Purpose Of The Test
Tests
can be used in an instructional program to assess entry behavior (placement
test), monitor learning progress (formative test), diagnose learning
difficulties (diagnostic test) and measure performance at the end of
instruction (summative test). Each type of test used typically requires some
modification in test design. Although the specific makeup of any test depends
on the particular situation in which it is to be used, it is possible to
identify common characteristics of the various test types.
The material in Table 1 provides a good general description of the four basic types:
| TYPE OF TEST | FUNCTION OF TEST | SAMPLING CONSIDERATIONS | ITEM CHARACTERISTICS |
|---|---|---|---|
| Placement | Measure prerequisite entry skills | Include each prerequisite entry behavior | Typically, items are easy |
| Placement | Determine entry performance on course objectives | Select representative sample of course objectives | Typically, items have a wide range of difficulty |
| Formative | Provide feedback to students and teacher on learning progress | Include all unit objectives if possible (or those most essential) | Items match difficulty of unit objectives |
| Diagnostic | Determine causes of recurring learning difficulties | Include sample of tasks based on common sources of learning error | Typically, items are easy and are used to pinpoint specific causes of error |
| Summative | Assign grades, or certify mastery, at end of instruction | Select representative sample of course objectives | Typically, items have a wide range of difficulty |
TABLE 1: Characteristics of Four Types of Knowledge Assessment
Adapted from P.W. Airasian and G.F. Madaus, “Functional Types of Student Evaluation,” Measurement and Evaluation in Guidance, 4 (1972): 221-233.
We have discussed the four basic test types, but it must be recognized that the categories overlap
to some degree. In some instances, a particular test may be designed to serve
more than one function. For example, an end-of-unit formative test may be used
to provide feedback to students, to pinpoint sources of learning error, and to
certify mastery of unit objectives. Similarly, sampling considerations and item
characteristics may need to be modified to fit a particular test use or a
specific type of instruction. Despite the lack of discrete categories, however,
the table highlights the variety of functions that achievement tests can serve
and provides a basic framework for planning knowledge tests that are designed to be of
maximum usefulness.
1.3 Identifying And Defining The Intended Learning Outcomes
The learning outcomes measured by a test should
faithfully reflect the objectives of instruction. Thus, the first order of
business is to identify those instructional objectives that are to be measured
by the test and then make certain that they are stated in a manner that is
useful for testing. This is easier said than done. It is especially difficult
if a clearly defined set of instructional objectives is not available to begin
with, as is usually the case. One useful guide for approaching this task is the
Taxonomy of Educational Objectives (see Bloom et al., 1956). This is a
comprehensive system that classifies objectives within each of three domains:
(1) cognitive, (2) affective, and (3) psychomotor. The cognitive domain of the
taxonomy is concerned with intellectual outcomes, the affective domain with
interests and attitudes, and the psychomotor domain with motor skills (see
Gronlund & Linn, 1990, for a summary of each). Since our concern here is
with knowledge testing, we shall focus primarily on the cognitive domain.
Cognitive Domain of the Taxonomy
Intellectual outcomes in the cognitive domain
are divided into two major classes: (1) knowledge and (2) intellectual
abilities and skills. These are further subdivided into six main areas as
follows:
1. KNOWLEDGE
1.00 KNOWLEDGE (Remembering previously learned material)
   1.10 Knowledge of specifics
      1.11 Knowledge of terms
      1.12 Knowledge of specific facts
   1.20 Knowledge of ways and means of dealing with specifics
      1.21 Knowledge of conventions
      1.22 Knowledge of trends and sequences
      1.23 Knowledge of classifications and categories
      1.24 Knowledge of criteria
      1.25 Knowledge of methodology
   1.30 Knowledge of the universals and abstractions in a field
      1.31 Knowledge of principles and generalizations
      1.32 Knowledge of theories and structures
2. INTELLECTUAL ABILITIES AND SKILLS
2.00 COMPREHENSION (Grasping the meaning of material)
   2.10 Translation (Converting from one form to another)
   2.20 Interpretation (Explaining or summarizing material)
   2.30 Extrapolation (Extending the meaning beyond the data)
3.00 APPLICATION (Using information in concrete situations)
4.00 ANALYSIS (Breaking down material into its parts)
   4.10 Analysis of elements (Identifying the parts)
   4.20 Analysis of relationships (Identifying the relationships)
   4.30 Analysis of organizational principles (Identifying the organization)
5.00 SYNTHESIS (Putting parts together into a whole)
   5.10 Production of a unique communication
   5.20 Production of a plan or proposed set of operations
   5.30 Derivation of a set of abstract relations
6.00 EVALUATION (Judging the value of a thing for a given purpose using definite criteria)
   6.10 Judgments in terms of internal evidence
   6.20 Judgments in terms of external criteria¹
¹Reprinted from Benjamin S. Bloom, ed., and others, Taxonomy of Educational Objectives: Cognitive Domain (New York: David McKay Co., Inc., 1956), pp. 201-207. Reprinted with permission of the publisher.
As can be seen in the above outline, the outcomes
are arranged in order of increasing complexity. They begin with the relatively
simple recall of factual information, proceed to the lowest level of
understanding (comprehension), and then advance through the increasingly
complex levels of application, analysis, synthesis, and evaluation. The
subdivisions within each area are also listed in order of increasing
complexity. This scheme for classifying student behavior is thus hierarchical:
that is, the more complex behaviors include the simpler behaviors listed in the
lower categories.
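Because the taxonomy is hierarchical and ordered by complexity, it can be handy during test planning to keep the outline in a small data structure for checklist-style review. The Python sketch below simply transcribes the outline above; the representation itself is an illustrative assumption, not part of the taxonomy.

```python
# A minimal sketch: the cognitive-domain outline as a nested mapping,
# convenient for checklist-style review during test planning. The
# contents transcribe the outline above; the structure is an assumption.

COGNITIVE_DOMAIN = {
    "1.00 Knowledge": {
        "1.10 Specifics": ["1.11 Terms", "1.12 Specific facts"],
        "1.20 Ways and means of dealing with specifics": [
            "1.21 Conventions", "1.22 Trends and sequences",
            "1.23 Classifications and categories",
            "1.24 Criteria", "1.25 Methodology",
        ],
        "1.30 Universals and abstractions": [
            "1.31 Principles and generalizations",
            "1.32 Theories and structures",
        ],
    },
    "2.00 Comprehension": ["2.10 Translation", "2.20 Interpretation",
                           "2.30 Extrapolation"],
    "3.00 Application": [],
    "4.00 Analysis": ["4.10 Elements", "4.20 Relationships",
                      "4.30 Organizational principles"],
    "5.00 Synthesis": ["5.10 Unique communication",
                       "5.20 Plan or set of operations",
                       "5.30 Set of abstract relations"],
    "6.00 Evaluation": ["6.10 Internal evidence", "6.20 External criteria"],
}

# Dict order preserves the outline order, so position doubles as a
# rough index of complexity (simplest first).
LEVELS = list(COGNITIVE_DOMAIN)
```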
The cognitive domain of the taxonomy is especially useful in planning the achievement test. It provides a comprehensive and apparently complete list of mental processes to be considered when identifying learning outcomes, a standard vocabulary for describing and classifying learning outcomes, and a guide for stating learning outcomes in terms of specific student performance.
Although the cognitive domain of the taxonomy
provides a valuable guide for identifying learning outcomes, not all of the
areas listed under this domain will be covered in a particular test or even in
a particular course. Moreover, the classification scheme is neutral concerning
the relative importance of the learning outcomes listed. Thus, it is the
instructors who must decide which learning outcomes will guide their teaching and
testing, and how much emphasis each outcome will receive. The taxonomy serves
merely as a convenient checklist of outcomes that prevents relevant areas of
student performance from being overlooked during the planning of an achievement
test.
1.4 Stating The General Learning Outcomes
The
learning outcomes to be measured by a test are most useful in test construction
when they are stated as a terminal performance that is observable. That
is, they should clearly indicate the student performance to be demonstrated at
the end of the learning experience. The following list of learning outcomes for
a unit in the planning of an achievement test illustrates this type of
statement. It should be noted that these statements include only objectives
that can be tested and that they are stated as general outcomes. Before being
used for test construction, each one would need to be further defined in terms
of specific learning outcomes.
At the end of this unit on knowledge test planning, the student will demonstrate that he or she:
1. Knows the meaning of common terms.
2. Knows specific facts about test planning.
3. Knows the basic procedures for planning a knowledge test.
4. Comprehends the relevant principles of testing.
5. Applies the principles in test planning.
These
statements of general learning outcomes have been deliberately kept free of
specific course content so that with only slight modification they can be used
with various units of study. As we shall see later, the test specifications
provide a means of relating intended outcomes to specific subject matter
topics.
This
list of general outcomes could, of course, be expanded by making the statements
more specific, and in some cases it may be desirable to do so. The number of
general learning outcomes to use is somewhat arbitrary, but somewhere between 5
and 15 statements provides a list that is both useful and manageable. Typically, a
shorter list is satisfactory for a unit of study, while a more comprehensive
list is needed for summative testing at the end of a course.
1.5 Defining The General Outcomes In Specific Terms
When
a satisfactory list of general learning outcomes has been identified and
clearly stated, the next step is to list the specific types of student
performance that are to be accepted as evidence that the outcomes have been achieved.
For example, what specific types of performance will show that a student ‘knows
the meaning of common terms’ or ‘comprehends the relevant principles of
testing’? For these two areas, the specific learning outcomes may be listed as
follows:
1. Knows the meaning of common terms.
   1.1 Identifies the correct definitions of terms.
   1.2 Identifies the meaning of terms when used in context.
   1.3 Distinguishes between terms on the basis of meaning.
   1.4 Selects the most appropriate terms when describing testing procedures.
2. Comprehends the relevant principles of testing.
   2.1 Describes each principle in his or her own words.
   2.2 Matches a specific example to each principle.
   2.3 Explains the relevance of each principle to the major steps in test planning.
   2.4 Predicts the most probable effect of violating each of the principles.
   2.5 Formulates a test plan that is in harmony with the principles.
Note that the terms used to describe the specific
learning outcomes indicate student performance that can be demonstrated to an
outside observer. That is, they are observable responses that can be
called forth by test items. The key terms are listed below to emphasize what is
meant by defining learning outcomes in specific performance terms.
Identifies, Distinguishes between, Selects, Describes, Matches, Explains, Predicts, Formulates
Action
verbs such as these indicate precisely what the student is able to do to
demonstrate achievement. Such vague and indefinite terms as ‘learns’, ‘sees’,
‘realizes’, and ‘is familiar with’ should be avoided, since they do not clearly
indicate the terminal performance to be measured.
| TAXONOMY CATEGORIES | SAMPLE VERBS FOR STATING SPECIFIC LEARNING OUTCOMES |
|---|---|
| Knowledge | Identifies, names, defines, describes, lists, matches, selects, outlines |
| Comprehension | Classifies, explains, summarizes, converts, predicts, distinguishes between |
| Application | Demonstrates, computes, solves, modifies, arranges, operates, relates |
| Analysis | Differentiates, diagrams, estimates, separates, infers, orders, subdivides |
| Synthesis | Combines, creates, formulates, designs, composes, constructs, rearranges, revises |
| Evaluation | Judges, criticizes, compares, justifies, concludes, discriminates, supports |
TABLE 2: Illustrative Action Verbs for Defining
Objectives in the Cognitive Domain of the Taxonomy
Sample
action verbs for stating specific learning outcomes at each level of the
cognitive domain of the taxonomy are presented in Table 2. Although certain
action verbs may be used at several different levels (e.g., identifies), the
table provides a useful guide for defining intended outcomes in performance
terms. For more comprehensive lists of action verbs, see Gronlund and Linn
(1990) listed at the end of this chapter.
In defining the general learning outcomes in specific
performance terms, it is typically impossible to list all of the relevant types
of performance. The proportion that needs to be listed depends to a large extent on
the nature of the test. In planning a test that is to be used to describe
which learning tasks a student has mastered (criterion-referenced test), we would
like as comprehensive a list as possible. For a test that is used to rank
students in order of achievement (norm-referenced test), however, it is usually
satisfactory to include a sufficient number of specific types of performance to
clarify what the typical student is like who has achieved the intended
outcomes.
1.6 Building A Table Of Specifications
(a) Preparing A Table Of Specifications
Preparing
a table of specifications involves: (1) selecting the learning outcomes to be
tested, (2) outlining the subject matter, and (3) making a two-way chart. The
two-way chart describes the sample of items to be included in a test.
i. Selecting the Learning
Outcomes to Be Tested. The learning outcomes for a particular
course will depend on the specific nature of the course, the objectives
attained in previous courses, the philosophy of the school, the special needs
of the students, and a host of other local factors that have a bearing on the
instructional program. Despite the variation from course to course, most lists
of instructional objectives will include learning outcomes in the following
areas: (1) knowledge, (2) intellectual abilities and skills, (3) general skills
(laboratory, performance, communications, work-study), and (4) attitudes,
interests, and appreciations. It is in the first two areas covered by the
cognitive domain of the taxonomy that achievement testing is most useful.
Learning outcomes in the other areas are typically evaluated by rating scales,
checklists, anecdotal records, inventories, and similar nontest evaluation
procedures. Thus, the first step is to separate from the list of learning
outcomes those that are testable by paper-and-pencil tests. The selected list
of learning outcomes should, of course, be defined in specific terms, as
described in the previous section. Clarifying the specific types of performance
to be called forth by the test will aid in constructing test items that are
most relevant to the intended learning outcomes.
ii. Outlining the Subject Matter. The stated learning
outcomes specify how students are expected to react to the subject matter of a
course. Although it is possible to include both the student performance and the
specific subject matter the student is to react to in the same statement, it is
usually desirable to list them separately. The reason for this is that the
student can react in the same way to many different areas of subject matter,
and he or she can react in many different ways to the same area of subject
matter. For example, when we state that a student can “define a term in his or
her own words”, “recall a specific fact”, or “identify an example of a
principle”, these types of performance can be applied to almost any area of
subject matter. Similarly, in studying the taxonomy of educational objectives,
we may expect students merely to recall the categories in it, or we could
require them to explain the principles on which it is organized, to summarize
its usefulness in test planning, to classify a given set of learning outcomes
with it, or to use it in the actual construction of a test. Since particular
types of student performance can overlap a variety of subject matter areas, and
vice versa, it is more convenient to list each aspect of performance and
subject matter separately and then to relate them in the table of
specifications.
The following content outline, based on the topics of this book, illustrates the approximate level of detail needed:
A. Role of testing in the instructional process
   1. Instructional decisions and test types
   2. Influence of tests on learning and instruction
B. Principles of knowledge testing
   1. Relation to instructional objectives
   2. Representative sampling
   3. Relevance of items to outcomes
   4. Relevance of test to use of results
   5. Reliability of results
   6. Improvement of learning
C. Planning the test
   1. Determining the purpose of the test
   2. Identifying the intended learning outcomes
   3. Preparing the test specifications
   4. Constructing relevant test items
In
using the topics in this book for illustrative purposes, there is no
implication that the content outline should be limited to the material in a
particular book. An achievement test is typically designed to measure all of
the course content, including that covered in class discussion, outside
reading, and other special assignments. Our example here is meant to illustrate
the approximate amount of detail and not the source of the topics to be
included.
iii. Making the Two-Way Chart. When the learning outcomes
have been selected and clearly defined and the course content outlined, the
two-way chart should be prepared. This is called a table of specifications.
It relates outcomes to content and indicates the relative weight to be given to
each of the various areas. As noted earlier, the purpose of the table is to
provide assurance that the test will measure a representative sample of the
learning outcomes and the subject matter topics to be measured.
An
example of a table of specifications for a norm-referenced summative test on
the first two chapters of this book is given in Table 3. Note that only the
general learning outcomes relevant to these chapters and only the major subject
matter categories have been included. A more detailed table may be desirable
for test purposes, but this is sufficient for illustration.
The
numbers in each cell of the table indicate the number of test items to be
devoted to that area. For example, 15 items in the test will measure knowledge
of ‘terms’; 4 of them pertain to the ‘role of tests in instruction’, 4 to
‘principles of testing’, 4 to ‘norm referenced versus criterion referenced’,
and 3 to ‘planning the test’. The number of items assigned to each cell is
determined by the weight given to each learning outcome and each subject matter
area.
A number of factors will enter into assigning relative weights to each learning
outcome and each content area. How important is each area in the total learning
experience? How much time was devoted to each area during instruction? Which
outcomes have the greater retention and transfer value? What relative
importance do curriculum specialists assign to each area? These and similar
criteria must be considered.
| CONTENT / OUTCOMES | KNOWS TERMS | KNOWS FACTS | KNOWS PROCEDURES | COMPREHENDS PRINCIPLES | APPLIES PRINCIPLES | TOTAL NUMBER OF ITEMS |
|---|---|---|---|---|---|---|
| Role of Tests in Instruction | 4 | 4 | — | 2 | — | 10 |
| Principles of Testing | 4 | 3 | 2 | 6 | 5 | 20 |
| Norm Referenced versus Criterion Referenced | 4 | 3 | 3 | — | — | 10 |
| Planning the Test | 3 | 5 | 5 | 2 | 5 | 20 |
| Total Number of Items | 15 | 15 | 10 | 10 | 10 | 60 |
TABLE 3: Table of Specifications for a Knowledge Test
In the final analysis, however, the weights assigned in the table should
faithfully reflect the emphasis given during instruction. In Table 3, for example,
it is assumed that twice as much emphasis was given to ‘planning the test’ (20 items) as was
given to ‘norm referenced versus criterion referenced’ (10 items). Similarly,
it is assumed that knowledge outcomes were given approximately two-thirds of
the emphasis during instruction (40 items) and that comprehension and
application outcomes were each given approximately one-sixth of the total emphasis
(10 items each).
In summary, preparing a table of specifications includes the following steps:
1. Identify the learning outcomes and content areas to be measured by the test.
2. Weight the learning outcomes and content areas in terms of their relative importance.
3. Build the table in accordance with these relative weights by distributing the test items proportionately among the relevant cells of the table.
The resulting two-way table indicates the type of test needed to measure the
learning outcomes and course content in a balanced manner. Thus, the table of
specifications serves the test maker like a blueprint. It specifies the number
and the nature of the items in the test, and it thereby provides a guide for
item writing.
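To make the arithmetic concrete, here is a minimal Python sketch of distributing a fixed number of items across the cells of a table of specifications. The weights echo Table 3, but the purely proportional spread and the largest-remainder rounding are illustrative assumptions; a real blueprint would leave some cells empty, as Table 3 does.

```python
# A minimal sketch: distribute test items across a two-way table of
# specifications in proportion to instructional emphasis. Weights echo
# Table 3; the proportional spread and rounding rule are assumptions.

TOTAL_ITEMS = 60

content_weights = {                      # relative emphasis per content area
    "Role of Tests in Instruction": 10,
    "Principles of Testing": 20,
    "Norm vs. Criterion Referenced": 10,
    "Planning the Test": 20,
}
outcome_weights = {                      # relative emphasis per outcome
    "Knows terms": 15, "Knows facts": 15, "Knows procedures": 10,
    "Comprehends principles": 10, "Applies principles": 10,
}

def allocate(total, weights):
    """Split `total` in proportion to `weights`, keeping the sum exact."""
    scale = total / sum(weights.values())
    raw = {k: w * scale for k, w in weights.items()}
    counts = {k: int(v) for k, v in raw.items()}
    # Largest-remainder rounding: hand leftover items to the cells
    # whose raw shares were truncated the most.
    short = total - sum(counts.values())
    for k in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:short]:
        counts[k] += 1
    return counts

row_totals = allocate(TOTAL_ITEMS, content_weights)   # 10, 20, 10, 20
for area, n in row_totals.items():
    print(area, allocate(n, outcome_weights))         # spread each row
```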
(b) Making Performance Assessments
There are several types of student achievement that cannot be adequately measured
with the typical selection-type or supply-type test items. Intended learning
outcomes that stress actual performance require that we either judge the
effectiveness of the procedure used (e.g., speech making, laboratory skills,
motor or physical skills) or judge the product resulting from the performance
(e.g., theme, graph, drawing, wood product). In some cases, we may need to
observe and judge both the procedure (e.g., correct form in typing) and the
product (e.g., typed letter) to obtain a complete assessment of the performance
skill. Typically, correct procedure receives greater emphasis at the beginning
of instruction, and the quality of the product is stressed after the correct
procedure has been sufficiently mastered.
Performance assessments provide a systematic way of evaluating those skill outcomes that
cannot be adequately measured by the typical objective or essay test. Skill
outcomes are important in many different types of courses. For example, science
courses are typically concerned with laboratory skills, mathematics courses are
concerned with various types of practical problem-solving skills, English and
foreign-language courses are concerned with communication skills, and social
studies courses are concerned with such skills as map and graph construction
and operating effectively in a group. In addition, skill outcomes are
emphasized heavily in art and music courses, industrial education, business
education, agricultural education, home economics courses, and physical
education. Thus, in most instructional areas performance assessment provides a
useful adjunct to the more commonly used paper-and-pencil measures of
knowledge. Although measures of knowledge can tell us whether students know
what to do in a particular situation, performance assessments are needed to
evaluate their actual performance skills.
Specifying the performance outcomes
If the intended learning outcomes have been
pre-specified for the instruction, it is simply a matter of selecting those
that require the use of performance assessment. If performance outcomes are not
available, they should be identified and defined for the areas of performance
to be assessed. Performance outcomes commonly use verbs such as ‘identify’,
‘construct’, and ‘demonstrate’ (and their synonyms). A brief description of
these verbs and some illustrative objectives for performance outcomes are shown
in Table 4.
The specification of performance outcomes
typically includes a job or task analysis to identify the specific factors
that are most critical in the
performance.
| ACTION VERBS | ILLUSTRATIVE INSTRUCTIONAL OBJECTIVES |
|---|---|
| IDENTIFY: selects the correct object, part of the object, procedure, or property (typical verbs: identify, locate, select, touch, pick up, mark, describe) | Select the proper tool. Identify the parts of a typewriter. Choose correct laboratory equipment. Select the most relevant statistical procedure. Locate an automobile malfunction. Identify a musical selection. Identify the experimental equipment needed. Identify a specimen under the microscope. |
| CONSTRUCT: makes a product to fit a given set of specifications (typical verbs: construct, assemble, build, design, draw, make, prepare) | Draw a diagram for an electrical circuit. Design a pattern for making a dress. Assemble equipment for an experimental study. Prepare a circle graph. Construct a weather map. Prepare an experimental design. Build a coffee table. |
| DEMONSTRATE: performs a set of operations or procedures (typical verbs: demonstrate, drive, measure, operate, perform, repair, set up) | Drive an automobile. Measure the volume of a liquid. Operate a filmstrip projector. Perform a modern dance step. Repair a malfunctioning TV set. Set up laboratory equipment. Demonstrate taking a patient’s temperature. Demonstrate the procedure for tuning an automobile. |
TABLE 4: Typical Action Verbs and Illustrative Instructional Objectives for Performance Outcomes.
PREPARING KNOWLEDGE AND PERFORMANCE ASSESSMENT
2.1 Prepare Knowledge Assessment Document
i. Multiple-Choice Items
The
multiple-choice item can be used to measure knowledge outcomes and various
types of complex learning outcomes. The single-item format is probably most
widely used for measuring knowledge, comprehension, and application outcomes.
The interpretive exercise consisting of a series of multiple-choice items based
on introductory material (e.g., paragraph, picture, or graph) is especially
useful for measuring analysis, interpretation, and other complex learning
outcomes. The interpretive exercise will be described in the following chapter.
Here, we confine the discussion to the use of single, independent multiple-choice items.
a. Knowledge Items
Knowledge items typically measure the degree
to which previously learned material has been remembered. The items focus on
the simple recall of information and can be concerned with the measurement of
terms, facts, or other specific aspects of knowledge.
Examples
Outcome:
Identifies the meaning of a term
Reliability means the same as:
*A. consistency.
B relevancy.
C representativeness.
D usefulness
Outcome:
Identifies the order of events.
What is the first step in
constructing an achievement test?
A Decide
on test length.
*B Identify the intended learning outcomes.
C Prepare
a table of specifications.
D Select
the item types to use.
The wide variety of knowledge outcomes that
can be measured with multiple-choice items is best shown by illustrating some
of the types of questions that can be asked in various knowledge categories.
Sample questions stated as incomplete multiple-choice stems are presented in
the accompanying box.
The series of questions shown in the box, of
course, provides only a sample of the many possible questions that could be
asked. Also the questions are stated in rather general terms. The stems for
multiple-choice items need to be more closely related to the specific learning
outcome being measured.
1.11 Knowledge of Terminology
What word means the same as _________?
Which statement best defines the term ______?
In this sentence, what is the meaning of the word _______?
1.12 Knowledge of Specific Facts
Where would you find _______?
Who first discovered _______?
What is the name of ________?
1.21 Knowledge of Conventions
What is the correct form for ________?
Which statement indicates correct usage of ______?
Which of the following rules applies to _______?
1.22 Knowledge of Trends and Sequences
Which of the following best describes the trend of _________?
What is the most important cause of ________?
Which of the following indicates the proper order of _______?
1.23 Knowledge of Classifications and Categories
What are the main types of ______?
What are the major classifications of _______?
What are the characteristics of _________?
1.24 Knowledge of Criteria
Which of the following is a criterion for judging _________?
What is the most important criterion for selecting ________?
What criteria are used to classify _________?
1.25 Knowledge of Methodology
What method is used for _________?
What is the best way to ________?
What would be the first step in making _______?
1.31 Knowledge of Principles and Generalizations
Which statement best expresses the principle of _______?
Which statement best summarizes the belief that _______?
Which of the following principles best explains _________?
1.32 Knowledge of Theories and Structures
Which statement is most consistent with the theory of ________?
Which of the following best describes the structure of _______?
What evidence best supports the theory of ________?
*Based on Taxonomy of Educational Objectives
TABLE 5: Illustrative Knowledge Questions*
b. Comprehension Items
Comprehension items
typically measure at the lowest level of understanding. They determine whether
the students have grasped the meaning of the material without requiring them to
apply it. Comprehension can be measured by requiring students to respond in
various ways but it is important that the items contain some novelty.
The following test items illustrate the measurement of common types of learning
outcomes at the comprehension level.
Examples
Outcome: Identifies an
example of a term.
Which one of the following
statements contains a specific determiner?
A America is a continent.
B America was discovered in 1492.
*C America has
some big industries.
D America’s population is increasing.
Outcome: Interprets the
meaning of an idea.
The statement that
‘test reliability is a necessary but not a sufficient condition of test validity’
means that:
A a
reliable test will have a certain degree of validity.
*B a valid test will have a certain degree of reliability.
C a
reliable test may be completely invalid and a valid test completely unreliable.
c. Application Items
Application items also measure understanding,
but typically at a higher level than that of comprehension. Here, the students
must demonstrate that they not only grasp the meaning of information but can
also apply it to concrete situations that are new to them. Thus, application
items determine the extent to which students can transfer their learning and
use it effectively in solving new problem. Such items may call for the
application of various aspects of knowledge, such as facts, concepts,
principles, rules, methods, and theories. Both comprehension and application
items are adaptable to practically all areas of subject matter, and they
provide the basic means of measuring understanding.
The following examples illustrate the use of
multiple-choice items for measuring learning outcomes at the application level.
Examples
Outcome: Distinguishes between properly and improperly stated outcomes.
Which of the
following learning outcomes is properly stated in terms of student performance?
A Develops an appreciation of the importance of testing.
*B Explains the purpose of test specifications.
C Learns
how to write good test items.
D Realizes
the importance of validity.
Outcome:
Improves defective test items.
Directions:
Read the following test item and then indicate the best change to make to
improve the item.
Which
one of the following types of learning outcomes is most difficult to evaluate
objectively?
1
A concept.
2
An application.
3
An appreciation.
4
None of the above.
The
best change to make in the previous item would be to:
A change
the stem to incomplete statement form.
B use
letters instead of numbers for each alternative.
C remove
the indefinite articles ‘a’ and ‘an’ from the alternatives.
*D replace ‘none of the above’ with ‘an
interpretation’.
When writing application items, care must be taken to
select problems that the students have not encountered previously and therefore
cannot solve on the basis of general knowledge alone.
Some of the many learning outcomes at the comprehension and application levels that can be measured by multiple-choice items are illustrated by the incomplete questions in the accompanying box.
Comprehension Questions
Which of the following is an example of _______?
What is the main thought expressed by ________?
What are the main differences between ________?
What are the common characteristics of _______?
Which of the following is another form of ______?
Which of the following best explains ________?
Which of the following best summarizes _______?
Which of the following best illustrates _________?
What do you predict would happen if __________?
What trend do you predict in ________?
Application Questions
Which of the following methods is best for ______?
What steps should be followed in applying _______?
Which situation would require the use of ________?
Which principle would be best for solving ________?
What procedure is best for improving ________?
What procedure is best for constructing ________?
What procedure is best for correcting __________?
Which of the following is the best plan for _______?
Which of the following provides the proper sequence for _______?
What is the most probable effect of _______?
TABLE 6: Illustrative Comprehension and Application Questions
Rules for Writing Multiple-Choice Items
1.
The multiple-choice item is the most highly
regarded and useful selection-type item.
2.
The multiple-choice item consists of a stem
and a set of alternative answers (options or choices).
3.
The multiple-choice item can be designed to
measure various intended learning outcomes, ranging from simple to complex.
4.
Knowledge items typically measure the simple
remembering of material.
5.
Comprehension items measure the extent to
which students have grasped the meaning of material.
6.
Application items measure whether students
can use information in concrete situations.
7.
Items designed to measure achievement beyond
the knowledge level must contain some novelty.
8.
The stem of a multiple-choice item should
present a single clearly formulated problem that is related to an important
learning outcome.
9.
The intended answer should be correct or clearly best, as agreed upon by authorities.
10.
The distracters (incorrect alternatives)
should be plausible enough to lead the uninformed away from the correct answer.
11.
The items should be written in simple, clear
language that is free of nonfunctioning content.
12.
The items should be free of irrelevant
sources of difficulty (e.g., ambiguity) that might prevent an informed examinee
from answering correctly.
13.
The items should be free of irrelevant clues
(e.g., verbal associations) that might enable an uninformed examinee to answer
correctly.
14.
The item format should provide for efficient
responding and follow the normal rules of grammar.
15.
The rules of item writing provide a framework
for preparing effective multiple-choice items, but experience in item writing
may result in modifications to fit particular situations.
ii. True-False Items
True-false
items are typically used to measure the ability to identify whether statements
of fact are correct. The basic format is simply a declarative statement that
the student must judge as true or false. There are modifications of this basic
form in which the student must respond “yes” or “no”, “agree” or “disagree”,
“right” or ‘wrong”, “fact” or “opinion”, and the like. Such variations are
usually given the more general name of alternative-response items. In
any event, this item type is characterized by the fact that only two responses
are possible.
Example
T *F True-false items are classified as supply-type items.
In
some cases the student is asked to judge each statement as true or false, and
then to change the false statements so that they are true. When this is done, a
portion of each statement is underlined to indicate the part that can be
changed. In the example given, for instance, the words ‘supply-type’ would be
underlined. The key parts of true statements, of course, must also be
underlined.
Another
variation is the cluster-type true-false format. In this case, a series of
items is based on a common stem.
Example
Which
of the following terms indicate observable student performance? Circle Y
for yes and N for no.
*Y N 1. Explains
*Y N 2. Identifies
Y *N 3. Learns
*Y N 4. Predicts
Y *N 5. Realizes
This
item format is especially useful for replacing multiple-choice items that have
more than one correct answer. Such items are impossible to score
satisfactorily. This is avoided with the cluster-type item because it makes
each alternative a separate scoring unit of one point. In our example, the
student must record whether each term does or does not indicate observable
student performance. Thus, this set of items provides an even better measure of
the ‘ability to distinguish between performance and non-performance terms’ than
would the single answer multiple-choice item. This is a good illustration of the
procedure discussed earlier - that is, starting with multiple-choice items and
switching to other item types when more effective measurement will result.
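Because each alternative in a cluster-type item is its own one-point scoring unit, scoring reduces to counting agreements with the key. A minimal Python sketch, using the Y/N key from the example above; the student responses are hypothetical:

```python
# A minimal sketch of scoring a cluster-type true-false item: each
# alternative is a separate one-point scoring unit. The key is taken
# from the example above; the student responses are hypothetical.

KEY = {"Explains": "Y", "Identifies": "Y", "Learns": "N",
       "Predicts": "Y", "Realizes": "N"}

def score_cluster(responses, key=KEY):
    """One point for each alternative that matches the key."""
    return sum(1 for term, answer in key.items()
               if responses.get(term) == answer)

student = {"Explains": "Y", "Identifies": "Y", "Learns": "Y",
           "Predicts": "Y", "Realizes": "N"}
print(score_cluster(student))   # 4 -- 'Learns' was wrongly marked Y
```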
Despite
the limitations of the true-false item, there are situations where it should be
used. Whenever there are only two possible responses, the true-false item, or
some adaptation of it, is likely to provide the most effective measure.
Situations of this type include a simple “yes” or “no” response in classifying
objects, determining whether a rule does or does not apply, distinguishing fact
from opinion, and indicating whether arguments are relevant or irrelevant. As
we indicated earlier, the best procedure is to use the true-false, or
alternative-response item only when this item type is more appropriate than the
multiple-choice form.
iii. Matching Items
The
matching item is simply a variation of the multiple-choice form. A good
practice is to switch to the matching format only when it becomes apparent that
the same alternatives are being repeated in several multiple-choice items.
Examples
Which
test item is least useful for educational diagnosis?
A Multiple-choice
item
*B True-false item
C Short-answer
item.
Which
test item measures the greatest variety of learning outcomes?
*A Multiple-choice item
B True-false
item
C Short-answer
item.
Which
test item is difficult to score objectively?
A Multiple-choice
item
B True-false
item
*C Short-answer item.
Which
test item provides the highest score by guessing?
A Multiple-choice
item
*B True-false item
C Short-answer
item.
By
switching to a matching format, we can eliminate the repetition of the
alternative answers and present the same items in a more compact form. The
matching format consists of a series of stems, called premises, and a series
of alternative answers, called responses. These are arranged in columns
with directions that set the rules for matching. The following example
illustrates how our multiple-choice items can be converted to matching form.
Example
Directions: Column A contains a list of
characteristics of test items. On the line to the left of each statement, write
the letter of the test item in Column B that best fits the statement. Each
response in Column B may be used once, more than once, or not at all.
| COLUMN A | COLUMN B |
|---|---|
| (B) 1. Least useful for educational diagnosis | A. Multiple-choice item |
| (A) 2. Measures greatest variety of learning outcomes | B. True-false item |
| (C) 3. Most difficult to score objectively | C. Short-answer item |
| (B) 4. Provides the highest score by guessing | |
The
conversion to a matching item illustrated here is probably the most defensible
use of this item type. All too frequently, matching items consist of a
disparate collection of premises, each of which has only one or two plausible
answers. This can be avoided by starting with multiple-choice items and
switching to the matching format only when it provides a more compact and
efficient means of measuring the same achievement. In our example, we could
have also expanded the item by adding other similar premises and responses.
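Viewed as data, a matching item is just two lists plus a key from premises to response letters, with responses free to repeat. A minimal Python sketch based on the example above; the one-point-per-premise scoring rule is an illustrative assumption:

```python
# A minimal sketch: the matching item above as data, scored one point
# per correctly matched premise. Responses may be used more than once.

RESPONSES = {"A": "Multiple-choice item",
             "B": "True-false item",
             "C": "Short-answer item"}

KEY = {  # premise -> correct response letter (note "B" repeats)
    "Least useful for educational diagnosis": "B",
    "Measures greatest variety of learning outcomes": "A",
    "Most difficult to score objectively": "C",
    "Provides the highest score by guessing": "B",
}

def score_matching(answers, key=KEY):
    """One point for each premise matched to the keyed response."""
    return sum(1 for premise, letter in key.items()
               if answers.get(premise) == letter)
```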
iv. The Interpretive Exercise
Complex
learning outcomes can frequently be more effectively measured by basing a
series of test items on a common selection of introductory material. This may
be a paragraph, a table, a chart, a graph, a map, or a picture. The test items
that follow the introductory material may be designed to call forth any type of
intellectual ability or skill that can be measured objectively. This type of
exercise is commonly called an interpretive exercise, and both multiple-choice items and alternative-response items are widely used to measure
interpretation of the introductory material.
The
following example illustrates the use of multiple-choice items. Note that this
item type makes it possible to measure a variety of learning outcomes with the
same selection of introductory material. In this particular case, item 1
measures the ability to recognize unstated assumptions, item 2 the ability to
identify the meaning of a term, and item 3 the ability to identify
relationships.
Example
Directions: Read the following comments a
teacher made about testing. Then answer the question that follows the comments
by circling the letter of the best answer.
“Students
go to school to learn, not to take tests. In addition, tests cannot be used to
indicate a student’s absolute level of learning. All tests can do is rank
students in order of achievement, and this relative ranking is influenced by
guessing, bluffing, and the subjective opinions of the teacher doing the
scoring. The teaching-learning process would benefit if we did away with tests
and depended on student self-evaluation”.
1.
Which one of the following unstated
assumptions is this teacher making?
A Students
go to school to learn.
B Teachers
use essay tests primarily.
*C Tests make no contribution to learning.
D Tests
do not indicate a student’s absolute level of learning.
2.
Which one of the following types of tests is
this teacher primarily talking about?
A Diagnostic
test.
B Formative
test.
C Pre-test
*D Summative test.
3.
Which one of the following propositions is
most essential to the final conclusion?
*A Effective self-evaluation does not
require the use of tests.
B Tests
place students in rank order only.
C Test scores are influenced by factors other than achievement.
D Students
do not go to school to take tests.
The next example uses a modified version of the alternative-response form. This is frequently called a key-type item because a common set of alternatives is used in responding to each question.
Note that the key-type item is devoted entirely to the measurement of one
learning outcome. In this example, the item measures the ability to recognize
warranted and unwarranted inferences.
Example
Directions: Paragraph A
contains a description of the testing practices of Mr. Smith, a high school
teacher. Read the description and each of the statements that follow it. Mark
each statement to indicate the type of INFERENCE that can be drawn about it
from the material in the paragraph. Place the appropriate letter in front of
each statement using the following KEY:
T—if the statement may be INFERRED as TRUE.
F—if the statement may be INFERRED as UNTRUE
N—if NO INFERENCE may be drawn about it from the paragraph.
PARAGRAPH
A
Approximately one week before a test is to be
given, Mr. Smith carefully goes through the textbook and constructs
multiple-choice items based on the material in the book. He always uses the
exact wording of the textbook for the correct answer so that there will be no
question concerning its correctness. He is careful to include some test items
from each chapter. After the test is given, he lists the scores from high to
low on the blackboard and tells each student his or her score. He does not
return the test papers to the students, but he offers to answer any question
they might have about the test. He puts the items from each test into a test
file, which he is building for future use.
STATEMENTS ON PARAGRAPH A
(T) 1. Mr. Smith’s tests
measure a limited range of learning outcomes.
(F) 2. Some of Mr. Smith’s
test items measure at the understanding level.
(N) 3. Mr. Smith’s tests
measure a balanced sample of subject matter.
(N) 4. Mr. Smith uses the
type of test item that is best for his purpose.
(T) 5. Students can
determine where they rank in the distribution of scores on Mr. Smith’s tests.
(F) 6. Mr. Smith’s testing
practices are likely to motivate students to overcome their weaknesses.
SUMMARY OF POINTS
1.
A good practice is to start with
multiple-choice items and switch to other selection-type items when more
appropriate.
2.
The true-false, or alternative-response item
is appropriate when there are only two possible alternatives.
3.
The true-false item is used primarily to
measure knowledge of specific facts, although there are some notable
exceptions.
4.
Each true-false statement should contain only
one central idea, be concisely stated, be free of clues and irrelevant sources
of difficulty, and have an answer on which experts would agree.
5.
Modifications of the true-false item are especially
useful for measuring the ability to ‘distinguish between fact and opinion’ and
‘identify cause-effect relations’.
6.
Modifications of the true-false item can be
used in interpretive exercises to measure various types of complex learning
outcomes.
7.
The matching item is a variation of the
multiple-choice form and is appropriate when it provides a more compact and
efficient means of measuring the same achievement.
8.
The matching item consists of a list of premises and a list of the responses
to be related to the premises.
9.
A good matching item is based on homogeneous
material, contains a brief list of premises and an uneven number of responses
(more or fewer) that can be used more than once, and has the brief responses in
the right-hand column.
10.
The directions for a matching item should
indicate the basis for matching and that each response can be used more than
once.
11.
The interpretive exercise consists of a
series of selection-type items based on some type of introductory material
(e.g. paragraph, table, chart, graph, map, or picture).
12.
The interpretive exercise uses both
multiple-choice and alternative-response items to measure a variety of complex
learning outcomes.
13.
The introductory material used in an
interpretive exercise must be relevant to the outcomes to be measured, at the
proper reading level, and as brief as possible.
14.
The test items used in an interpretive
exercise should call for the intended type of interpretation, and the answers
to the items should be dependent on the introductory material.
15.
The test items used in an interpretive
exercise should be in harmony with the rules for constructing that item type.
v. Short-Answer Items
The
short-answer (or completion) item requires the examinee to supply the
appropriate words, numbers, or symbols to answer a question or complete a
statement.
Example
What are the incorrect responses in a multiple-choice item called? (distracters)
The incorrect responses in a multiple-choice item are called __________. (distracters)
This
item type also includes computational problems and any other simple item form
that requires supplying the answer rather than selecting it. Except for its use
in computational problems, the short-answer item is used primarily to measure
simple knowledge outcomes.
The
short-answer item appears to be easy to write and use but there are two major
problems in constructing short-answer items. First, it is extremely difficult
to phrase the question or incomplete statement so that only one answer is
correct. In the example we have noted, for instance, a student might respond
with any one of a number of answers that could be defended as appropriate. The
student might write “incorrect alternatives”, “wrong answers”, “inappropriate
options”, “decoys”, “foils”, or some other equally descriptive response.
Second, there is the problem of spelling. If credit is given only when the
answer is spelled correctly, the poor spellers will be prevented from showing
their true level of achievement and the test scores will become an
uninterpretable mixture of knowledge and spelling skill. On the other hand, if
attempts are made to ignore spelling during the scoring process, there is still
the problem of deciding whether a badly spelled word represents the intended
answer. This, of course, introduces an element of subjectivity which tends to
make the scores less dependable as measures of achievement.
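The two scoring problems above can at least be made explicit in a scoring procedure: keep a set of defensible answers, and keep spelling out of the knowledge score by matching loosely and recording spelling separately. A minimal Python sketch; the acceptable-answer set, the normalization, and the similarity cutoff are illustrative assumptions:

```python
# A minimal sketch of short-answer scoring that accepts several
# defensible answers and scores spelling separately. The acceptable-
# answer set, normalization, and similarity cutoff are assumptions.

import difflib

ACCEPTABLE = {"distracters", "distractors", "incorrect alternatives",
              "wrong answers", "decoys", "foils"}

def score_short_answer(response, acceptable=ACCEPTABLE, cutoff=0.8):
    """Return (knowledge_point, spelled_correctly) for one response."""
    answer = response.strip().lower()
    if answer in acceptable:
        return 1, True
    # Near-misses keep the knowledge point but lose the spelling point.
    if difflib.get_close_matches(answer, acceptable, n=1, cutoff=cutoff):
        return 1, False
    return 0, False

print(score_short_answer("Distracters"))   # (1, True)
print(score_short_answer("distactors"))    # (1, False): misspelled near-miss
print(score_short_answer("the options"))   # (0, False)
```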
vi. Essay Questions
The
most notable characteristic of the essay question is the freedom of response it
provides. As with the short-answer item, students must produce their own
answers. With the essay question, however, they are free to decide how to
approach the problem, what factual information to use, how to organize the
answer, and what degree of emphasis to give each aspect of the response. Thus,
the essay question is especially useful for measuring the ability to organize,
integrate, and express ideas.
| | SELECTION-TYPE ITEMS | ESSAY QUESTIONS |
|---|---|---|
| Learning Outcomes Measured | Good for measuring outcomes at the knowledge, comprehension, and application levels of learning; inadequate for organizing and expressing ideas. | Inefficient for measuring knowledge outcomes; best for ability to organize, integrate, and express ideas. |
| Sampling of Content | The use of a large number of items results in broad coverage which makes representative sampling of content feasible. | The use of a small number of items limits coverage which makes representative sampling of content infeasible. |
| Preparation of Items | Preparation of good items is difficult and time consuming. | Preparation of good items is difficult but easier than selection-type items. |
| Scoring | Objective, simple, and highly reliable. | Subjective, difficult, and less reliable. |
| Factors Distorting Scores | Reading ability and guessing. | Writing ability and bluffing. |
| Probable Effect on Learning | Encourages students to remember, interpret, and use the ideas of others. | Encourages students to organize, integrate, and express their own ideas. |

TABLE 7: Summary of Comparison between Selection-Type Items and Essay Questions
SUMMARY OF POINTS
1.
Use supply-type items whenever producing the
answer is an essential element in the learning outcome (e.g., defines
terms, instead of identifies meaning of terms).
2.
Supply-type items include short-answer items,
restricted-response essay, and extended-response essay.
3.
The short-answer item can be answered by a
word, number, symbol, or brief phrase.
4.
The short-answer item is limited primarily to
measuring simple knowledge outcomes.
5.
Each short-answer item should be so carefully
written that there is only one possible answer, the entire item can be read
before coming to the answer space, and there are no extraneous clues to the
answer.
6.
In scoring short-answer items, give credit
for all correct answers and score for spelling separately.
7.
Essay questions are most useful for measuring
the ability to organize, integrate, and express ideas.
8.
Essay questions are inefficient for measuring
knowledge outcomes because they provide limited sampling, are influenced by
extraneous factors (e.g., writing skills, bluffing, grammar, spelling,
handwriting), and scoring is subjective and unreliable.
9.
Restricted-response essay questions can be
more easily written and scored, but due to limitations on the responses they
are less useful for measuring the higher-level outcomes (e.g., integration of
diverse material).
10.
Extended-response essay questions provide the
freedom to select, organize, and express ideas in the manner that seems most
appropriate; therefore, they are especially useful for measuring such outcomes.
11.
Essay questions should be written to measure
complex learning outcomes, to present a clear task, and to contain only those
restrictions needed to call forth the intended response and provide for
adequate scoring.
12.
Essay answers should be scored by focusing on the intended response, by using a model answer or set of criteria as a guide, by scoring question by question, and by ignoring the writer’s identity. If an important decision is to be based on the result, two or more competent scorers should be used; a minimal scoring sketch follows this list.
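For concreteness, here is a minimal Python sketch of point 12: criteria drawn from a model answer, one question scored at a time against those criteria, and two independent scorers averaged. The criterion names and weights are illustrative assumptions, not a prescribed rubric.

```python
# A minimal sketch of criteria-based essay scoring: rate one question
# against weighted criteria from a model answer, hide the writer's
# identity, and average two independent scorers. Names and weights
# below are illustrative assumptions.

CRITERIA = {                 # drawn from the model answer for one question
    "covers the required principles": 4,
    "organization and integration": 3,
    "supports claims with examples": 3,
}

def score_essay(ratings, criteria=CRITERIA):
    """ratings: criterion -> fraction earned (0.0-1.0) from one scorer."""
    return sum(weight * ratings.get(name, 0.0)
               for name, weight in criteria.items())

# Two scorers rate the same anonymous answer; report the mean.
scorer_a = {"covers the required principles": 1.0,
            "organization and integration": 0.5,
            "supports claims with examples": 0.5}
scorer_b = {"covers the required principles": 0.75,
            "organization and integration": 0.5,
            "supports claims with examples": 1.0}
final = (score_essay(scorer_a) + score_essay(scorer_b)) / 2
print(final)   # 7.25 out of a possible 10 points
```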
2.2 Prepare Performance Assessment Documents
Performance assessments
can be classified by the type of situation or setting used. The following
classification system closely approximates the degree of realism present in the
situation and includes the following types: (1) paper-and-pencil performance,
(2) identification test, (3) structured performance test, (4) simulated
performance, and (5) work sample. Although these categories overlap to some
degree, they are useful in describing and illustrating the various approaches
used in performance assessment.
i. Paper-and-Pencil Performance
Paper-and-pencil performance differs from the more traditional paper-and-pencil test by
placing greater emphasis on the application of knowledge and skill in a
simulated setting. These paper-and-pencil applications might result in desired
terminal learning outcomes, or they might serve as an intermediate step to
performance that involves a higher degree of realism (for example, the actual
use of equipment).
In
a number of instances, paper-and-pencil performance can provide a product of
educational significance. A course in test construction, for example, might
require students to perform activities such as the following:
Construct a set of test specifications for a unit of instruction.
Construct test items that fit a given set of specifications.
Construct a checklist for evaluating an achievement test.
The
action verb ‘construct’ is frequently used in paper-and-pencil performance
testing. For instance, students might be asked to construct a weather map, bar
graph, diagram of an electrical circuit, floor plan, design for an article of
clothing, poem, short story, or plan for an experiment. In such cases, the
paper-and-pencil product is a result of both knowledge and skill, and it
provides a performance measure that is valued in its own right.
In
other cases, paper-and-pencil performance might simply provide a first step
toward hands-on performance. For example, before using a particular measuring
instrument, such as a micrometer, it might be desirable to have students read
various settings from pictures of the scale. Although the ability to read the
scale is not a sufficient condition for accurate measurement, it is a necessary
one. In this instance, paper-and-pencil performance would be favored because it
is a more convenient method of testing a group of students. Using
paper-and-pencil performance as a precursor to hands-on performance might be
favored for other reasons. For example, if the performance is complicated and
the equipment is expensive, demonstrating competence on paper-and-pencil situations
could avoid subsequent accidents or damage to equipment. Similarly, in the
health sciences, skill in diagnosing and prescribing for hypothetical patients
could avoid later harm to real patients.
ii. Identification Test
The
identification test includes a wide variety of test situations representing
various degrees of realism. In some cases, a student may be asked simply to
identify a tool or piece of equipment and to indicate its function. A more
complex test situation might present the student with a particular performance
task (e.g. locating a short in an electrical circuit) and ask him or her to
identify the tools, equipment, and procedures needed in performing the task. An
even more complex type of identification test might involve listening to the
operation of a malfunctioning machine and identifying the most probable cause of the
malfunction.
Although
identification tests are widely used in industrial education, they are by no
means limited to that area. The biology teacher might have students identify
specimens that are placed at various stations around the room, or identify the
equipment and procedures needed to conduct a particular experiment. Similarly,
chemistry students might be asked to identify ‘unknown’ substances,
foreign-language students to identify correct pronunciation, mathematics
students to identify correct problem-solving procedures, English students to
identify the ‘best expression’ to be used in writing, and social studies
students to identify various leadership roles as they are acted out in a group.
Identifying correct procedures is also important, of course, in art, music,
physical education, and such vocational areas as agriculture, business education, and home economics.
The
identification test is sometimes used as an indirect measure of
performance skill. The experienced plumber, for example, is expected to have a
broader knowledge of the tools and equipment used in plumbing than the
inexperienced plumber. Thus, a tool identification test might be used to
eliminate the least skilled in a group of applicants for a position as a
plumber. More commonly, the identification test is used as an instructional
device to prepare students for actual performance in real or simulated
situations.
iii. Structured Performance Test
A
structured performance test provides for an assessment under standard,
controlled conditions. It might involve such things as making prescribed
measurements, adjusting a microscope, following safety procedures in starting a
machine, or locating a malfunction in electronic equipment. The performance
situation is structured and presented in a manner that requires all individuals
to respond to the same set of tasks.
The
construction of a structured performance test follows somewhat the same pattern
used in constructing other types of achievement tests, but there are some added
complexities. The test situation can seldom be fully controlled and
standardized, they typically take more time to prepare and administer, and they
are frequently more difficult to score. To increase the likelihood that the
test situation will be standard for all individuals, instructions should be
used that describe the test situation, the required performance, and the
conditions under which the performance is to be demonstrated. Instructions for
locating a malfunction in electronic equipment, for example, would typically
include the following:
1. Nature and purpose of the test
2. Equipment and tools provided
3. Testing procedure:
   a. Type and condition of equipment
   b. Description of required performance
   c. Time limits and other conditions
4. Method of judging performance
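To make these instructions concrete, the following short Python sketch (an illustration only, not part of the original text) records the four elements as a simple data structure; all field names and sample values are hypothetical.

from dataclasses import dataclass

@dataclass
class PerformanceTestSpec:
    purpose: str                 # 1. nature and purpose of the test
    equipment: list              # 2. equipment and tools provided
    equipment_condition: str     # 3a. type and condition of equipment
    required_performance: str    # 3b. description of required performance
    time_limit_minutes: int      # 3c. time limits and other conditions
    judging_method: str          # 4. method of judging performance

spec = PerformanceTestSpec(
    purpose="Locate a malfunction in a faulty amplifier",
    equipment=["multimeter", "circuit schematic", "hand tools"],
    equipment_condition="Working amplifier with one induced fault",
    required_performance="Identify the faulty component and state the cause",
    time_limit_minutes=30,
    judging_method="Checklist of procedural steps plus total time taken",
)
print(spec.purpose)

Writing the instructions out in this fixed form helps ensure that no element is omitted and that every student receives the same description of the task.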
When
using performance tests, it may be desirable to set performance standards that
indicate the minimum level of acceptable performance. These might be concerned
with accuracy (e.g. measure temperature to the nearest two tenths of a
degree), the proper sequencing of steps (e.g. adjust a microscope following
the proper sequence of steps), total compliance with rules (e.g. check all
safety guards before starting a machine), or speed of performance (e.g., locate
a malfunction in electronic equipment in three minutes). Some common
standards for judging performance are shown in the accompanying box.
Performance standards are, of course, frequently used in combination. A particular performance may require correct form, accuracy, and speed. How much weight to give to each depends on the stage of instruction as well as the nature of the performance. In assessing laboratory measurement skills, for example, correct procedure and accuracy might be stressed early in the instruction, with speed of performance delayed until the later stages of instruction. The particular situation might also influence the importance of each dimension. In evaluating typing skill, for example, speed might be stressed in typing routine business letters, whereas accuracy would be emphasized in typing statistical tables for economic reports.

SOME COMMON STANDARDS FOR JUDGING PERFORMANCE

Rate: Solve ten ‘addition’ problems in two minutes. Type 40 words per minute.
Error: No more than two errors per typed page. Count to 20 in Spanish without error.
Time: Set up laboratory equipment in five minutes. Locate an equipment malfunction in three minutes.
Precision: Measure a line within one eighth of an inch. Read a thermometer within two tenths of a degree.
Quantity: Complete 20 laboratory experiments. Locate 15 relevant references.
Quality (rating): Write a neat, well-spaced business letter. Demonstrate correct form in diving.
Percentage correct: Solve 85 percent of the math problems. Spell correctly 90 percent of the words in the word list.
Steps required: Diagnose a motor malfunction in five steps. Locate a computer error using proper sequence of steps.
Use of material: Build a bookcase with less than 10 percent waste. Cut out a dress pattern with less than 10 percent waste.
Safety: Check all safety guards before operating machine. Drive automobile without breaking any safety rules.
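The way such standards combine, with weights that shift across the stages of instruction, can be sketched in a few lines of Python. This is an illustration only; the standard names, weights, and scoring rule below are hypothetical, not taken from the text.

from dataclasses import dataclass

@dataclass
class Standard:
    name: str       # e.g. "correct procedure", "accuracy", "speed"
    passed: bool    # whether the student met this standard
    weight: float   # relative importance at this stage of instruction

def composite_score(standards):
    """Return the weighted proportion of standards met (0.0 to 1.0)."""
    total = sum(s.weight for s in standards)
    met = sum(s.weight for s in standards if s.passed)
    return met / total if total else 0.0

# Early in instruction: stress correct procedure and accuracy over speed.
early = [
    Standard("correct procedure", passed=True, weight=0.5),
    Standard("accuracy", passed=True, weight=0.4),
    Standard("speed", passed=False, weight=0.1),
]
print(composite_score(early))  # 0.9 -- the speed failure counts for little yet

Raising the weight on "speed" later in instruction would let the same checklist reflect the shifting emphasis described above.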
iv. Simulated Performance
Simulated
performance is an attempt to match the performance in a real situation – either
in whole or in part. In physical education, for example, swinging a bat at an
imaginary ball, shadow boxing, and demonstrating various swimming or tennis
strokes are simulated performances. In science, vocational, and business
courses, skill activities are frequently designed to simulate portions of
actual job performance. In mathematics, the use of calculators in solving
lifelike problems represents simulated performance. Similarly, in social
studies, student role playing of a jury trial, a city council meeting, or a job
interview provides the instructor with opportunities to evaluate the simulated
performance of an assigned task. In some cases, specially designed equipment is
used for instructional and evaluative purposes. In both driver training and
flight training, for example, students are frequently trained and tested on
simulators. Such simulators may prevent personal injury or damage to expensive
equipment during the early stages of skill development. Simulators are also
used in various types of vocational training programs.
In
some situations, simulated performance testing might be used as the final
assessment of a performance skill. This would be the case in assessing
a student’s laboratory performance in chemistry, for example. In many situations,
however, skill in a simulated setting simply indicates readiness to attempt
actual performance. The student in driver training who has demonstrated driving
skill in the simulator, for example, is now ready to apply his or her skill in
the actual operation of an automobile.
v. Work Sample
Of
the various types of performance assessments, the work sample incorporates the
highest degree of realism. It requires the student to perform actual tasks that
are representative of the total performance to be measured. The sample tasks
typically include the most crucial elements of the total performance, and are
performed under controlled conditions. In being tested for automobile driving
skill, for example, the student is required to drive over a standard course
that includes the most common problem situations likely to be encountered in
normal driving. The performance on the standard course is then used as evidence
of the ability to drive an automobile under typical operating conditions.
Performance
assessments in business education and industrial education are frequently of
the work-sample type. When students are required to take and transcribe
shorthand notes from dictation, type a business letter, or operate a computer
in analyzing business data, a work-sample assessment is being employed.
Similarly, in industrial education, a work-sample approach is being used when
students are required to complete a metal-working or woodworking project that
includes all of the steps likely to be encountered in an actual job situation
(steps such as designing, ordering materials, and constructing). Still other
examples are the operation of machinery, the repair of equipment, and the
performance of job-oriented laboratory tasks. The work-sample approach to
assessing performance is widely used in occupations involving performance
skills, and many of these situations can be duplicated in the school setting.
vi. Portfolios
To
obtain a broader sample of student performance and one that represents more
typical behavior, a portfolio of work may be assembled for performance
assessment. For example, a portfolio of drawings may be used to evaluate
artistic skill. In some cases, a portfolio of all classroom performance
products may be assembled and evaluated as a whole.
Questions
concerning what should be put in the portfolio and how it should be evaluated
depend on the intended learning outcomes and the use to be made of the results.
A portfolio of writing samples, for example, may be used to measure particular
writing skills in order to evaluate progress and diagnose areas needing
improvement. Or, a portfolio of writing samples may be used to provide a
comprehensive measure of different types of writing (e.g., letter, essay,
fiction) to determine the extent to which writing skills can be applied to
various situations. In any event, the objectives must be clear so that plans
can be made by using prescribed exercises or by accumulating collections of
students’ regular class work over time.
Evaluation
of portfolio products typically is based on holistic scoring, analytic scoring,
or a combination of the two. Holistic scoring is based on an overall
impression of the product rather than a consideration of the individual
elements. The global judgment is made by assigning a numerical score to each
product. Typically, between 4 and 8 points are used, and an even number of
points is favored to avoid a ‘middle dumping ground’. Evaluation consists of
quickly examining the product and assigning the number that matches the general
impression of the product. In the case of a writing assessment, for example,
the reader will read each writing sample quickly for overall impression and
place it in one of the piles ranging from 4 to 1. It is assumed that good
writing is more than a sum of the individual elements that go into writing and
that holistic scoring will capture this total impression of the work.
Analytic
scoring requires a judgment for each significant characteristic of the product.
In evaluating writing skills, for example, such things as organization,
vocabulary, style, ideas, and mechanics might be judged separately. Typically,
a checklist or rating scale is used to focus attention on each characteristic
and to provide a place for recording judgments.
For
most instructional purposes, both holistic and analytical scoring are useful.
One gives the global judgment of the product and the other provides diagnostic
information useful for improving performance. Where both are used, the global
judgment should be made first to keep some specific elements from distorting
the general impression of the product.
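As a rough illustration of this ordering, the sketch below records the holistic judgment first and the analytic ratings second. The scale sizes and trait names are hypothetical examples, not requirements from the text.

ANALYTIC_TRAITS = ["organization", "vocabulary", "style", "ideas", "mechanics"]

def score_writing_sample(holistic, analytic):
    """holistic: overall impression on an even-numbered scale (here 1-4),
    judged before looking at individual elements.
    analytic: dict mapping each trait to a 1-5 rating."""
    assert holistic in (1, 2, 3, 4), "even-point scale avoids a middle dumping ground"
    assert set(analytic) == set(ANALYTIC_TRAITS), "rate every trait"
    return {"holistic": holistic, "analytic": dict(analytic)}

record = score_writing_sample(
    holistic=3,  # global impression, made first
    analytic={"organization": 4, "vocabulary": 3, "style": 3,
              "ideas": 5, "mechanics": 2},  # diagnostic detail, made second
)
print(record)

Keeping the two judgments in separate fields preserves the global impression for grading while the trait ratings remain available for diagnostic feedback.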
The
use of portfolios for performance assessment includes the following steps:
1. Decide on the learning outcomes to be assessed and the use to be made of the results.
2. Determine the nature of the samples of work (e.g., writing, drawing, tapes) and the method of collecting (e.g., prescribed exercises, routine class work).
3. Prepare exercises that define the performance tasks, or describe the nature of the routine class work to be collected.
4. Select the method of scoring and prepare the scoring instruments for judging performance.
Example

Performance question: Changing a wheel on a motor car.

Marking scheme (each sub-task is checked Yes or No; certain sub-tasks are designated critical, and all critical sub-tasks must be performed correctly to pass the test):

1. Stop car on hard, level surface
2. Apply hand brake
3. Switch off engine
4. Position warning triangle approximately 20 metres to rear of car
5. Remove spare wheel and tools
6. Place jack in position at nearest jacking point to wheel to be changed
7. Remove hub cap
8. Loosen wheel nuts
9. Jack up wheel to approximately 3 cm above the ground
10. Remove wheel nuts, top one last (and place in hub cap)
11. Remove wheel
12. Place spare wheel in position
13. Replace wheel nuts, top one first
14. Tighten all nuts diagonally
15. Lower jack so that wheel rests on ground
16. Tighten all nuts fully
17. Replace hub cap
18. Return tools, warning triangle and replaced wheel to storage compartment
Planning The Assessment Session
1. Test planning should be guided by the purpose of the test and the nature of the learning tasks to be measured.
2. Test planning should include stating the instructional objectives as intended learning outcomes and defining them in terms of student performance.
3. Test specifications that describe the set of tasks to be measured should be prepared before writing or selecting test items.
4. Test specifications typically consist of a twofold table of specifications, but a more limited set of specifications may be useful for formative testing.
5. The types of test items used in a test should be determined by how directly they measure the intended learning outcomes and how effective they are as measuring instruments.
6. Each test item should provide a task that matches the student performance described in a specific learning outcome.
7. The functioning content of test items can be improved by eliminating irrelevant barriers and unintended clues during item writing.
8. For mastery testing and criterion-referenced interpretation, the difficulty of a test item should match the difficulty of the learning task to be measured.
9. For survey testing and norm-referenced interpretation, item difficulty may be altered to provide a larger spread of scores, but care must be taken not to introduce irrelevant difficulty (e.g. by using obscure material).
10. An achievement test should be short enough to permit all students to attempt all items during the testing time available.
11. A test should contain a sufficient number of test items for each type of interpretation to be made. Interpretations based on fewer than 10 items should be considered highly tentative.
12. Validity and reliability are the two most important characteristics of achievement testing and should be ‘built in’ during test construction.
13. An achievement test will provide valid and reliable results if it measures a representative sample of instructionally relevant tasks and provides scores that are relatively free of measurement errors.
14. Following a general set of guidelines during item writing will result in higher-quality items that contribute to the validity and reliability of the test results.
QUESTIONS
1. Which of the following is an example of a performance term?
A. Construct
B. Fear
C. Realize
D. Think
2. Specific Learning Outcomes: Identify procedural steps in planning for a test.
Which one of the following steps should be completed first in planning for knowledge assessment?
A. Select the types of test items to use.
B. Decide on the length of the test.
C. Define the intended learning outcomes.
D. Prepare the test specifications.
3. Specific Learning Outcomes: Identify examples of properly stated learning outcomes.
Which one of the following learning outcomes is properly stated in performance terms?
A. Student realizes the importance of tests in teaching.
B. Student has acquired the basic principles of knowledge testing.
C. Student demonstrates a desire for more experience in test construction.
D. Student predicts the most probable effect of violating a test construction principle.