1.1 Evaluating Knowledge Assessment Results

Definition of evaluation

Evaluation is the process of making a decision about student learning. It requires us to make a judgement about trainee knowledge and performance. The measurements or data gained from assessment help you make that evaluation decision.
Diagnostic
  Formal: Standardized tests, Pre-tests, Inquiry, Questionnaires
  Informal: Observations, Discussions

Formative
  Formal: Checklists, Quizzes, Question-answers, Assignments, Standardized tests, Classroom tests
  Informal: Journals, Observations, Question-answers, Trainee comments, Assignments

Summative
  Formal: Inquiry, Work projects, Standardized tests, Classroom tests
  Informal: Discussions, Observations, Work projects, Trainee feedback

Figure 1: Framework of Evaluation
a. Multiple-choice items are suitable for various types of complex learning. They are most appropriate for measuring interpretation of learning. A checklist can be used to check and assess multiple-choice items. Guidelines to help instructors assess multiple-choice items are provided in figure 2 below:
1. Is this type of item appropriate for measuring the intended learning outcome?
2. Does the item task match the learning task to be measured?
3. Does the stem of the item present a single, clearly formulated problem?
4. Is the stem stated in simple, clear language?
5. Is the stem worded so that there is no repetition of material in the alternatives?
6. Is the stem stated in positive form wherever possible?
7. If negative wording is used in the stem, is it emphasized (by underlining or caps)?
8. Is the intended answer correct or clearly the best?
9. Are all alternatives grammatically consistent with the stem and parallel in form?
10. Are the alternatives free from verbal clues to the correct answer?
11. Are the distracters plausible and attractive to the uninformed?
12. To eliminate length as a clue, is the relative length of the correct answer varied?
13. Has the alternative "all of the above" been avoided and "none of the above" used only when appropriate?
14. Is the position of the correct answer varied so that there is no detectable pattern?
15. Do the item format and grammar usage provide for efficient test taking?

Figure 2: Checklist for Evaluating Multiple-Choice Items
b. Comprehension and application items also need to be evaluated against the constructed question items. Below are illustrative question stems that instructors can use to construct assessment/test questions for these knowledge assessments. Refer to the illustrative questions in figure 3.
Comprehension Questions
Which of the following is an example of ____________?
What is the main thought expressed by _____________?
What are the main differences between _____________?
What are the common characteristics of _____________?
Which of the following is another form of ____________?
Which of the following best explains ____________?
Which of the following best summarizes __________?
Which of the following best illustrates ____________?
What do you predict would happen if ___________?
What trend do you predict in ___________?

Application Questions
Which of the following methods is best for _____________?
What steps should be followed in applying _____________?
Which situation would require the use of ____________?
Which principle would be best for solving ___________?
What procedure is best for improving ___________?
What procedure is best for constructing __________?
What procedure is best for correcting ___________?
Which of the following is the best plan for __________?
Which of the following provides the proper sequence for _______?
What is the most probable effect of _________?

Figure 3: Illustrative Comprehension and Application Questions
c. The true-false item is suitable for evaluating the ability to identify the accuracy of a statement of fact. The instructor can use the following checklist to re-evaluate all constructed items. See figure 4.
16. Is this type of item appropriate for measuring the intended learning outcome?
17. Does the item task match the learning task to be measured?
18. Does each statement contain one central idea?
19. Can each statement be unequivocally judged true or false?
20. Are the statements brief and stated in simple, clear language?
21. Are negative statements used sparingly and double negatives avoided?
22. Are statements of opinion attributed to some source?
23. Are the statements free of clues to the answer (e.g., verbal clues, length)?
24. Is there approximately an even number of true and false statements?
25. When arranged in the test, are the true and false items put in random order?

Figure 4: Checklist for Evaluating True-False Items
To evaluate matching items, the instructor should use the checklist proposed in figure 5 below:
26. Is this type of item appropriate for measuring the intended learning outcome?
27. Does the item task match the learning task to be measured?
28. Does each matching item contain only homogeneous material?
29. Are the lists of items short, with the brief responses on the right?
30. Is an uneven match provided by making the list of responses longer or shorter than the list of premises?
31. Are the responses in alphabetical or numerical order?
32. Do the directions clearly state the basis for matching and that each response can be used once, more than once, or not at all?
33. Does the complete matching item appear on the same page?

Figure 5: Checklist for Evaluating Matching Items
d. An interpretive item is a question in interpretive form whose answer depends on introductory material such as a paragraph, list, chart, map, or picture. To evaluate this question type, refer to the instructor checklist in figure 6 below.
34. Is this type of exercise appropriate for measuring the intended learning outcome?
35. Is the introductory material relevant to the learning outcome?
36. Is the introductory material familiar but new to the examinees?
37. Is the introductory material brief and at the appropriate reading level?
38. Do the test items call forth the performance specified in the learning outcomes?
39. Do the test items meet the criteria for effective item writing that apply to the item type used?
40. Is the interpretive exercise free of extraneous clues?

Figure 6: Checklist for Evaluating Interpretive Exercises
I-031-3(10)
IS 2: EVALUATION OF PERFORMANCE ASSESSMENT RESULT
2.1 Evaluating Performance Assessment Results
The procedure for developing performance tests involves three major steps: (1) specifying the objectives, (2) specifying the items to be observed, and (3) specifying the criteria that determine successful completion of a task. The three-part objective should likewise include (1) a statement of the given conditions within which the trainee must perform, (2) the performance expected of the trainee, and (3) the standard against which trainee performance will be evaluated. The level of the expected performance is of critical concern to the instructor in developing performance tests. Typically, almost any performance in business education skill courses is a combination of more elementary performances. In constructing a performance test, the instructor must be sure that the trainee brings to the task an appropriate background of prerequisite knowing and doing skills.

For this reason, much care should be taken in writing the objective and in selecting the correct verb. Once the verb is selected and the trainee is told the nature of the expected performance, the context or "givens" should be described. The givens describe the tools, previous knowledge, machines, software, or prerequisites that the trainee will be required to use in demonstrating the behavior. The third part of the three-part objective, the performance standard, is used to judge whether a trainee has mastered a task. It communicates the quality of performance expected and provides a basis for the instructor to judge the quality of the product, as in the following example:

Given 150 words of dictation at 90 words per minute, the trainee will transcribe a business letter into mailable copy.
In a performance test the items are typically steps of a procedure that the trainee must complete correctly in order to complete the performance. In a product evaluation, the items are related to a finished product that can be observed. To continue the previous example, in order to turn out a mailable letter the trainee:

- Types with reasonable accuracy and speed
- Takes dictation in shorthand notes at 90 words per minute
- Checks shorthand notes for grammar, punctuation, and special notations
- Estimates the length and placement of the business letter
- Transcribes shorthand notes at the typewriter
- Makes punctuation, spelling, and other editing corrections, and proofreads the finished product.
The procedures used to determine the items for both performance tests and product evaluations are quite similar. They both start with a definition of the learning objective. The next step is to determine which items should be observed during the performance or on the finished product. When identifying and stating steps, the following rules should be observed:

1. Begin each step with a verb (major categories include, for example: call, check, compile, compose, deliver, determine, type, duplicate, file, and so on).
2. Make each step independent of other steps. (Avoid evaluating the same performance in more than one step.)
3. Include only one task performance in each step.
After the performance steps have been identified following the above procedure, they are used together with the performance objective as the foundation for developing a performance test. The most common form of performance test is a performance checklist, which summarizes (a) the objective the trainee has to achieve, (b) the procedural learning steps he or she should have taken, and (c) the criteria for making a judgement about whether the trainee has completed the task satisfactorily. After the basic instrument is constructed, the instructor must determine how it is to be scored. If 100 percent mastery is required, the trainee must complete each step satisfactorily in terms of what can be seen after the performance is completed.
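To make the scoring concrete, here is a minimal Python sketch of a performance checklist scored against a 100 percent mastery rule. The step descriptions and data are illustrative assumptions, not part of this module.

```python
# Hypothetical sketch: scoring a performance checklist for 100 percent mastery.
# The step descriptions below are illustrative, not prescribed by this module.

checklist = [
    ("Takes dictation in shorthand at 90 words per minute", True),
    ("Estimates length and placement of the letter", True),
    ("Transcribes shorthand notes at the typewriter", True),
    ("Proofreads and corrects the finished product", False),
]

def mastery_achieved(steps):
    """Return True only if every procedural step was completed satisfactorily."""
    return all(done for _, done in steps)

completed = sum(done for _, done in checklist)
print(f"Steps completed satisfactorily: {completed}/{len(checklist)}")
print("Mastery achieved" if mastery_achieved(checklist) else "Mastery not achieved")
```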
The following points summarize the detailed procedures that should be used in developing performance evaluation instruments.

1. Specify the objectives.
2. Determine whether to evaluate using a performance checklist or a final product evaluation.
3. List the procedural steps if a performance checklist is to be used.
4. List the factors to be rated after the performance, if a product evaluation is to be used.
5. Identify critical items. (Determine if trainee and/or instructor checkpoints are needed when using a product evaluation.)
6. Determine the criteria for judging satisfactory completion of each step.
7. Establish the acceptable mastery level score for the instrument.
2.2 Rating Scales
The rating scale is similar to the checklist and serves somewhat the same purpose in judging procedures and products. The main difference is that the rating scale provides an opportunity to mark the degree to which an element is present instead of using a simple present-absent judgement. The scale for rating is typically based on the frequency with which an action is performed (e.g., always, sometimes, never), the general quality of a performance (e.g., outstanding, above average, average, below average), or a set of descriptive phrases that indicates degrees of acceptable performance (e.g., completes task quickly, slow in completing task, cannot complete task without help). Like the checklist, the rating scale directs attention to the dimensions to be observed and provides a convenient form on which to record the judgements.

A sample rating scale for evaluating both procedures and products is shown in figure 7. Although this numerical rating scale uses fixed alternatives, the same scale items could be described by descriptive phrases that vary from item to item. In this case, each rated item would be arranged as follows:
Plan for the project
Directions: Rate each of the following items by
circling the appropriate number. The numbers represent the following values:
5—outstanding; 4—above average; 3—average; 2—below average; 1—unsatisfactory.
PROCEDURE RATING SCALE

How effective is the trainee's performance in each of the following areas?

5 4 3 2 1  (a) Preparing a detailed plan for the project.
5 4 3 2 1  (b) Determining the amount of material needed.
5 4 3 2 1  (c) Selecting the proper tools.
5 4 3 2 1  (d) Following the correct procedures for each operation.
5 4 3 2 1  (e) Using tools properly and skillfully.
5 4 3 2 1  (f) Using materials without unnecessary spoilage.
5 4 3 2 1  (g) Completing the work within a reasonable amount of time.
PRODUCT RATING SCALE

To what extent does the product meet the following criteria?

5 4 3 2 1  (a) The product appears neat and well constructed.
5 4 3 2 1  (b) The dimensions match the original plan.
5 4 3 2 1  (c) The finish meets specifications.
5 4 3 2 1  (d) The joints and parts fit properly.
5 4 3 2 1  (e) The materials were used effectively.

Figure 7: Rating Scale for a Woodworking Project
A
space for comments might also be added under each item, or at the bottom of
each set of items to provide a place for clarifying the ratings or describing
how to improve performance.
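As an illustration only, a small Python sketch can record such numerical ratings and summarize them for one trainee; the item labels and the below-average flag are assumptions for the example, not part of figure 7.

```python
# Hypothetical sketch: recording numerical ratings (5 = outstanding ... 1 = unsatisfactory)
# for one trainee and summarizing them. Item labels are illustrative.

procedure_ratings = {
    "Preparing a detailed plan": 4,
    "Determining material needed": 5,
    "Selecting the proper tools": 3,
    "Following correct procedures": 4,
}

def summarize(ratings):
    """Return the average rating and flag any item rated below average (< 3)."""
    average = sum(ratings.values()) / len(ratings)
    weak_items = [item for item, score in ratings.items() if score < 3]
    return average, weak_items

avg, weak = summarize(procedure_ratings)
print(f"Average rating: {avg:.2f}")
print("Items needing improvement:", weak or "none")
```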
The construction of a rating scale for a performance assessment/test typically includes the following steps:

1. List the procedural or product characteristics to be evaluated.
2. Select the number of points to use on the scale and define them by descriptive terms or phrases.
3. Arrange the items on the rating scale so that they are easy to use.
4. Provide clear, brief instructions that tell the rater how to mark items on the scale.
5. Provide a place for comments, if needed for diagnostic or instructional purposes.
1. Specify the intended performance outcomes in observable terms and describe the use to be made of the results.
2. Limit the observable dimensions of the performance to a reasonable number.
3. Provide clear, definite criteria for judging the procedure or product.
4. Select the performance setting that provides the most relevant and realistic situation.
5. If a structured performance situation is used, provide clear and complete instructions.
6. Be as objective as possible in observing, judging, and recording the performance.
7. Observe the performance under various conditions and use multiple observations whenever possible.
8. Make a record as soon as possible after an observation.
9. Use evaluation forms that are clear, relevant, and easy to use.
10. Use a scoring procedure that is appropriate for the use to be made of the results (e.g., holistic for global evaluation, analytic for diagnostic purposes).
11. Inform trainees of the method and criteria to be used in evaluating the performance.
12. Supplement and verify performance assessments with other evidence of achievement.

Figure 8: Improving Performance Assessments
I-031-3(10)
IS 3: ANALYSIS OF ASSESSMENT RESULT
3.1 Analyzing Test Results

i. Analysis and interpretation of a test depend on the procedures shown in figure 9.

Figure 9: Analysis and Interpretation of Test Results
ii. Using marks as scores

Test marks are important data in the testing process. At the start of the analysis, the test results are known as raw scores or raw marks. Raw marks are collected at random and not yet arranged in order, as in figure 10.

78, 78, 80, 65, 63, 74, 67, 58, 74, 65
65, 63, 74, 86, 80, 74, 67, 50, 78, 89

Figure 10: Scores

Raw scores can be used for analysis when arranged in a frequency distribution table.
iii. Frequency Distribution Table

The table shows raw scores in ascending or descending order with their frequencies recorded in the table. Please refer to the frequency scores in figure 11.

Score (x)   Tally    Frequency
89          I        1
86          I        1
80          II       2
78          II       2
74          IIIII    5
67          II       2
65          III      3
63          II       2
58          I        1
50          I        1
Total                20

Figure 11: Non-Cumulative Raw Score Frequency Distribution Table
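A minimal Python sketch of building a frequency distribution like figure 11 from a list of raw scores. The scores are taken from figure 10; the code itself is illustrative and not part of the module.

```python
from collections import Counter

# Raw scores from figure 10 (20 marks).
raw_scores = [78, 78, 80, 65, 63, 74, 67, 58, 74, 65,
              65, 63, 74, 86, 80, 74, 67, 50, 78, 89]

# Count how often each score occurs, then list scores in descending order.
frequency = Counter(raw_scores)

print("Score  Frequency")
for score in sorted(frequency, reverse=True):
    print(f"{score:>5}  {frequency[score]}")
print("Total ", sum(frequency.values()))
```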
iv. Raw scores arranged as in figure 11 are known as non-cumulative raw scores. By showing raw scores in a frequency distribution table, we can use the table to transfer raw scores into graphic form, as in figure 12.

v. Forming a Cumulative Frequency Distribution Table (Cumulative Frequency)

The cumulative frequency for any non-cumulative score distribution is calculated as the sum of the frequency for the particular score and the total frequency for the earlier scores, as in figure 13 below.
Score (x)     Sum of          Cumulative
(less than)   frequencies     frequency (cf)
50            0               0
55            1 + 0           1
60            1 + 1           2
65            2 + 2           4
70            5 + 4           9
75            5 + 9           14
80            2 + 14          16
85            2 + 16          18
90            2 + 18          20

Figure 13: Cumulative Frequency Distribution Table
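A short Python sketch (illustrative only) of accumulating the frequencies in this way:

```python
# Frequencies per "less than" boundary, taken from figure 13 (boundaries 50, 55, ..., 90).
boundaries = [50, 55, 60, 65, 70, 75, 80, 85, 90]
frequencies = [0, 1, 1, 2, 5, 5, 2, 2, 2]

cumulative = []
running_total = 0
for f in frequencies:
    running_total += f          # add this class's frequency to all earlier ones
    cumulative.append(running_total)

for b, cf in zip(boundaries, cumulative):
    print(f"less than {b}: cumulative frequency = {cf}")
```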
vi. Accumulation of raw scores in class intervals

Class intervals are raw scores accumulated according to the width or size of the class interval. The size of a class interval can be determined by using the following formula:

Size of class interval = (Highest score - Lowest score) / Number of classes

By determining the class interval in this way, raw scores can easily be translated into cumulative scores in the frequency distribution table.
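As a hedged illustration of the formula, the sketch below computes the interval size for the figure 10 scores; the choice of eight classes is an assumption made for the example.

```python
# Illustrative only: compute the class interval size from a list of raw scores.
import math

raw_scores = [78, 78, 80, 65, 63, 74, 67, 58, 74, 65,
              65, 63, 74, 86, 80, 74, 67, 50, 78, 89]
number_of_classes = 8  # assumed for illustration

size = (max(raw_scores) - min(raw_scores)) / number_of_classes
# In practice the exact value is rounded up to a convenient width.
print(f"Exact size = {size:.3f}, rounded up to a width of {math.ceil(size)}")
```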
Class Interval Score   Tally    Frequency
40-44                  I        1
45-49                  I        1
50-54                  II       2
55-59                  IIIII    5
60-64                  IIIII    5
65-69                  II       2
70-74                  II       2
75-79                  II       2
Total                           20

Frequency Distribution Table with Cumulative Scores
Based on the cumulative scores above, we can form a cumulative frequency distribution and percentage table, as in the table below.

Class interval   Class boundary   Frequency   Cumulative frequency      Cumulative frequency
score (x)        score (x)        (f)         (less than upper          percentage (cf%)
                                              class boundary)
40-44            39.5 - 44.5      1           1                         5
45-49            44.5 - 49.5      1           2                         10
50-54            49.5 - 54.5      2           4                         20
55-59            54.5 - 59.5      5           9                         45
60-64            59.5 - 64.5      5           14                        70
65-69            64.5 - 69.5      2           16                        80
70-74            69.5 - 74.5      2           18                        90
75-79            74.5 - 79.5      2           20                        100

Cumulative Frequency Distribution and Percentage Table with Cumulative Scores
The cumulative frequency percentage can be obtained from the following formula:

Cumulative frequency percentage = (Cumulative frequency / Total frequency) x 100%

For the class boundary 54.5 - 59.5:

Cumulative frequency percentage = 9/20 x 100% = 45%
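A small illustrative sketch of that calculation applied to every class in the table above:

```python
# Illustrative: cumulative frequency percentages from the class frequencies above.
frequencies = [1, 1, 2, 5, 5, 2, 2, 2]
total = sum(frequencies)

cumulative = 0
for f in frequencies:
    cumulative += f
    percentage = cumulative / total * 100
    print(f"cf = {cumulative:>2}, cf% = {percentage:.0f}%")
```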
Transfer of raw scores into graphic forms

Accumulated scores in a frequency distribution and percentage table can also be transferred into graphic forms. The types of graph they are usually transferred to are the cumulative frequency distribution and percentage curve, the histogram, the frequency polygon, and the frequency curve. An example of the data for a cumulative frequency curve is given below.
Class boundary score   Frequency (f)   Cumulative frequency      Cumulative frequency
                                       (less than upper          percentage (cf%)
                                       class boundary)
40.5-45.5              2               2                         5
45.5-50.5              5               7                         17.5
50.5-55.5              7               14                        35
55.5-60.5              12              26                        65
60.5-65.5              9               35                        87.5
65.5-70.5              3               38                        95
70.5-75.5              2               40                        100
Total                  40

Example of Percentage Ranking
Grade Percentage

Grade percentage is a scale that divides scores into 100 units, from 1 to 100. The percentage ranking is used to determine the position of a candidate relative to the achievement of the other candidates. For example, if Faridah obtains a percentage ranking of 75%, we can interpret this as Faridah performing better than 75% of the candidates on that test.

The percentage ranking for non-cumulative scores can be calculated using the following formula:

Grade Percentage (PR) = 100 - (100P - 50) / N

where P = rank of the candidate and N = total number of candidates.
Example:

Name of candidate   Score (x)   Rank (P)
Hussein             84          1
Soh Mun             80          2.5
Cheng Hong          80          2.5
Sulaiman            78          4
Salmah              75          5
Mutu                64          6

N = 6

Trainees' Ranking in a Science Test
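An illustrative Python sketch (not part of the module) of applying the PR formula to the ranks in this example:

```python
# Illustrative: percentage ranking PR = 100 - (100*P - 50)/N for the science test example.
candidates = [("Hussein", 1), ("Soh Mun", 2.5), ("Cheng Hong", 2.5),
              ("Sulaiman", 4), ("Salmah", 5), ("Mutu", 6)]
N = len(candidates)

for name, rank in candidates:
    pr = 100 - (100 * rank - 50) / N
    print(f"{name}: PR = {pr:.1f}%")
```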
vii. Calculation of Mean, Median and Mode

Mean, median and mode are formulated as measurements of central tendency. These three measurements are basic statistics used in comparing achievement among candidates sitting for the same test. Measures of central tendency are used as the basic criterion in determining candidates' grade achievement as well as in interpreting the degree of difficulty and suitability of the test.
Mean

The mean is measured by adding all the scores obtained in a measurement and dividing the sum by the total number of scores (or their total frequency). The mean formula for non-cumulative scores is:

Mean, x̄ = Σx / N

where Σ = sum (sigma), x = individual score, and N = number of scores/candidates.

Example:

Scores x = 20, 30, 30, 40, 50, 50, 50, 60, 70, 80

Therefore mean x̄ = (20 + 30 + 30 + 40 + 50 + 50 + 50 + 60 + 70 + 80) / 10
                 = 480 / 10
                 = 48

If the above scores are presented in a frequency distribution table, the mean value can be obtained by totalling the products of each score with its frequency and then dividing the sum (Σfx) by the total frequency (Σf), i.e.:

Mean, x̄ = Σfx / Σf

where Σfx = the sum of each individual score multiplied by its frequency, and Σf = the sum of the frequencies.
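A brief Python sketch (illustrative, not part of the module) showing both forms of the mean calculation on the example scores:

```python
# Illustrative: mean of raw scores and the equivalent frequency-weighted mean.
scores = [20, 30, 30, 40, 50, 50, 50, 60, 70, 80]

mean_raw = sum(scores) / len(scores)                      # sigma(x) / N
print(f"Mean from raw scores: {mean_raw}")

# Same data expressed as a frequency distribution: score -> frequency.
frequency = {20: 1, 30: 2, 40: 1, 50: 3, 60: 1, 70: 1, 80: 1}
mean_freq = sum(x * f for x, f in frequency.items()) / sum(frequency.values())  # sigma(fx) / sigma(f)
print(f"Mean from frequency table: {mean_freq}")
```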
Median

The median is the value placed at the mid-point among scores that are arranged in ascending or descending order. The median can be obtained by dividing one set of ordered non-cumulative scores into two equal parts. If the number of scores is even, the median is the mean of the two mid-point scores.

Example 1

Scores: 30, 45, 48, 48, 54, 55, 60, 62, 68

Median = score at the mid-point = 54
Mode

In basic statistics the mode is the score with the highest frequency in any score distribution. The symbol customarily used is Mo. The value of the mode can be obtained by arranging the score distribution in ascending or descending order; the score with the highest frequency is the mode.

Example:

Test scores: 52, 54, 54, 54, 57, 62, 63, 63, 63, 65

Mode, Mo = 54 (that is, a score with the highest frequency in the distribution).

Sometimes a score distribution has two or more modes. A score distribution that has two mode values is known as bimodal.

Example:

Test scores: 48, 53, 62, 62, 65, 70, 70, 70, 75

Not every score distribution possesses a mode. A mode does not exist when every score in the distribution has the same frequency.
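As a hedged illustration (not from the module), Python's standard library can compute these central-tendency measures directly:

```python
# Illustrative: median and mode(s) using Python's statistics module.
import statistics

scores = [30, 45, 48, 48, 54, 55, 60, 62, 68]
print("Median:", statistics.median(scores))              # middle value of the ordered scores

test_scores = [48, 53, 62, 62, 65, 70, 70, 70, 75]
print("Mode:", statistics.mode(test_scores))             # single most frequent score
print("All modes:", statistics.multimode(test_scores))   # handles bimodal distributions
```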
Measurement of Variability

Apart from the mean, median and mode, measures of variability are also used in analysing and interpreting trainees' achievement in any test. These measures give a more complete picture of the marks distribution.
Data Interpretation
Making analysis and interpretation is
the most important stage in any evaluation activity. This is because the
feedback from the interpretation will be the basic consideration for any
recommendation and follow-up action, to improve unsatisfactory tests and
practices or as basis for modification of future testing processes.
The process of analysis and
interpretation can only be carried out after data have been collected from the
testing and measurement processes. Data collected in this way will be presented
in statistical form such as table or data representation, for example: histogram,
line graph, frequency curve and so on. Interpretation will be carried out by
referring to the table or the data representation. Following that, the
conclusion will be made based on data interpretation.
Figure 14: The process of Data Analysis of an Assessment
Grades assigned to trainee’s work
should represent the extent to which the instructional objectives (i.e., the
intended learning outcomes) have been achieved, and should be in harmony with
the grading policies of the school. Some schools have both clearly defined
objectives and grading policies; many schools have neither. With or without the
guidance of clear-cut policies and procedures, the assigning of grades is a
difficult and frustrating task. It is somewhat easier if valid evidence of
achievement has been gathered throughout the course.
Assessment of learning during instruction might include the use of objective and essay tests, ratings, papers, and various types of projects or laboratory work. The problem of grading is that of summarizing this diverse collection of information into a single letter grade or brief report. Because the single letter grade (e.g., A, B, C, D, F) is the most widely used grading system, we shall focus on how best to assign such grades. This involves several important considerations: (1) What frame of reference, or standard, should be used to report level of performance? (2) How should the performance data be combined for grading? (3) What guidelines should be followed to provide the most effective and fair grading system? Each of these will be discussed in turn.
i. Selecting the Basis for Grading
Letter grades are typically assigned
by comparing a trainee’s performance to a pre-specified standard of performance
(absolute grading) or to the performance of the members of a group (relative
grading). In some cases, grades are based on or modified by learning ability of
the trainee, the amount of improvement shown over a given instructional period,
or trainee effort. As we shall see later, these factors provide an inadequate
basis for assigning grades.
a. Absolute Grading
A common type of absolute grading is
the use of letter grades defined by a 100-point system. Whether assigning
grades to an individual set of test scores, or as a basis for the final grades
in a course, the set of grades might be expressed as one of the following:
       POINTS      POINTS      POINTS
A =    90-100      95-100      91-100
B =    80-89       85-94       86-90
C =    70-79       75-84       81-85
D =    60-69       65-74       75-80
F =    below 60    below 65    below 75
In the case of an individual test,
this 100-point system might represent the percentage of items correct or the
total number of points earned on the test. When used as a final grade, it
typically represents a combining of scores from various tests and other
assessment results. In any event, it provides an absolute basis for assigning
letter grades.
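A minimal sketch of absolute grade assignment, assuming the first (90/80/70/60) scale from the table above; the cutoffs and sample scores are only illustrative.

```python
# Illustrative: absolute grading against fixed point ranges (the 90/80/70/60 scale).
def absolute_grade(points):
    """Map a 0-100 point total to a letter grade using fixed cutoffs."""
    if points >= 90:
        return "A"
    if points >= 80:
        return "B"
    if points >= 70:
        return "C"
    if points >= 60:
        return "D"
    return "F"

for score in (95, 83, 71, 64, 42):
    print(score, "->", absolute_grade(score))
```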
Which set of points provides the best
basis for assigning grades? There is no way of knowing. The distribution of
points is arbitrary. Whatever distribution is used, however, should be based on
the instructor’s experience with this and past groups of trainees, knowledge
concerning the difficulty of the intended learning outcomes, the difficulty of
the tests and other assessments used, the conditions of learning and the like.
These are all subjective judgements, however, and shifts in the proportion of trainees
getting the letter grade of A or F are difficult to evaluate. Do a larger
number of grades of A represent improved instruction and better study habits by
trainees, or easier tests and less rigid grading of papers and projects? Do
more failures indicate poor teaching, inadequate study, or tests that have
inadvertently increased in difficulty?
Despite the problem of setting meaningful standards for an absolute grading system, this method is widely used in schools. It is most appropriate in mastery-type programs where the set of learning tasks has been clearly specified, the standards have been defined in terms of the learning tasks, and the tests and other assessment techniques have been designed for criterion-referenced interpretation. All too frequently, however, absolute grading is based on some hodgepodge of ill-defined achievement results. When the distribution of points does not fit the grading scale, the points are adjusted upward or downward by some obscure formula to get a closer fit. Needless to say, such grades do not provide a meaningful report of the extent to which the intended learning outcomes have been achieved.
b. Relative Grading
When assigning grades on a relative basis, the trainees are typically ranked in order of performance (based on a set of test scores or combined assessment results); the trainees ranking highest receive a letter grade of A, the next highest receive a B, and so on. The proportion of trainees that should receive each grade is pre-determined and might appear as one of the following:
     PERCENT OF TRAINEES      PERCENT OF TRAINEES
A    15                       10-20
B    25                       20-30
C    45                       40-50
D    10                       10-20
F    5                        0-10
The percentage of trainees to be assigned each grade is just as arbitrary as the selection of points for each grade in the absolute grading system. The use of a range of percentages (e.g., A = 10-20 percent) should probably be favored because it makes some allowance for differences in the ability level of the class. It does not make sense to assign 15 percent A's to both a regular class and a gifted class. Likewise, in an advanced course a larger proportion of A's and B's should be assigned and fewer (if any) F's, because the low-achieving trainees have been weeded out in earlier courses. Where these percentages have been set by the school system, one has little choice but to follow the school practice, at least until efforts to change it are successful.
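A hedged Python sketch of relative grading by rank, using the example 15/25/45/10/5 percent split from the table above; the trainee names and scores are hypothetical.

```python
# Illustrative: relative grading by rank, using the example 15/25/45/10/5 percent split.
grade_percents = [("A", 15), ("B", 25), ("C", 45), ("D", 10), ("F", 5)]

def relative_grades(scores):
    """Rank composite scores from highest to lowest and assign letter grades by quota."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    grades, index = {}, 0
    for letter, percent in grade_percents:
        quota = round(n * percent / 100)
        for name, _ in ranked[index:index + quota]:
            grades[name] = letter
        index += quota
    # Any trainees left over by rounding receive the lowest grade in the scheme.
    for name, _ in ranked[index:]:
        grades[name] = grade_percents[-1][0]
    return grades

scores = {"Ali": 88, "Siti": 75, "Ravi": 91, "Mei": 69, "Jon": 58}  # hypothetical composites
print(relative_grades(scores))
```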
Older measurement books recommended
using the normal curve to assign grades. This resulted in the same percent of
A’s and F’s (e.g., 7 percent) and B’s and D’s (e.g., 38 percent). Although some
instructors may still use such a system, its use should be discouraged.
Measures of achievement in classroom groups seldom yield normally distributed
scores. Also, to maintain the same proportion of grades, especially failures,
at different grade levels does not take into account that the trainee
population is becoming increasingly select as the failing trainees are held
back or drop out of school.
The relative grading system requires a
reliable ranking of trainees; thus, it is most meaningful when the achievement
measures provide a wide range of scores. This makes it possible to draw the
lines between grades with greater assurance that misclassifications will be
kept to a minimum. Ideally, of course, the spread of scores should be based on
the difficulty and complexity of the material learned. For example, an A should
not simply represent more knowledge of factual material, but a higher level of
understanding, application, and thinking skills. Thus, although norm-referenced
interpretation is being utilized, the real meaning of the grades comes from
referring back to the nature of the achievement that each grade represents.
c. Learning Ability, Improvement and Effort
In some cases, attempts are made to
base grades on achievement in relation to learning ability, the amount of
improvement in achievement, or the amount of effort a trainee puts forth. All
of these procedures have problems that distort the meaning of grades.
Grading on the basis of learning ability has sometimes been used at the elementary level to motivate trainees with less ability. At first glance, it seems sensible to give a grade of A to trainees who are achieving all that they are capable of achieving. There are two major problems with this procedure, however. First, it is difficult, if not impossible, to get a dependable measure of learning ability apart from achievement; both tests have similar types of items and measure similar concepts. Second, the meaning of the grades becomes distorted. A low-ability trainee with average performance might receive an A, whereas a high-ability trainee with average performance receives a grade of C. Obviously the grades are no longer very meaningful as indicators of achievement.
Using the amount of improvement as a basis for grading also has its problems. For one, difference scores between measures of achievement over short spans of time are very unreliable. For another, trainees who score high on the entry test cannot possibly get a high grade because little improvement can be shown. Trainees who know about this grading procedure ahead of time can, of course, do poorly on the first test and be assured of a fairly good grade. This is not an uncommon practice where grades are based on improvement. Finally, the grades lack meaning as indicators of achievement: a low-scoring trainee who shows considerable improvement might receive an A, while a high-achieving trainee who shows little improvement receives a B or C.
Grading on the basis of effort,
or adjusting grades for effort, also distorts the meaning of the results. Low
achieving trainees who put forth great effort receive higher grades than their
achievement warrants and high-achieving trainees who put forth little effort
are likely to receive lower grades than deserved. Although such grading seems
to serve a motivational function for low-achieving trainees, the grades become
meaningless as measures of the extent to which trainees are achieving the
intended learning outcomes.
In summary, assigning grades that take
into account learning ability, amount of improvement, or effort simply
contaminates the grades and distorts their meaning as indicators of trainee
achievement. Other factors may be rated separately on a report card, but they
should not be allowed to distort the meaning of the letter grade.
d. A Combination of Absolute and Relative Grading
Grades should represent the degree to which instructional objectives (i.e., intended learning outcomes) are achieved by trainees. Some of the objectives of instruction are concerned with minimum essentials that must be mastered if a trainee is to proceed to the next level of instruction. Other objectives are concerned with learning outcomes that are never fully achieved but towards which trainees can show varying degrees of progress. The first are called minimal objectives and the second developmental objectives.
Minimal objectives are concerned with
the knowledge, skill, and other lower-level learning outcomes that represent
the minimum essentials of the course. In order to receive a passing grade, a trainee
must demonstrate that this basic knowledge and skill, which are pre-requisite
to further learning in the area, have been learned to a satisfactory degree. Developmental
objectives are concerned with higher-level learning outcomes such as
understanding, application, and thinking skills. Although we can identify
degrees of progress toward these objectives, we cannot expect to ever fully
achieve them. In science, for example, we might expect all trainees to master
basic terms, concepts, and skills, but encourage each trainee to proceed as far
as he or she can in understanding and applying the scientific process, and in
developing the intellectual skills used by scientists. Similarly, all trainees
in math might be expected to master the fundamental operations, but show wide
diversity in problem-solving ability and mathematical reasoning. In all
instructional areas there are lower-level objectives that should be mastered by
all trainees and higher-level objectives that provide goals that never can be
fully achieved. Thus, with minimal objectives, we attempt to obtain a uniformly
high level of performance for all trainees, and with developmental objectives
we encourage each trainee to strive for maximum development.
As indicated earlier, the pass-fail
decision should be based on whether or not the minimal objectives have been
mastered. Trainees demonstrating that they have achieved the minimal
objectives, and thus have the necessary prerequisites for success at the next
level of instruction, should be passed. Those who do not should fail. This
requires an absolute judgment, not a relative one. Trainees should not
be failed simply because their achievement places them near the bottom of some
group. It is the nature of the achievement that is significant.
Above the pass-fail cutoff point, grades should be assigned on a relative basis. This is because trainees' scores will tend to be spread out in terms of their degree of development beyond the minimal level. Trainees cannot be expected to master the more complex learning outcomes described by developmental objectives, but they can show varying degrees of progress towards their attainment. Although absolute grading of these outcomes could be attempted, it is not practical at this time. The best we can do is obtain a spread of trainee achievement scores in terms of the complexity of the learning outcomes attained and use relative grading. If properly done, a grade of A would represent greater achievement of the higher-level learning outcomes and not simply a high relative position in the group. This would assume, of course, that tests and other assessment techniques measure a range of achievement from simple to complex, and not just knowledge of factual information and simple skills, as is commonly done now.
In many cases the school will dictate
the grading policy, including the basis on which the grades are to be assigned.
Regardless of the system used, it is important to relate the grades back to trainee
achievement so that different grades represent different levels of performance.
Letter grades without an achievement referent tend to have little meaning.
e. Combining Data for Grading
Assigning grades typically involves combining results from various types of assessment, including such things as tests, projects, papers, and laboratory work. If each element is to be included in the grade in terms of its relative importance, the data must be combined in a way that gives each its proper weight. For example, if we want test scores to count 50 percent, the paper 25 percent, and laboratory work 25 percent of the grade, we need a method that first converts all results into numerical scores.
The method of combining scores so that
proper weights are obtained for each element is not as simple as it seems. A
common procedure is simply to add scores together if they are to have equal
weight and to multiply by two if an element is to count twice as much as the
other. This typically will not result in each element receiving its proper
weight, even if the highest possible score is the same for all sets of scores.
How much influence each element has on a composite score is determined by the spread, or variability, of its scores and not by the total number of points.
The problem of weighting scores when
combining them can be best illustrated with a simple example. Let’s assume we
only have two measures of achievement and we want to give them equal weight in
a grade. Our two sets of achievement scores have score ranges as follows:
Test scores: 20 to 100
Laboratory work: 30 to 50
If we simply add together a trainee’s
test score and score on laboratory work, the grade the trainee receive would be
determined largely by the test score because of its wide spread of scores. This
can be shown by comparing a trainee who has the highest test score and lowest
laboratory score (Trainee 1) with a trainee who has the lowest test score and
highest laboratory score (Trainee 2).
                    TRAINEE 1    TRAINEE 2
Test score          100          20
Laboratory score    30           50
Composite score     130          70

It is quite obvious that the composite score does not give the two measures equal weight.
With sets of scores like those for our
test and laboratory work, it is not uncommon for instructors to attempt to give
them equal weight by making the top possible score equal. This can be done, of
course, by multiplying the score on laboratory work by two, making the highest
possible score 100 for both measures. Here is how the two composite scores for
our hypothetical trainees would compare under this system:
                          TRAINEE 1    TRAINEE 2
Test score                100          20
Laboratory score (x 2)    60           100
Composite score           160          120
Our composite scores make clear that
equalizing the maximum possible score does not provide equal weights either. As
noted earlier, the influence a measure has on the composite score depends on
the spread, or variability, of scores. Thus, the greater the spread of scores,
the larger the contribution to the composite score.
We can give equal weight to our two sets of scores by using the range of scores in each set. Because our test scores have a range of 80 (100 - 20) and our laboratory scores have a range of 20 (50 - 30), we must multiply each laboratory score by four to equalize the spread of scores and, thus, give them equal weight in the composite score. Here are the composite scores for our two hypothetical trainees:

                          TRAINEE 1    TRAINEE 2
Test score                100          20
Laboratory score (x 4)    120          200
Composite score           220          220
At last we have a system that gives the two measures equal weight in the composite score. Note that if we wanted to count our test score twice as much as the laboratory score, we would multiply it by two and the laboratory score by four. However, if we wanted to have our laboratory score count twice as much as the test score, we would have to multiply each laboratory score by eight. Thus, when we originally multiplied our laboratory score by four, we simply adjusted the spread of those scores to match the spread of the test scores. When the two sets of scores have the same range, we can then assign additional weights in terms of their relative importance.
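A minimal Python sketch of this range-equalization idea, using the two hypothetical trainees above; the code is illustrative only.

```python
# Illustrative: equalize the spread of two score sets before combining them.
test_range = 100 - 20        # spread of test scores
lab_range = 50 - 30          # spread of laboratory scores
lab_multiplier = test_range / lab_range   # 4: stretches lab scores to the same spread

trainees = {"Trainee 1": {"test": 100, "lab": 30},
            "Trainee 2": {"test": 20, "lab": 50}}

for name, s in trainees.items():
    composite = s["test"] + s["lab"] * lab_multiplier
    print(f"{name}: composite = {composite:.0f}")   # both come out to 220
```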
The
range of scores provides only a rough approximation of score variability but it
is satisfactory for most classroom grading purposes. A more dependable basis
for weighting grade components can be obtained with the standard deviation (see Oosterhof, 1990).
Some
instructors obtain a composite grade by converting all test scores and other
assessments to letter grades, converting the letter grades to numbers (e.g., A
= 4, B = 3, C = 2, D = 1, F = 0) and then averaging them for a final grade.
When this procedure is followed, information is lost because the data are
reduced to only five categories. For example, a trainee with a high A and high
B would receive the same average grade as a trainee with a low A and a low B.
To overcome this problem, pluses and minuses are sometimes added (e.g., A+ =
12, A = 11, A- = 10, B+ = 9, B = 8, B- = 7, etc). This provides more categories
but some information is still lost. A better solution is to use numerical
scores on all assessments and then combine these numerical scores into a
composite score before assigning grades.
ii. Guidelines for Effective and Fair Grading

Assigning grades that provide a valid measure of trainees' achievement, that have meaning beyond the classroom in which they are given, and that are considered fair by trainees is a difficult but important part of teaching. The following guidelines provide a framework that should help clarify and standardize the task.
- Inform
trainees at the beginning of instruction what grading procedures will be
used. This
should include what will be included in the final grade (e.g., tests,
projects, laboratory work) and how much weight will be given to each
element. It should also include a description, in achievement terms, of
what each letter grade represents. A descriptive handout may be helpful.
- Base grades on trainee achievement, and achievement only. Grades should represent the extent to which the intended learning outcomes were achieved by trainees. They should not be contaminated by trainee effort, tardiness, misbehavior, or other extraneous factors. These can be reported on separately, but they should not influence the achievement grade. If they are permitted to become a part of the grade, the meaning of the grade as an indicator of achievement is lost.
- Base
grades on a wide variety of valid assessment data. All too frequently, grades are
based primarily, if not entirely, on test scores. If grades are to be
sound indicators of achievement, all important learning outcomes must be
assessed and the results included in the final grade. Evaluation of
papers, projects, and laboratory work is not as reliable as objective test
scores but to eliminate them lowers the validity of the grades.
- When
combining scores for grading, use a proper weighting technique. As noted earlier, the influence
of a component on the overall grade is determined by the spread, or
variability, of the scores. Thus, in combining scores to obtain a
composite for assigning grades, be sure the spread of scores is equalized
before weighting and combining them.
- Select
an appropriate frame of reference for grading. If the entire instruction is
based on mastery learning, it is necessary to use an absolute
standard for grading and to define the grades in mastery terms. For
conventional classroom instruction, the pass-fail distinction should be
described in absolute terms and the grades above that determined by
relative position in the group. However, these relative letter grades
should have achievement referents representing learning outcomes ranging
from simple to complex.
- Review borderline cases by re-examining all achievement evidence. When setting cutoff points for each grade, there is typically a trainee or two just below the cut-off line. Measurement errors alone might be responsible for a trainee being just below (or above) the line. Also, the composite score may contain a clerical error, or one low test score contributing to the composite may be due to illness or some other extraneous factor. In any event, it is wise to review the data for borderline cases and make any needed adjustments. When in doubt, fair grading would favor giving the trainee the higher grade.
I-031-3(10)
IS 4: PREPARATION OF REPORT
Preparing a report is the last stage in all evaluation processes. The evaluation report usually contains the following items:

a) Introduction to the research topic.
b) Description/explanation of the theme or topic.
c) Objective or purpose of the evaluation.
d) Research methodology.
e) Procedures for data gathering.
f) Collected or gathered data, attached together with the records, documents and forms used for carrying out the analysis and interpretation.
g) Report on the analysis, interpretation and conclusions.
h) Recommendations and follow-up actions based on the interpretation and conclusions.
i) Bibliography and reference materials.