Sunday, August 23, 2015

MODULE 8 - VALIDATING ASSESSMENT


1.1         Evaluating Knowledge Assessment Result.

Definition of evaluation
Evaluation is the process of making decisions about trainee learning. It requires us to make a judgement about trainee knowledge and performance; the measurements or data gained from assessment help us make that judgement.
                                                                                                        

Diagnostic
    Formal: standardized tests, pre-tests
    Informal: inquiry, questionnaires, observations, discussions, checklists
Formative
    Formal: quizzes, question-answers, assignments, standardized tests, classroom tests
    Informal: journals, observations, question-answers, trainee comments, assignments, inquiry, work projects
Summative
    Formal: standardized tests, classroom tests
    Informal: discussions, observations, work projects, trainee feedback

Figure 1: Framework of Evaluation

a.    Multiple-choice items are suitable for various types of complex learning and are most appropriate for measuring interpretation of learning. A checklist can be used to check and evaluate multiple-choice items. Guidelines to help instructors evaluate multiple-choice items are provided in figure 2 below:
 


1.        Is this type of item appropriate for measuring the intended learning outcome?
2.        Does the item task match the learning task to be measured?
3.        Does the stem of the item present a single, clearly formulated problem?
4.        Is the stem stated in simple, clear language?
5.        Is the stem worded so that there is no repetition of material in the alternatives?
6.        Is the stem stated in positive form wherever possible?
7.        If negative wording is used in the stem, is it emphasized (by underlining or caps)?
8.        Is the intended answer correct or clearly the best?
9.        Are all alternatives grammatically consistent with the stem and parallel in form?
10.     Are the alternatives free from verbal clues to the correct answer?
11.     Are the distracters plausible and attractive to the uninformed?
12.     To eliminate length as a clue, is the relative length of the correct answer varied?
13.     Has the alternative "all of the above" been avoided, and "none of the above" used only when appropriate?
14.     Is the position of the correct answer varied so that there is no detectable pattern?
15.     Does the item format and grammar usage provide for efficient test taking?

Figure 2: Checklist for Evaluating Multiple-Choice Items

b.    Comprehension and application items also need to be evaluated against the way the questions were constructed. Below are guidelines for instructors constructing assessment/test questions for these knowledge assessments. Refer to the illustrative question stems in figure 3.

Comprehension Question


Which of the following is an example of ____________?
What is the main thought expressed by _____________?
What are the main differences between _____________?
What are the common characteristics of _____________?
Which of the following is another form of ____________?
Which of the following best explains ____________?
Which of the following best summarizes __________?
Which of the following best illustrates ____________?
What do you predict would happen if ___________?
What trend do you predict in ___________?

Application Question


Which of the following methods is best for _____________?
What steps should be followed in applying _____________?
Which situation would require the use of ____________?
Which principle would be best for solving ___________?
What procedure is best for improving ___________?
What procedure is best for constructing __________?
What procedure is best for correcting ___________?
Which of the following is the best plan for __________?
Which of the following provides the proper sequence for _______?
What is the most probable effect of _________?

Figure  3: Illustrative Comprehension and Application Questions



c.    The true-false item is suitable for evaluating the ability to identify the accuracy of a statement of fact. The instructor can use the following checklist to re-evaluate all constructed true-false questions. See figure 4.

16.     Is this type of item appropriate for measuring the intended learning outcome?
17.     Does the item task match the learning task to be measured?
18.     Does each statement contain one central idea?
19.     Can each statement be unequivocally judged true or false?
20.     Are the statements brief and stated in simple, clear language?
21.     Are negative statements used sparingly and double negatives avoided?
22.     Are statements of opinion attributed to some source?
23.     Are the statements free of clues to the answer (e.g., verbal clues, length)?
24.     Is there approximately an even number of true and false statements?
25.     When arranged in the test, are the true and false items put in random order?

Figure 4: Checklist For Evaluating True-False Items

To evaluate matching items, the instructor needs to use the checklist proposed in figure 5 below:

26.     Is this type of item appropriate for measuring the intended learning outcome?
27.     Does the item task match the learning task to be measured?
28.     Does each matching item contain only homogeneous material?
29.     Are the lists of items short, with the brief responses on the right?
30.     Is an uneven match provided by making the list of responses longer or shorter than the list of premises?
31.     Are the responses in alphabetical or numerical order?
32.     Do the directions clearly state the basis for matching and that each response can be used once, more than once, or not at all?
33.     Does the complete matching item appear on the same page?

Figure 5: Checklist  for Evaluating Matching Items




d.     An interpretive item poses a question whose answer depends on introductory material such as a paragraph, list, chart, map or picture. To evaluate this question type, refer to the instructor checklist in figure 6 below.


34.     Is this type of exercise appropriate for measuring the intended learning outcome?
35.     Is the introductory material relevant to the learning outcome?
36.     Is the introductory material familiar but new to the examinees?
37.     Is the introductory material brief and at the appropriate reading level?
38.     Do the test items call forth the performance specified in the learning outcomes?
39.     Do the test items meet the criteria for effective item writing that apply to the item type used?
40.     Is the interpretive exercise free of extraneous clues?

Figure 6: Checklist for Evaluating Interpretive Exercises






I-031-3(10) IS 2
EVALUATION OF PERFORMANCE ASSESSMENT RESULT

2.1         Evaluating Performance Assessment Result

The procedure for developing performance tests involves three major steps: (1) specifying the objectives, (2) specifying the items to be observed, and (3) specifying the criteria that determine successful completion of a task. The three-part objective should likewise include (1) a statement of the conditions given within which the trainee must perform, (2) the performance expected of the trainee, and (3) the standard against which trainee performance will be evaluated. The level of the expected performance is of critical concern to the instructor in developing performance tests. Typically, almost any performance in business education skill courses is a combination of more elementary performances. In constructing a performance test, the instructor must be sure that the trainee brings to the task an appropriate background of prerequisite knowing and doing skills. For this reason, much care should be taken in writing the objective and in selecting the correct verb. Once the verb is selected and the trainee is told the nature of the expected performance, the context or “givens” should be described. The givens describe the tools, previous knowledge, machines, software, or prerequisites that the trainee will be required to use in demonstrating the behavior. The third part of the three-part objective—the performance standard—is used to judge whether a trainee has mastered a task. It communicates the quality of performance expected and provides a basis for the instructor to judge the quality of the product, as in the following example:

Given 150 words of dictation at 90 words per minute, the trainee will transcribe a business letter into mailable copy.

In a performance test the items are typically steps of a procedure that the trainee must complete correctly in order to complete the performance. In a product evaluation, the items relate to a finished product that can be observed. To continue the previous example, in order to turn out a mailable letter the trainee:
Types with reasonable accuracy and speed
Takes dictation in shorthand notes at 90 words per minute
Checks shorthand notes for grammar, punctuation, special notations
Estimates the length and placement of the business letter
Transcribes shorthand notes at the typewriter
Makes punctuation, spelling, and other editing corrections, and proofreads the finished product.

The procedures used to determine the items for both performance tests and product evaluations are quite similar. They both start with a definition of the learning objective. The next step is to determine which items should be observed during the performance or on the finished product. When identifying and stating steps, the following rules should be observed:


1.    Begin each step with a verb (major categories include, for example: call, check, compile, compose, deliver, determine, type, duplicate, file, and so on).
2.    Make each step independent of other steps (avoid evaluating the same performance in more than one step).
3.    Include only one task performance in each step.

After the performance steps have been identified following the above procedure, they are used together with the performance objective as the foundation for developing a product performance test. The most common form of performance test is a performance checklist which summarizes (a) the objective the trainee has to achieve, (b) the procedural learning steps he or she should have taken, and (c) criteria for making a judgement about whether the trainee has completed the task satisfactorily. After the basic instrument is constructed, the instructor must determine how it is to be scored. If 100 percent mastery is required, the trainee must complete each step satisfactorily in terms of what can be seen after the performance is completed.
The following points summarize the detailed procedures that should be used in developing performance evaluation instruments.

1.    Specify the objectives.
2.    Determine whether to evaluate using a performance checklist or a final product evaluation.
3.    List the procedural steps if a performance checklist is to be used.
4.    List the factors to be rated after the performance, if a product evaluation is to be used.
5.    Identify critical items. (Determine if trainee and/or instructor checkpoints are needed when using a product evaluation).
6.    Determine the criteria for judging satisfactory completion of each step.
7.    Establish the acceptable mastery level  score for the instrument.
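As a rough illustration of how such a checklist might be scored, the short Python sketch below records whether each procedural step was completed satisfactorily and applies a mastery-level cut-off; the step descriptions and the 100 percent mastery level are illustrative assumptions, not part of the module.

    # A minimal sketch of scoring a performance checklist.
    # The step descriptions and the required mastery level are assumptions for illustration.
    checklist = {
        "Takes dictation at 90 words per minute": True,
        "Estimates length and placement of the letter": True,
        "Transcribes shorthand notes accurately": False,
        "Proofreads and corrects the finished copy": True,
    }

    steps_passed = sum(checklist.values())
    total_steps = len(checklist)
    score = 100 * steps_passed / total_steps      # percentage of steps completed satisfactorily

    mastery_level = 100                           # assume 100 percent mastery is required
    print(f"{steps_passed}/{total_steps} steps satisfactory ({score:.0f}%)")
    print("Mastered" if score >= mastery_level else "Not yet mastered")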


2.2         Rating Scales

The rating scale is similar to the checklist and serves somewhat the same purpose in judging procedures and products. The main difference is that the rating scale provides an opportunity to mark the degree to which an element is present instead of using a simple present-absent judgement. The scale for rating is typically based on the frequency with which an action is performed (e.g., always, sometimes, never), the general quality of a performance (e.g., outstanding, above average, average, below average), or a set of descriptive phrases that indicates degrees of acceptable performance (e.g., completes task quickly, slow in completing task, cannot complete task without help). Like the checklist, the rating scale directs attention to the dimensions to be observed and provides a convenient form on which to record the judgements.
A sample rating scale for evaluating both procedures and products is shown in figure 7. Although this numerical rating scale uses fixed alternatives, the same scale items could be described by descriptive phrases that vary from item to item.
In this case, each rated item would be arranged as follows:

Plan for the project

 








Directions: Rate each of the following items by circling the appropriate number. The numbers represent the following values: 5—outstanding; 4—above average; 3—average; 2—below average; 1—unsatisfactory.

PROCEDURE RATING SCALE


How effective is the trainee's performance in each of the following areas?

5   4   3   2   1 (a) Preparing a detailed plan for the project.
5   4   3   2   1 (b) Determining the amount of material needed.
5   4   3   2   1 (c) Selecting the proper tools.
5   4   3   2   1 (d) Following the correct procedures for each operation.
5   4   3   2   1 (e) Using tools properly and skillfully.
5   4   3   2   1 (f) Using materials without unnecessary spoilage.
5   4   3   2   1 (g) Completing the work within a reasonable amount of time.

PRODUCT RATING SCALE


To what extent does the product meet the following criteria?

5   4   3   2   1 (a) The product appears neat and well constructed.
5   4   3   2   1 (b) The dimensions match the original plan.
5   4   3   2   1 (c) The finish meets specifications.
5   4   3   2   1 (d) The joints and parts fit properly.
5   4   3   2   1 (e) The materials were used effectively.

Figure 7: Rating Scale for A Woodworking Project


A space for comments might also be added under each item, or at the bottom of each set of items to provide a place for clarifying the ratings or describing how to improve performance.
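As a rough sketch of how the ratings in figure 7 might be recorded and summarized, the Python lines below total the circled values and attach an optional comment to an item; the abbreviated item labels and the idea of averaging the ratings are assumptions, since the module does not prescribe a summary rule.

    # Illustrative recording of a procedure rating scale (5 = outstanding ... 1 = unsatisfactory).
    ratings = {
        "Preparing a detailed plan": 4,
        "Determining material needed": 3,
        "Selecting the proper tools": 5,
        "Following correct procedures": 4,
        "Using tools properly and skillfully": 3,
        "Avoiding unnecessary spoilage": 4,
        "Completing work in reasonable time": 2,
    }
    comments = {"Completing work in reasonable time": "Slowed down by re-cutting one joint."}

    total = sum(ratings.values())
    average = total / len(ratings)
    print(f"Total {total} of {5 * len(ratings)} points, average {average:.1f}")
    for item, value in ratings.items():
        print(f"{value}  {item}  {comments.get(item, '')}")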








The construction of a rating scale for performance assessment/test typically includes the following steps.

1.        List the procedural or product characteristics to be evaluated.
2.        Select the number of points to use on the scale and define them by descriptive terms or phrases.
3.        Arrange the items on the rating scale so that they are easy to use.
4.        Provide clear, brief instructions that tell the rater how to mark items on the scale.
5.        Provide a place for comments, if needed for diagnostic or instructional purposes.


The list in figure 8 below summarizes guidelines for improving performance assessments.

1.        Specify the intended performance outcomes in observable terms and describe the use to be made of the results.
2.        Limit the observable dimensions of the performance to a reasonable number.
3.        Provide clear, definite criteria for judging the procedure or product.
4.        Select the performance setting that provides the most relevant and realistic situation.
5.        If a structured performance situation is used, provide clear and complete instructions.
6.        Be as objective as possible in observing, judging, and recording the performance.
7.        Observe the performance under various conditions and use multiple observations whenever possible.
8.        Make a record as soon as possible after an observation.
9.        Use evaluation forms that are clear, relevant, and easy to use.
10.     Use a scoring procedure that is appropriate for the use to be made of the results (e.g., holistic for global evaluation, analytic for diagnostic purposes).
11.     Inform trainees of the method and criteria to be used in evaluating the performance.
12.     Supplement and verify performance assessments with other evidence of achievement.

Figure 8: Improving Performance Assessments





I-031-3(10) IS 3
ANALYSIS OF ASSESSMENT RESULT

3.1         Analyzing Test Result.

i.      Analysis and interpretation of a test depend on the procedure shown in figure 9.

Collect and arrange the trainees' scores  →  Calculate the mean, median, mode and range  →  Analyze the data representation  →  Interpret the results

Figure 9: Analysis And Interpretation Of Test Result. 

ii.    Using marks as scores

Test marks are important data in the test process. In the process of analysis, the test results are known as raw scores or raw marks. Raw marks are collected at random and are not yet arranged in order, as in figure 10.

78, 78, 80, 65, 63, 74, 67, 58, 74, 65
65, 63, 74, 86, 80, 74, 67, 50, 78, 89

Figure 10: Scores

Raw scores can be used for analysis when arranged in a frequency distribution table.

iii.   Frequency Distribution Table

The table shows the raw scores in ascending or descending order, with the frequency of each score recorded in the table. Refer to the frequency scores in figure 11.
Score (x)    Tally    Frequency
89           |        1
86           |        1
80           ||       2
78           ||       2
74           ||||     5
67           ||       2
65           |||      3
63           ||       2
58           |        1
50           |        1
Total                 20

Figure 11: Non-Cumulative Raw Score Frequency Distribution Table.
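If the raw marks are held in a list, the tallying that produces a frequency distribution can be done in a few lines of Python; the sketch below simply counts the scores of figure 10 and prints them in descending order.

    from collections import Counter

    # Raw scores from figure 10.
    scores = [78, 78, 80, 65, 63, 74, 67, 58, 74, 65,
              65, 63, 74, 86, 80, 74, 67, 50, 78, 89]

    frequency = Counter(scores)                      # score -> number of times it occurs
    for score in sorted(frequency, reverse=True):    # descending order, as in figure 11
        print(f"{score:>3}  {frequency[score]}")
    print(f"Total {sum(frequency.values())}")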


iv.   Raw scores arranged as in figure 11 are known as non-cumulative raw scores. By showing raw scores in a frequency distribution table, we can use the table to transfer the raw scores into graphic form, as in figure 12.

v.    Forming a Cumulative Frequency Distribution Table (Cumulative Frequency)

The cumulative frequency for any score in a non-cumulative distribution is calculated as the frequency of that particular score added to the total frequency of all earlier scores, as in figure 13 below.

Score (x) (less than)    Sum of frequencies    Cumulative frequency (cf)
50                       0                     0
55                       1 + 0                 1
60                       1 + 1                 2
65                       2 + 2                 4
70                       5 + 4                 9
75                       5 + 9                 14
80                       2 + 14                16
85                       2 + 16                18
90                       2 + 18                20
Figure 13: Cumulative Frequency Distribution Table
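The running sum that produces a cumulative frequency column can be computed directly from the non-cumulative frequencies. The sketch below assumes the score/frequency pairs listed in figure 11 and uses itertools.accumulate; it illustrates the accumulation rule rather than reproducing figure 13 exactly.

    from itertools import accumulate

    # Non-cumulative (score, frequency) pairs in ascending order, taken from figure 11.
    distribution = [(50, 1), (58, 1), (63, 2), (65, 3), (67, 2),
                    (74, 5), (78, 2), (80, 2), (86, 1), (89, 1)]

    frequencies = [f for _, f in distribution]
    cumulative = list(accumulate(frequencies))       # 1, 2, 4, 7, 9, 14, 16, 18, 19, 20

    for (score, f), cf in zip(distribution, cumulative):
        print(f"scores up to {score}: cumulative frequency = {cf}")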



vi.   Accumulation of raw scores in class intervals.

Class intervals are groups of raw scores accumulated according to the width or size of the interval. The size of the class interval can be determined by using the following formula:

Size of class interval = (Highest score – Lowest score) / Number of classes

By determining class interval in this way, raw scores can easily be translated into cumulative scores in the frequency distribution table.
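Applying the formula is straightforward in code; the sketch below assumes eight classes (the module leaves the number of classes open) and then counts how many of the figure 10 scores fall into each interval.

    import math

    scores = [78, 78, 80, 65, 63, 74, 67, 58, 74, 65,
              65, 63, 74, 86, 80, 74, 67, 50, 78, 89]

    number_of_classes = 8                            # assumed for illustration
    size = math.ceil((max(scores) - min(scores)) / number_of_classes)   # (89 - 50) / 8, rounded up = 5

    # Count how many scores fall into each interval of width `size`, starting at the lowest score.
    lower = min(scores)
    while lower <= max(scores):
        upper = lower + size - 1
        count = sum(lower <= s <= upper for s in scores)
        print(f"{lower}-{upper}: {count}")
        lower += size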

Class Interval Score    Tally    Frequency
40-44                   |        1
45-49                   |        1
50-54                   ||       2
55-59                   ||||     5
60-64                   ||||     5
65-69                   ||       2
70-74                   ||       2
75-79                   ||       2
Total                            20

Frequency Distribution Table with Cumulative Scores.


Based on the cumulative scores above, we can form a cumulative frequency distribution and percentage table, as in the table below.


Class interval (x)    Class boundary    Frequency (f)    Cumulative frequency (less than upper class boundary)    Cumulative frequency percentage (cf%)
40-44                 39.5 - 44.5       1                1                                                         5
45-49                 44.5 - 49.5       1                2                                                         10
50-54                 49.5 - 54.5       2                4                                                         20
55-59                 54.5 - 59.5       5                9                                                         45
60-64                 59.5 - 64.5       5                14                                                        70
65-69                 64.5 - 69.5       2                16                                                        80
70-74                 69.5 - 74.5       2                18                                                        90
75-79                 74.5 - 79.5       2                20                                                        100

Cumulative Frequency Distribution and Percentage Table with Cumulative Scores



The cumulative frequency percentage can be obtained from the following formula:

Cumulative frequency percentage (cf%) = (cumulative frequency / total frequency) x 100%

For the class boundary 54.5 – 59.5,

cf% = 9/20 x 100% = 45%
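The same calculation for every class can be sketched in a few lines; the frequencies below are taken from the class-interval table above, and each cumulative frequency is converted to a percentage of the total.

    from itertools import accumulate

    # Frequencies for the class intervals 40-44 ... 75-79 from the table above.
    frequencies = [1, 1, 2, 5, 5, 2, 2, 2]
    total = sum(frequencies)                         # 20

    for cf in accumulate(frequencies):
        percentage = cf / total * 100                # cumulative frequency percentage (cf%)
        print(f"cf = {cf:>2}, cf% = {percentage:g}")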

Transfer of raw scores into graphic forms
Accumulated scores in a frequency distribution and percentage table can also be transferred into graphic form. The types of graph they are usually transferred to are the cumulative frequency distribution and percentage curve, histogram, frequency polygon and frequency curve. The data for an example of a cumulative frequency curve are tabulated below.

Class boundary score    Frequency (f)    Cumulative frequency less than upper class boundary    Cumulative frequency percentage (cf%)
40.5-45.5               2                2                                                       5
45.5-50.5               5                7                                                       17.5
50.5-55.5               7                14                                                      35
55.5-60.5               12               26                                                      65
60.5-65.5               9                35                                                      87.5
65.5-70.5               3                38                                                      95
70.5-75.5               2                40                                                      100
Total                   40


   Example of Percentage Ranking

Grade Percentage


Grade percentage (percentile ranking) is a scale that divides scores into 100 units, from 1 to 100. The percentage ranking is used to determine the position of a candidate compared with the achievement of other candidates. For example, if Faridah obtains a ranking of 75%, we can interpret that Faridah performed better than 75% of the other candidates in the test.

Percentage ranking for non-cumulative scores can be calculated using the following formula:

Grade Percentage (PR) = 100 – (100P – 50) / N

where P = grade (rank) of the candidate and N = total number of candidates.



Example:
Name of candidate    Score (x)    Grade (rank)
Hussein              84           1
Soh Mun              80           2.5
Cheng Hong           80           2.5
Sulaiman             78           4
Salmah               75           5
Mutu                 64           6

N = 6

Trainees' Ranking in a Science Test
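The percentage-ranking formula can be checked quickly in code. The sketch below applies it to the ranks in the science-test example, using the tied rank of 2.5 exactly as given; the function name is just a convenient label.

    def percentage_ranking(rank, n):
        # PR = 100 - (100P - 50) / N, where P is the candidate's rank and N the number of candidates.
        return 100 - (100 * rank - 50) / n

    ranks = {"Hussein": 1, "Soh Mun": 2.5, "Cheng Hong": 2.5,
             "Sulaiman": 4, "Salmah": 5, "Mutu": 6}
    n = len(ranks)                                   # N = 6

    for name, rank in ranks.items():
        print(f"{name}: PR = {percentage_ranking(rank, n):.1f}")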


vii.  Calculation of Mean, Median and Mode

Mean, median and mode are measures of central tendency. These three measurements are basic statistics used to compare achievement among candidates sitting for the same test. Measures of central tendency are also used as a basic criterion in determining candidates' grade achievement, as well as in interpreting the degree of difficulty and suitability of the test.

Mean

The mean is obtained by adding all the scores in a measurement and dividing the sum by the total number of scores (or by the total frequency). The mean formula for non-cumulative scores is:

Mean, x̄ = Σx / N

where Σ = sum (sigma), x = individual score, and N = number of scores/candidates.

Example:

Score x                  =   20,   30,   30,   40,   50,   50,   50,   60,   70,   80
Therefore mean x̄ = (20 + 30 + 30 + 40 + 50 + 50 + 50 + 60 + 70 + 80) / 10
                 = 480/10
                 = 48
If the above scores are presented in a frequency distribution table, the mean value can be obtained by totalling the products of each score and its frequency, and then dividing the sum (Σfx) by the total frequency (Σf), i.e.:

Mean, x̄ = Σfx / Σf

where Σfx = the sum of each score multiplied by its frequency, and Σf = the sum of the frequencies.
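Both forms of the mean can be expressed directly; the sketch below reproduces the worked example above and then the grouped (frequency) version, which gives the same value.

    # Mean of non-cumulative scores: sum of the scores divided by their number.
    scores = [20, 30, 30, 40, 50, 50, 50, 60, 70, 80]
    mean = sum(scores) / len(scores)                 # 480 / 10 = 48.0

    # Mean from a frequency distribution: sum of (frequency x score) divided by total frequency.
    frequency = {20: 1, 30: 2, 40: 1, 50: 3, 60: 1, 70: 1, 80: 1}
    grouped_mean = sum(f * x for x, f in frequency.items()) / sum(frequency.values())

    print(mean, grouped_mean)                        # both print 48.0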




Median
The median is the mid-point value among scores arranged in ascending or descending order. It can be obtained by dividing an ordered set of non-cumulative scores into two equal parts. If the number of scores is even, the median is the mean of the two mid-point scores.

Example 1
Score :              30,    45,    48,    48,    54,    55,    60,    62,    68

Median = score at the mid-point = 54
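Python's statistics module applies the same rule, including the even-count case where the two mid-point scores are averaged; the second list below is only an illustration of that case.

    import statistics

    odd_scores = [30, 45, 48, 48, 54, 55, 60, 62, 68]
    print(statistics.median(odd_scores))             # 54, the mid-point score

    even_scores = [30, 45, 48, 48, 54, 55, 60, 62]   # illustrative even-numbered set
    print(statistics.median(even_scores))            # (48 + 54) / 2 = 51.0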


Mode
The mode, in basic statistics, is the score with the highest frequency in a score distribution. The symbol customarily used is Mo. The value of the mode can be obtained by arranging the score distribution in ascending or descending order; the score with the highest frequency is the mode.

Example:
Test score      :   52,  54,  54,  54,  57,  62,  63,  63,  65,  65
Mode, Mo =  54 (that is, the score with the highest frequency in the distribution).

Sometimes a score distribution has two or more modes. A score distribution that has two mode values is known as bi-modal.

Example :
Test score           :   48,  53,  62,  62,  62,  65,  70,  70,  70,  75
Bi-modal, Mo =  62 and 70 (because the two values each have a frequency of 3, which is equal and the highest).

Not every score distribution has a mode. The mode does not exist when every score in the distribution has the same frequency.
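Counting frequencies makes the single-mode, bi-modal and no-mode cases explicit. The sketch below uses collections.Counter rather than a library mode function so that ties and the no-mode case can be reported; the last example list is purely illustrative.

    from collections import Counter

    def modes(scores):
        # Return every score with the highest frequency, or an empty list when all
        # scores occur equally often (i.e., no mode exists).
        counts = Counter(scores)
        highest = max(counts.values())
        if highest == min(counts.values()):          # every frequency equal: no mode
            return []
        return [score for score, f in counts.items() if f == highest]

    print(modes([52, 54, 54, 54, 57, 62, 63, 63, 65, 65]))   # [54]
    print(modes([48, 53, 62, 62, 62, 65, 70, 70, 70, 75]))   # [62, 70] (bi-modal)
    print(modes([10, 20, 30, 40]))                           # [] (no mode)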

Measurement of Variability
Apart from the mean, median and mode, measures of variability are also used to analyze and interpret trainees' achievement in a test. These measures give a more complete picture of the distribution of marks.


Data Interpretation
Making the analysis and interpretation is the most important stage in any evaluation activity. This is because the feedback from the interpretation will be the basic consideration for any recommendation and follow-up action, whether to improve unsatisfactory tests and practices or as a basis for modifying future testing processes.

The process of analysis and interpretation can only be carried out after data have been collected from the testing and measurement processes. Data collected in this way will be presented in statistical form such as table or data representation, for example: histogram, line graph, frequency curve and so on. Interpretation will be carried out by referring to the table or the data representation. Following that, the conclusion will be made based on data interpretation.





 



Figure 14: The process of Data Analysis of an Assessment 


3.2       Assigning Grades

Grades assigned to trainee’s work should represent the extent to which the instructional objectives (i.e., the intended learning outcomes) have been achieved, and should be in harmony with the grading policies of the school. Some schools have both clearly defined objectives and grading policies; many schools have neither. With or without the guidance of clear-cut policies and procedures, the assigning of grades is a difficult and frustrating task. It is somewhat easier if valid evidence of achievement has been gathered throughout the course.
Assessment of learning during instruction might include the use of objective and essay tests, ratings, papers, and various types of projects or laboratory work. The problem of grading is that of summarizing this diverse collection of information into a single letter grade or brief report. Because the single letter grade (e.g., A, B, C, D, F) is the most widely used grading system, we shall focus on how best to assign such grades. This involves several important considerations: (1) What frame of reference, or standard, should be used to report level of performance? (2) How should the performance data be combined for grading? (3) What guidelines should be followed to provide the most effective and fair grading system? Each of these will be discussed in turn.

i.      Selecting the Basis for Grading

Letter grades are typically assigned by comparing a trainee’s performance to a pre-specified standard of performance (absolute grading) or to the performance of the members of a group (relative grading). In some cases, grades are based on or modified by learning ability of the trainee, the amount of improvement shown over a given instructional period, or trainee effort. As we shall see later, these factors provide an inadequate basis for assigning grades.



a.    Absolute Grading
A common type of absolute grading is the use of letter grades defined by a 100-point system. Whether assigning grades to an individual set of test scores, or as a basis for the final grades in a course, the set of grades might be expressed as one of the following:

 

         POINTS        POINTS        POINTS
A =      90-100        95-100        91-100
B =      80-89         85-94         86-90
C =      70-79         75-84         81-85
D =      60-69         65-74         75-80
F =      below 60      below 65      below 75

In the case of an individual test, this 100-point system might represent the percentage of items correct or the total number of points earned on the test. When used as a final grade, it typically represents a combining of scores from various tests and other assessment results. In any event, it provides an absolute basis for assigning letter grades.
Which set of points provides the best basis for assigning grades? There is no way of knowing. The distribution of points is arbitrary. Whatever distribution is used, however, should be based on the instructor’s experience with this and past groups of trainees, knowledge concerning the difficulty of the intended learning outcomes, the difficulty of the tests and other assessments used, the conditions of learning and the like. These are all subjective judgements, however, and shifts in the proportion of trainees getting the letter grade of A or F are difficult to evaluate. Do a larger number of grades of A represent improved instruction and better study habits by trainees, or easier tests and less rigid grading of papers and projects? Do more failures indicate poor teaching, inadequate study, or tests that have inadvertently increased in difficulty?
Despite the problem of setting meaningful standards for an absolute grading system, this method is widely used in schools. It is most appropriate in mastery-type programs where the set of learning tasks has been clearly specified, the standards have been defined in terms of the learning tasks, and the tests and other assessment techniques have been designed for criterion-referenced interpretation. All too frequently, however, absolute grading is based on some hodgepodge of ill-defined achievement results. When the distribution of points does not fit the grading scale, the points are adjusted upward or downward by some obscure formula to get a closer fit. Needless to say, such grades do not provide a meaningful report of the extent to which the intended learning outcomes have been achieved.
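As a concrete illustration of the first 100-point scheme above (90-100 = A, and so on), a grade can be assigned by checking the score against each lower bound; the cut-offs are the module's example, not a recommendation.

    def absolute_grade(points):
        # Map a 0-100 point score to a letter grade using the 90/80/70/60 cut-offs.
        for lower_bound, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
            if points >= lower_bound:
                return grade
        return "F"                                   # below 60

    for score in (95, 83, 70, 61, 59):
        print(score, absolute_grade(score))          # A, B, C, D, F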



b.    Relative Grading
When assigning grades on a relative basis, the trainees are typically ranked in order of performance (based on a set of test scores or combined assessment results); the trainees ranking highest receive a letter grade of A, the next highest receive a B, and so on. The proportion of trainees that should receive each grade is pre-determined and might appear as one of the following:


 

      PERCENT OF TRAINEES      PERCENT OF TRAINEES
A     15                       10-20
B     25                       20-30
C     45                       40-50
D     10                       10-20
F     5                        0-10

The percentage of trainees to be assigned each grade is just as arbitrary as the selection of points for each grade in the absolute grading system. The use of a range of percentages (e.g., A = 10-20 percent) should probably be favored because it makes some allowance for differences in the ability level of the class. It does not make sense to assign 15 percent A's to both a regular class and a gifted class. Likewise, in an advanced course a larger proportion of A's and B's should be assigned and fewer (if any) F's, because the low-achieving trainees have been weeded out in earlier courses. Where these percentages have been set by the school system, one has little choice but to follow the school practice—at least until efforts to change it are successful.
Older measurement books recommended using the normal curve to assign grades. This resulted in the same percent of A’s and F’s (e.g., 7 percent) and B’s and D’s (e.g., 38 percent). Although some instructors may still use such a system, its use should be discouraged. Measures of achievement in classroom groups seldom yield normally distributed scores. Also, to maintain the same proportion of grades, especially failures, at different grade levels does not take into account that the trainee population is becoming increasingly select as the failing trainees are held back or drop out of school.
The relative grading system requires a reliable ranking of trainees; thus, it is most meaningful when the achievement measures provide a wide range of scores. This makes it possible to draw the lines between grades with greater assurance that misclassifications will be kept to a minimum. Ideally, of course, the spread of scores should be based on the difficulty and complexity of the material learned. For example, an A should not simply represent more knowledge of factual material, but a higher level of understanding, application, and thinking skills. Thus, although norm-referenced interpretation is being utilized, the real meaning of the grades comes from referring back to the nature of the achievement that each grade represents.
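A relative (norm-referenced) assignment can be sketched by ranking the composite scores and handing out grades in fixed proportions. The quotas below follow the first column of the table above (15/25/45/10/5 percent); the rounding rule and the trainee scores are assumptions for illustration.

    def relative_grades(scores, quotas=(("A", 0.15), ("B", 0.25), ("C", 0.45), ("D", 0.10), ("F", 0.05))):
        # Assign letter grades by rank, using a fixed proportion of the group for each grade.
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        grades, position = {}, 0
        for grade, proportion in quotas:
            quota = round(proportion * len(ranked))  # rounding rule is an assumption
            for name, _ in ranked[position:position + quota]:
                grades[name] = grade
            position += quota
        for name, _ in ranked[position:]:            # anyone left over receives the lowest grade
            grades[name] = "F"
        return grades

    scores = {"Hussein": 84, "Soh Mun": 80, "Cheng Hong": 80,
              "Sulaiman": 78, "Salmah": 75, "Mutu": 64}
    print(relative_grades(scores))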


c.    Learning Ability, Improvement and Effort
In some cases, attempts are made to base grades on achievement in relation to learning ability, the amount of improvement in achievement, or the amount of effort a trainee puts forth. All of these procedures have problems that distort the meaning of grades.
Grading on the basis of learning ability has sometimes been used at the elementary level to motivate trainees with less ability. At first glance, it seems sensible to give a grade of A to trainees who are achieving all that they are capable of achieving. There are two major problems with this procedure, however. First, it is difficult, if not impossible, to get a dependable measure of learning ability apart from achievement. Both tests have similar types of items and measure similar concepts. Second, the meaning of the grades becomes distorted. A low-ability trainee with average performance might receive an A, whereas a high-ability trainee with average performance receives a grade of C. Obviously the grades are no longer very meaningful as indicators of achievement.
Using the amount of improvement as a basis for grading also has its problems. For one, difference scores between measures of achievement over short spans of time are very unreliable. For another, trainees who score high on the entry test cannot possibly get a high grade because little improvement can be shown. Trainees who know about this grading procedure ahead of time can, of course, do poorly on the first test and be assured of a fairly good grade. This is not an uncommon practice where grades are based on improvement. Finally, the grades lack meaning as indicators of achievement: a low-scoring trainee with a considerable increase in achievement might receive an A, while a high-achieving trainee with little improvement receives a B or C.
Grading on the basis of effort, or adjusting grades for effort, also distorts the meaning of the results. Low achieving trainees who put forth great effort receive higher grades than their achievement warrants and high-achieving trainees who put forth little effort are likely to receive lower grades than deserved. Although such grading seems to serve a motivational function for low-achieving trainees, the grades become meaningless as measures of the extent to which trainees are achieving the intended learning outcomes.
In summary, assigning grades that take into account learning ability, amount of improvement, or effort simply contaminates the grades and distorts their meaning as indicators of trainee achievement. Other factors may be rated separately on a report card, but they should not be allowed to distort the meaning of the letter grade.

d.    A Combination of Absolute and Relative Grading
Grades should represent the degree to which instructional objectives (i.e., intended learning outcomes) are achieved by trainees. Some of the objectives of instruction are concerned with minimum essentials that must be mastered if a trainee is to proceed to the next level of instruction. Other objectives are concerned with learning outcomes that are never fully achieved but towards which trainees can show varying degrees of progress. The first are called minimal objectives and the second developmental objectives.
Minimal objectives are concerned with the knowledge, skill, and other lower-level learning outcomes that represent the minimum essentials of the course. In order to receive a passing grade, a trainee must demonstrate that this basic knowledge and skill, which are pre-requisite to further learning in the area, have been learned to a satisfactory degree. Developmental objectives are concerned with higher-level learning outcomes such as understanding, application, and thinking skills. Although we can identify degrees of progress toward these objectives, we cannot expect to ever fully achieve them. In science, for example, we might expect all trainees to master basic terms, concepts, and skills, but encourage each trainee to proceed as far as he or she can in understanding and applying the scientific process, and in developing the intellectual skills used by scientists. Similarly, all trainees in math might be expected to master the fundamental operations, but show wide diversity in problem-solving ability and mathematical reasoning. In all instructional areas there are lower-level objectives that should be mastered by all trainees and higher-level objectives that provide goals that never can be fully achieved. Thus, with minimal objectives, we attempt to obtain a uniformly high level of performance for all trainees, and with developmental objectives we encourage each trainee to strive for maximum development.
As indicated earlier, the pass-fail decision should be based on whether or not the minimal objectives have been mastered. Trainees demonstrating that they have achieved the minimal objectives, and thus have the necessary prerequisites for success at the next level of instruction, should be passed. Those who do not should fail. This requires an absolute judgment, not a relative one. Trainees should not be failed simply because their achievement places them near the bottom of some group. It is the nature of the achievement that is significant.
Above the pass-fail cutoff point, grades should be assigned on a relative basis. This is because trainee’s scores will tend to be spread out in terms of their degree of development beyond the minimal level. Trainees cannot be expected to master the more complex learning outcomes described by developmental objectives, but they can show varying degrees of progress towards their attainment. Although  absolute grading could be used, this is not possible at this time. The best we can do is obtain a spread of trainee achievement scores in terms of the complexity of the learning outcomes attained and use relative grading. If properly done, a grade of A would represent greater achievement of the higher-level learning outcomes and not simply a high relative position in the group. This would assume, of course, that tests and other assessment techniques would measure a range of achievement from simple to complex, and not just knowledge of factual information and simple skills, as is commonly done now.
In many cases the school will dictate the grading policy, including the basis on which the grades are to be assigned. Regardless of the system used, it is important to relate the grades back to trainee achievement so that different grades represent different levels of performance. Letter grades without an achievement referent tend to have little meaning.

e.    Combining Data For Grading
Assigning grades typically involves combining results from various types of assessment, including such things as tests, projects, papers, and laboratory work. If each element is to be included in the grade in terms of its relative importance, the data must be combined in a way that proper weights are used. For example, if we want test scores to count 50 percent, paper 25 percent, and laboratory work 25 percent of the grade, we need a method that will convert  results  into numerical scores first.
The method of combining scores so that proper weights are obtained for each element is not as simple as it seems. A common procedure is simply to add scores together if they are to have equal weight and to multiply by two if an element is to count twice as much as the other. This typically will not result in each element receiving its proper weight, even if the highest possible score is the same for all sets of scores. How much influence each element has in a composite score is determined by the spread, or variability of score and not the number of total points.
The problem of weighting scores when combining them can be best illustrated with a simple example. Let’s assume we only have two measures of achievement and we want to give them equal weight in a grade. Our two sets of achievement scores have score ranges as follows:

Test scores              20 to 100
Laboratory work          30 to 50

If we simply add together a trainee's test score and score on laboratory work, the grade the trainee receives would be determined largely by the test score because of its wide spread of scores. This can be shown by comparing a trainee who has the highest test score and lowest laboratory score (Trainee 1) with a trainee who has the lowest test score and highest laboratory score (Trainee 2).


                          TRAINEE 1      TRAINEE 2
Test score                100            20
Laboratory score          30             50

Composite score           130            70

It is quite obvious that the composite scores do not represent equal weighting.
With sets of scores like those for our test and laboratory work, it is not uncommon for instructors to attempt to give them equal weight by making the top possible score equal. This can be done, of course, by multiplying the score on laboratory work by two, making the highest possible score 100 for both measures. Here is how the two composite scores for our hypothetical trainees would compare under this system:







                          TRAINEE 1      TRAINEE 2
Test score                100            20
Laboratory score (x 2)    60             100

Composite score           160            120

Our composite scores make clear that equalizing the maximum possible score does not provide equal weights either. As noted earlier, the influence a measure has on the composite score depends on the spread, or variability, of scores. Thus, the greater the spread of scores, the larger the contribution to the composite score.
We can give equal weight to our two sets of scores by using the range of scores in each set. Because our test scores have a range of 80 (100-20) and our laboratory scores have a range of 20 (50-30), we must multiply each laboratory score by four to equalize the spread of scores and, thus, give them equal weight in the composite score. Here are the composite scores for our two hypothetical trainees:


                          TRAINEE 1      TRAINEE 2
Test score                100            20
Laboratory score (x 4)    120            200

Composite score           220            220

At last we have a system that gives the two measures equal weight in the composite score. Note that if we wanted to count our test score twice as much as the laboratory score, we would multiply it by two and the laboratory score by four. However, if we wanted to have our laboratory score count twice as much as the test score, we would have to multiply each laboratory score by eight. Thus, when we originally multiplied our laboratory score by four, we simply adjusted the spread of those scores to match the spread of the test scores. When the two sets of scores have the same range of scores, we can then assign additional weights in terms of their relative importance.
      The range of scores provides only a rough approximation of score variability but it is satisfactory for most classroom grading purposes. A more dependable basis for weighting grade components can be obtained with the standard  deviation (see Oosterhof, 1990).
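The range-equalizing step described above can be written out directly. The sketch below reproduces the hypothetical two-trainee example, stretching the laboratory scores by the ratio of the two ranges before adding the components.

    # Equalize the spread of two components before combining them (ranges from the example above).
    test_range = 100 - 20                            # 80
    lab_range = 50 - 30                              # 20
    lab_multiplier = test_range / lab_range          # 4: stretches laboratory scores to the same spread

    trainees = {"Trainee 1": {"test": 100, "lab": 30},
                "Trainee 2": {"test": 20, "lab": 50}}

    for name, marks in trainees.items():
        composite = marks["test"] + lab_multiplier * marks["lab"]
        print(f"{name}: composite = {composite:.0f}")    # both 220: the two components now carry equal weight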
      Some instructors obtain a composite grade by converting all test scores and other assessments to letter grades, converting the letter grades to numbers (e.g., A = 4, B = 3, C = 2, D = 1, F = 0) and then averaging them for a final grade. When this procedure is followed, information is lost because the data are reduced to only five categories. For example, a trainee with a high A and high B would receive the same average grade as a trainee with a low A and a low B. To overcome this problem, pluses and minuses are sometimes added (e.g., A+ = 12, A = 11, A- = 10, B+ = 9, B = 8, B- = 7, etc). This provides more categories but some information is still lost. A better solution is to use numerical scores on all assessments and then combine these numerical scores into a composite score before assigning grades.

ii.    Guidelines for Effective and Fair Grading

Assigning grades that provide a valid measure of trainee’s achievement, that have a meaning beyond the classroom in which they are given, and that are considered to be fair by trainees, is a difficult but important part of teaching. The following guidelines provide a framework that should help clarify and standardize the task.

  1. Inform trainees at the beginning of instruction what grading procedures will be used. This should include what will be included in the final grade (e.g., tests, projects, laboratory work) and how much weight will be given to each element. It should also include a description, in achievement terms, of what each letter grade represents. A descriptive handout may be helpful.
  2. Base grades on trainee achievement, and achievement only. Grades should represent the extent to which the intended learning outcomes were achieved by trainees. They should not be contaminated by trainee effort, tardiness, misbehavior, or other extraneous factors. These can be reported on separately, but they should not influence the achievement grade. If they are permitted to become a part of the grade, the meaning of the grade as an indicator of achievement is lost.
  3. Base grades on a wide variety of valid assessment data. All too frequently, grades are based primarily, if not entirely, on test scores. If grades are to be sound indicators of achievement, all important learning outcomes must be assessed and the results included in the final grade. Evaluation of papers, projects, and laboratory work is not as reliable as objective test scores but to eliminate them lowers the validity of the grades.
  4. When combining scores for grading, use a proper weighting technique. As noted earlier, the influence of a component on the overall grade is determined by the spread, or variability, of the scores. Thus, in combining scores to obtain a composite for assigning grades, be sure the spread of scores is equalized before weighting and combining them.
  5. Select an appropriate frame of reference for grading. If the entire instruction is based on mastery learning, it is necessary to use an absolute standard for grading and to define the grades in mastery terms. For conventional classroom instruction, the pass-fail distinction should be described in absolute terms and the grades above that determined by relative position in the group. However, these relative letter grades should have achievement referents representing learning outcomes ranging from simple to complex.
  6. Review borderline cases by re-examining all achievement evidence. When setting cutoff points for each grade, there is typically a trainee or two just below the cut-off line. Measurement errors alone might be responsible for a trainee being just below (or above) the line. Also, the composite score may contain a clerical error, or one low test score contributing to the composite score may be due to illness or some other extraneous factor. In any event, it is wise to review the data for borderline cases and make any needed adjustments. When in doubt, fair grading would favor giving the trainee the higher grade.




I-031-3(10) IS 4
PREPARATION OF REPORT

Preparing a report is the last stage in all evaluation processes. The evaluation report usually contains the following items:
a)            Introduction to the research topic.
b)             Description / explanation of theme or topic.
c)            Objective or purpose of evaluation.
d)            Research methodology.
e)            Procedures for data gathering.
f)             The collected or gathered data, attached together with the records, documents and forms used in carrying out the analysis and interpretation.
g)            Report on analysis, interpretation and conclusion.
h)           Recommendation and follow-up action based on interpretation and conclusion. 
i)             Bibliography and reference materials.

