## Higher Education

Over the course of an academic career the average student will be exposed to a variety of grading systems and procedures. Although some of these systems may be qualitative in nature, such as an annual or semiannual written narrative, the vast majority are quantitative and depend upon numerical or alphanumerical metrics. Perhaps the most familiar of these involves the letters "A" through "F," where "A" is usually given a value of 4.0 and is characterized in words as outstanding or excellent and "F" is given a value of 0.0 and is described as unsatisfactory or failing. The grades of A through F are usually derived from some more differentiated quantitative value, such as a test score, and the specific relationship between grade and test score may take a variety of forms (e.g., an A may be defined by a score of 90% or better, or by a score that falls in the top 5–10% of scores independent of absolute value). Regardless of the specific translation of test performance into letter grade, the point to keep in mind is that the A–F scale has been the most frequently used grading system in higher education over the past half century or more.
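The two translation rules mentioned above, an absolute cutoff and a rank within the class, can be sketched roughly as follows. This is only an illustration: the 90% line for an A and the top-10% rule come from the text, while the B, C, and D cutoffs, the function names, and the sample scores are assumptions made for the example.

```python
import math

# Assumed cutoffs: only the 90% line for an A comes from the text;
# the B, C, and D lines are hypothetical values chosen for illustration.
ASSUMED_CUTOFFS = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))


def grade_by_cutoff(score, cutoffs=ASSUMED_CUTOFFS):
    """Absolute rule: the letter depends only on the student's own percent score."""
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"


def earns_a_by_rank(scores, top_fraction=0.10):
    """Relative rule: the top fraction of the class earns an A, whatever the raw scores.

    Returns one boolean per score. Students tied with the last qualifying
    score also receive an A, so slightly more than the fraction may qualify.
    """
    if not scores:
        return []
    n_top = max(1, math.ceil(len(scores) * top_fraction))
    threshold = sorted(scores, reverse=True)[n_top - 1]
    return [s >= threshold for s in scores]


# Hypothetical class of ten in which nobody reaches 90%.
class_scores = [88, 84, 79, 75, 72, 70, 66, 61, 55, 50]
print(grade_by_cutoff(class_scores[0]))   # letter under the absolute rule
print(earns_a_by_rank(class_scores)[0])   # whether the same score earns an A by rank
```

In this made-up class the top score of 88 is a B under the assumed absolute cutoffs but an A under the rank rule, which is the sense in which the same test performance can translate into different letter grades.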

### Variations in the Grading System

Like all prototypes, the A–F system admits many variations. These often take the form of plusses and minuses, thereby producing a scale having the possibility of fifteen distinct units: A+, A, A–, B+, B … F–. In actual practice, the grade of A+ is scarcely ever used, and the same is true for D+ and D– and for F+ and F–, thereby yielding a scale of between eight and ten units. Generally speaking, the greater the number of units in the grading system, the more precisely it hopes to quantify student performance. What is interesting in this regard are fluctuations in the actual number of units used in different historical eras. Without going too deeply into the relevant historical facts, it is clear that certain historical periods, such as the 1960s, reduced the grading system to two or so units, Pass and No Credit (P/NC), whereas other periods, such as the 1980s, expanded it to ten, eleven, or twelve units.

Variations in the breadth of the grading system would seem to have significant educational implications. At a minimum, these differences may be taken to imply that scales having a large number of units indicate a relative comfort in making precise distinctions, whereas those having fewer units suggest a relative discomfort in making such distinctions. In the case of more differentiated systems, distinctions and rankings are significant, and individual achievement is emphasized; in the case of less differentiated systems, distinctions and rankings are de-emphasized and interstudent competition is minimized. To some degree, it is possible to view fluctuations in American grading systems as reflecting a more general ambivalence in the society between competition and cooperation and between individual recognition and social equity. Educational institutions sometimes emphasize strict evaluation, competition, and individual achievement, whereas at other times they emphasize less precise evaluation, cooperation, and sympathetic understanding for students of all achievement levels.

Another property of grading systems is that individual class grades often are combined to produce an overall metric called the grade point average or GPA. Unlike its constituent values, which usually are carried to only one decimal place (or to none), the GPA presents a metric of 400 units, yielding the possibility that a GPA of 3.00 will locate the student in the category of "good" whereas a value of 2.99 will exclude him or her from this category. In the same way, honors, admission to graduate school, preliminary selection for interviews by a desirable company, and so forth, may be defined by a single point difference on the GPA scale (e.g., 3.50 versus 3.49 for Phi Beta Kappa).
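A minimal sketch of how class grades might be combined into a GPA, and of how a categorical cutoff then treats neighboring values, is given below. The text fixes only A = 4.0 and F = 0.0; the point values for B, C, and D, the unweighted mean (real GPAs are often weighted by credit hours), and the function names are assumptions made for illustration.

```python
# Assumed point values: the text fixes only A = 4.0 and F = 0.0;
# B, C, and D follow the common 3-2-1 convention.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def gpa(letters):
    """Unweighted mean of grade points, reported to two decimal places."""
    points = [GRADE_POINTS[letter] for letter in letters]
    return round(sum(points) / len(points), 2)


def meets_cutoff(gpa_value, cutoff=3.00):
    """Categorical decision: a GPA either reaches the cutoff or it does not."""
    return gpa_value >= cutoff


print(gpa(["A", "B", "B", "C"]))   # mean of 4, 3, 3, 2
print(meets_cutoff(3.00))          # at the cutoff
print(meets_cutoff(2.99))          # one hundredth of a point below it
```

The cutoff function makes the point of the paragraph concrete: two students whose GPAs differ by a single hundredth of a point land on opposite sides of the category boundary.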

Because GPAs are significant in categorizing student performance, a number of evaluations have been made of their reliability and validity. One issue to be addressed here concerns field of study, where it is well documented that classes in the natural sciences and business produce lower overall grades than those in the humanities or social sciences. What this means is that it is unreasonable to equate grade values across disciplines. It also suggests that the GPA is composed of unequal components and that students may be able to secure a higher GPA by a judicious selection of courses.

Although other factors may be mentioned aside from academic discipline (such as SAT level of school, quality and nature of tests, etc.), the conclusion must be that the GPA is a poor measure and should not be used by itself in coming to significant decisions about the quality of student performance or differences between departments and/or educational institutions. The GPA is also a relatively poor basis on which to predict future performance, which perhaps explains why the results of such attempts are never very impressive. In fact, a number of meta-analyses of this relationship, conducted every ten years or so since 1965, reveal that the median correlation between GPA and future performance is 0.18, a value too small to be of much practical use, since a correlation of this size accounts for only about 3 percent of the variance in later performance. The strongest relationship between GPA and future achievement is usually found between undergraduate GPA and first-year performance in graduate or professional school.

Grading systems represent just one aspect of an interconnecting network of educational processes, and any attempt to describe grading systems without considering other aspects of this network must necessarily be incomplete. Perhaps the most important of these processes concerns the procedures used to produce grades in the first place, namely, the classroom test. Here, of course, there are purely formal differences, for example, between multiple-choice and essay tests, or between in-class and take-home tests or papers. Also to be included is the quality of the test items themselves, not only in terms of content but also in terms of the clarity of the question and, in the case of multiple-choice tests, of the distractors.

One way to capture the complexity of possible ways in which grades are produced is to consider the set of implicit choices that lie behind an instructor's use of a specific testing and/or grading procedure. Included here are such questions as: What evaluation procedure should I use? Term papers, classroom discussions, or in-class tests? If I choose tests, what kind(s)? Essay, true/false, fill-in-the-blank, matching, or multiple-choice? If I choose multiple-choice, what grading model should I use? Normal curve, percent-correct, improvement over preceding tests? If I choose percent-correct, how many tests should I give? Final only, two in-class tests and a final, one midterm and one final? How should I weight each test if I choose the midterm-final pattern? Midterm equals final, midterm is equivalent to twice the final exam grade, final equals twice the midterm grade? What grade report system should I use? P/F; A, B, C, D, F; or A+, A, A–, B+, … F? An examination of this collection of possible choices suggests that instructors have a large number of options as to how to go about testing and grading their students.
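To see how much one of these choices can matter, the three midterm/final weighting patterns listed above can be written as a weighted average. The sketch below is illustrative only: the function name and the sample scores of 70 and 90 are hypothetical, and the patterns are read as simple weight ratios of 1:1, 2:1, and 1:2.

```python
# The three midterm/final weighting patterns named in the text,
# written as (midterm weight, final weight).
WEIGHTINGS = {
    "midterm equals final": (1, 1),
    "midterm counts twice the final": (2, 1),
    "final counts twice the midterm": (1, 2),
}


def course_score(midterm, final, weights):
    """Weighted average of two percent-correct test scores."""
    w_mid, w_final = weights
    return (w_mid * midterm + w_final * final) / (w_mid + w_final)


# Hypothetical student: 70% on the midterm, 90% on the final.
for label, weights in WEIGHTINGS.items():
    print(f"{label}: {course_score(70, 90, weights):.1f}")
```

For this hypothetical student the course score ranges from roughly 76.7 to 83.3 depending on the weighting alone, a spread wide enough to straddle a letter-grade boundary under many percent-correct schemes.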

Any consideration of the ways in which testing and grading relate to one another must also deal with the ways in which one or both of these activities relate to learning and teaching. The relationship between learning and testing is a fairly direct (if neglected) one, especially if tests are used not only to evaluate student achievement but also to reinforce or promote learning itself. Thus it is easy to develop a classroom question or exercise that requires the student to read some material before being able to answer the question or complete the exercise. Teaching, on the other hand, would seem to be somewhat further removed from issues of testing and grading, although the specific testing and grading plan used by the instructor does inform the student as to what constitutes relevant knowledge, as well as what attitude the instructor holds toward precise evaluation and academic competition.

## BIBLIOGRAPHY

BAIRD, LEONARD L. 1985. "Do Tests and Grades Predict Adult Achievement?" Research in Higher Education 23:3–85.

CURETON, LOUISE W. 1971. "The History of Grading Practices." Measurement in Education 2:1–9.

DUKE, J. D. 1983. "Disparities in Grading Practice: Some Resulting Inequities and a Proposed New Index of Academic Achievement." Psychological Reports 53:1023–1080.

GOLDMAN, ROY D.; SCHMIDT, DONALD E.; HEWITT, BARBARA N.; and FISHER, RONALD. 1974. "Grading Practices in Different Major Fields." American Educational Research Journal 11:343–357.

MILTON, E. OHMER; POLLIO, HOWARD R.; and EISON, JAMES A. 1986. Making Sense of College Grades. San Francisco: Jossey-Bass.

POLLIO, HOWARD R., and BECK, HALL P. 2000. "When the Tail Wags the Dog: Perceptions of Learning and Grade Orientation in and by Contemporary College Students and Faculty." The Journal of Higher Education 71:84–102.

HOWARD R. POLLIO