Over the course of an academic career the average student will be exposed to a variety of grading systems and procedures. Although some of these systems may be qualitative in nature, such as an annual or semiannual written narrative, the vast majority are quantitative and depend upon numerical or alphanumerical metrics. Perhaps the most familiar of these involves the letters "A" through "F," where "A" is usually given a value of 4.0 and is characterized in words as outstanding or excellent and "F" is given a value of 0.0 and is described as unsatisfactory or failing. The grades of A through F are usually derived from some more differentiated quantitative value such as a test score, and the relationship between grade and test score may take a variety of forms (e.g., an A may be defined by a score of 90% or better, or by a score that falls in the top 5–10% of scores independent of absolute value). Regardless of the specific translation of test performance into letter grade, the point to keep in mind is that the A–F scale has been the most frequently used grading system in higher education over the past half century or more.
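The two translations mentioned above, an absolute cutoff and a rank within the class, can give different letters for the same score. The following is a minimal sketch of that contrast, not a description of any institution's rule. Only the 90% cutoff for an A and the top 5–10% rule come from the text; the remaining cutoffs (80, 70, 60), the function names, and the class scores are assumptions made for illustration.

```python
# Hypothetical cutoffs: only the 90% line for an A is taken from the text.
ASSUMED_CUTOFFS = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))


def letter_by_cutoff(score, cutoffs=ASSUMED_CUTOFFS):
    """Absolute standard: the letter depends only on the student's own percent score."""
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"


def earns_a_by_rank(score, all_scores, top_fraction=0.10):
    """Relative standard: an A goes to scores in the top fraction of the class."""
    ranked = sorted(all_scores, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    threshold = ranked[n_top - 1]  # lowest score still inside the top group
    return score >= threshold


# Invented scores for a ten-student class on a difficult test.
class_scores = [62, 71, 55, 84, 78, 67, 59, 73, 80, 66]

print(letter_by_cutoff(84))               # B under the absolute cutoffs
print(earns_a_by_rank(84, class_scores))  # True: 84 is the top score in this class
```

In this invented class a score of 84 is a B by absolute cutoff but an A by rank, which is why the letter alone does not reveal which translation produced it.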

Variations in the Grading System

Like all prototypes, the A–F system admits many variations. These often take the form of plusses and minuses, thereby producing a scale having the possibility of fifteen distinct units: A+, A, A–, B+, B … F–. In actual practice, the grade of A+ is scarcely ever used, and the same is true for D+, D–, F+, and F–, thereby yielding a scale of eight to ten units. Generally speaking, the greater the number of units in the grading system, the more precisely it aims to quantify student performance. What is interesting in this regard is the fluctuation in the actual number of units used in different historical eras. Without going too deeply into the relevant historical facts, it is clear that certain historical periods, such as the 1960s, reduced the grading system to two or so units (Pass, No Credit, or P/NC), whereas other periods, such as the 1980s, expanded it to ten, eleven, or twelve units.

Variations in the breadth of the grading system would seem to have significant educational implications. At a minimum, these differences may be taken to imply that scales having a large number of units indicate a relative comfort in making precise distinctions, whereas those having fewer units suggest a relative discomfort in making such distinctions. In the case of more differentiated systems, distinctions and rankings are significant, and individual achievement is emphasized; in the case of less differentiated systems, distinctions and rankings are de-emphasized and interstudent competition is minimized. To some degree, it is possible to view fluctuations in American grading systems as reflecting a more general ambivalence in the society between competition and cooperation, and between individual recognition and social equity. Educational institutions sometimes emphasize strict evaluation, competition, and individual achievement, whereas at other times they emphasize less precise evaluation, cooperation, and sympathetic understanding for students of all achievement levels.

Another property of grading systems is that individual class grades often are combined to produce an overall metric called the grade point average or GPA. Unlike its constituent values, which usually are carried to only one decimal place (or none), the GPA presents a metric of 400 units, yielding the possibility that a GPA of 3.00 will locate the student in the category of "good" whereas a value of 2.99 will exclude him or her from this category. In the same way, honors, admission to graduate school, preliminary selection for interviews by a desirable company, and so forth, may be defined by a single-unit difference on the GPA scale (e.g., 3.50 versus 3.49 for Phi Beta Kappa).
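The arithmetic behind this can be sketched briefly. The example below assumes the common convention of A = 4, B = 3, C = 2, D = 1, F = 0 (the text fixes only A = 4.0 and F = 0.0) and an unweighted mean; many institutions weight each course by credit hours, which is left out here. The function names and the sample transcripts are hypothetical.

```python
# Assumed point values; only A = 4.0 and F = 0.0 are stated in the text.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def gpa(letters):
    """Unweighted mean of grade points, reported to two decimal places."""
    points = [GRADE_POINTS[letter] for letter in letters]
    return round(sum(points) / len(points), 2)


def meets_cutoff(value, cutoff=3.00):
    """A hard category boundary: 3.00 is in, 2.99 is out."""
    return value >= cutoff


student_one = ["A", "B", "B", "B", "C"]  # 15 points over 5 courses
student_two = ["A", "B", "B", "C", "C"]  # 14 points over 5 courses

for transcript in (student_one, student_two):
    value = gpa(transcript)
    print(value, meets_cutoff(value))
```

The category boundary treats 2.99 and 3.00 as different kinds of student even though the underlying letter grades were never measured to that precision.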

Because GPAs are significant in categorizing student performance, a number of evaluations have been made of their reliability and validity. One issue to be addressed here concerns field of study, where it is well documented that classes in the natural sciences and business produce lower overall grades than those in the humanities or social sciences. What this means is that it is unreasonable to equate grade values across disciplines. It also suggests that the GPA is composed of unequal components and that students may be able to secure a higher GPA by a judicious selection of courses.

Although other factors may be mentioned aside from academic discipline (such as the SAT level of the school, the quality and nature of tests, etc.), the conclusion must be that the GPA is a poor measure and should not be used by itself in coming to significant decisions about the quality of student performance or differences between departments and/or educational institutions. The GPA is also a relatively poor basis on which to predict future performance, which perhaps explains why such attempts are never very impressive. In fact, a number of meta-analyses of this relationship, conducted every ten years or so since 1965, reveal that the median correlation between GPA and future performance is 0.18, a value that is neither very useful nor impressive. The strongest relationship between GPA and future achievement is usually found between undergraduate GPA and first-year performance in graduate or professional school.

Despite such difficulties in understanding the exact meanings of grades and the GPA, they remain important social metrics and sometimes yield heated discussions over issues such as grade inflation. Although grade inflation has many different meanings, it usually is defined by an increase in the absolute number of As and Bs over some period of years. The tacit assumption here seems to be that any continuing increase in the overall percentage of "good grades" or in the overall GPA implies a corresponding decline in academic standards. Although historically there have been periods in which the number of good grades decreased (so-called grade deflation), significant social concerns usually only accompany the grade inflation pattern. This one-sided emphasis suggests that grade inflation is as much a sociopolitical issue as an educational one and depends upon the dubious equating of grades with money. What really seems of concern here is a value issue, not a cogent analogy that reveals anything significant about grades or money.

How Grades Are Produced

Grading systems represent just one aspect of an interconnecting network of educational processes, and any attempt to describe grading systems without considering other aspects of this network must necessarily be incomplete. Perhaps the most important of these processes concerns the procedure used to produce grades in the first place, namely, the classroom test. Here, of course, there are purely formal differences: for example, between multiple-choice and essay tests, or between in-class and take-home tests or papers. Also to be included is the quality of the test items themselves, not only in terms of content but also in terms of the clarity of the question and, in the case of multiple-choice tests, of the distractors.

One way to capture the complexity of possible ways in which grades are produced is to consider the set of implicit choices that lie behind an instructor's use of a specific testing and/or grading procedure. Included here are such questions as: What evaluation procedure should I use? Term papers, classroom discussions, or in-class tests? If I choose tests, what kind(s)? Essay, true/false, fill-in-the-blank, matching, or multiple-choice? If I choose multiple-choice, what grading model should I use? Normal curve, percent-correct, or improvement over preceding tests? If I choose percent-correct, how many tests should I give? Final only, two in-class tests and a final, or one midterm and one final? How should I weight each test if I choose the midterm-final pattern? Midterm equal to the final, midterm counting twice as much as the final, or final counting twice as much as the midterm? What grade report system should I use? P/F; A, B, C, D, F; or A+, A, A–, B+, … F? An examination of this collection of possible choices suggests that instructors have a large number of options as to how to go about testing and grading their students.
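The weighting question in the list above has a direct arithmetic consequence, which the following sketch illustrates. It assumes percent-correct scores and a simple weighted average; the function name, the scheme labels, and the sample scores of 70 and 90 are invented for illustration and are not drawn from any particular course.

```python
def course_score(midterm, final, midterm_weight, final_weight):
    """Weighted average of two percent-correct test scores."""
    total_weight = midterm_weight + final_weight
    return (midterm * midterm_weight + final * final_weight) / total_weight


# The three midterm-final patterns named in the text, as (midterm, final) weights.
SCHEMES = {
    "midterm equals final": (1, 1),
    "midterm counts double": (2, 1),
    "final counts double": (1, 2),
}

# Hypothetical student who improved between the midterm and the final.
midterm, final = 70, 90

for label, (w_mid, w_fin) in SCHEMES.items():
    print(label, round(course_score(midterm, final, w_mid, w_fin), 2))
# midterm equals final 80.0
# midterm counts double 76.67
# final counts double 83.33
```

The same two test performances produce course scores nearly seven points apart, enough to change the letter grade under most percent-correct cutoffs.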

Any consideration of the ways in which testing and grading relate to one another must also deal with the ways in which one or both of these activities relate to learning and teaching. The relationship between learning and testing is fairly direct (if neglected), especially if tests are used not only to evaluate student achievement but also to reinforce or promote learning itself. Thus it is easy to develop a classroom question or exercise that requires the student to read some material before being able to answer the question or complete the exercise. Teaching, on the other hand, would seem to be somewhat further removed from issues of testing and grading, although the specific testing and grading plan used by the instructor does inform the student as to what constitutes relevant knowledge as well as what attitude the instructor holds toward precise evaluation and academic competition.

Students are not immune to testing and grading procedures, and educational researchers have made the distinction between students who are grade oriented and those who are learning oriented. Although this distinction is surely too one-dimensional, it does suggest that for some students the classroom is a place where they experience and enjoy learning for its own sake. For other students, however, the classroom is experienced as a crucible in which they are tested and in which the attainment of a good grade becomes more important than the learning itself. When students are asked how they became grade (or learning) oriented, they usually point to the actions of their teachers in emphasizing grades as a significant indicator of future success; alternatively, they describe instructors who are excited by promoting new learning in their classrooms. When college instructors are asked about the reason(s) for their emphasis on grades, they report that student behaviors, such as arguing over the scoring of a single question, make it necessary for them to maintain strict and well-defined grading standards in their classrooms. The ironic point is that both the student and the instructor see the "other" as emphasizing grades over learning, and neither sees this as a desirable state of affairs. What seems missing in this context is a clear recognition by both the instructor and the student that grades are best construed as a type of communication. When grades (and tests) are thought about in this way, they can be used to improve learning. As it now stands, however, the communicative purpose of grades is ordinarily submerged in their more common use as a means of rating and sorting students for social and institutional purposes not directly tied to learning. Only when grades are integrated into a coherent teaching and learning strategy do they serve the purpose of providing useful and meaningful feedback not only to the larger culture but to the individual student as well.



