Performance Assessment

The term performance assessment (PA) is typically used to refer to a class of assessments that is based on observation and judgment. That is, in PA an assessor usually observes a performance or the product of a performance and judges its quality. For example, to judge one's competence to operate an automobile, it is normally required that one pass a road test, during which actual driving is observed and evaluated. Similarly, Olympic athletes are judged on the basis of observed performances. PA has long been used to judge proficiency in industrial, military, and artistic settings, and interest in its application to educational settings has grown at the start of the twenty-first century.

Educators' interest in PA can be attributed to several factors. It has been argued that performance measures offer a potential advantage of increased validity over other forms of testing that rely on indirect indicators of a desired competence or proficiency. For example, to assess the ability to spell, one might prefer direct evidence that a person can spell words correctly rather than inferring the ability from tasks that involve identifying misspelled words in a list. Proponents of performance assessment have identified many possible benefits, such as allowing a broad range of learning outcomes to be assessed and preserving the complex nature of disciplinary knowledge and inquiry, including conceptual understanding, problem-solving skills, and the application of knowledge and understanding to unique situations. Of particular interest is the potential of PA to capture aspects of higher-order thinking and reasoning, which are difficult to test in other ways.

Moreover, some research has reported that teachers tend to adapt their instructional practice to reflect the form and content of external assessments. Because performance assessments tend to be better than conventional forms of testing at capturing more complex instructional goals and intentions, it has been argued that "teaching to the test" might be a positive consequence if PA were used to evaluate student achievement. Finally, some proponents have argued that PA could be more equitable than other forms of assessment because PA can engage students in "authentic," contextualized performance, closely related to important instructional goals, thus avoiding the sources of bias associated with testing rapid recall of decontextualized information.

Educational Uses of Performance Assessment

Although performance assessment has been employed in many educational settings, including the assessment of teachers, a primary use in education has been to assess student learning outcomes. PA has long been used in classrooms by teachers to determine what has been learned and by whom. PA may be applied in the classroom in informal ways (as when a teacher observes a student as she solves a problem during seat work) or in more formal ways (as when a teacher collects and scores students' written essays). Within the classroom PA can serve as a means of assigning course grades, communicating expectations, providing feedback to students, and guiding instructional decisions. When PA is used for internal classroom assessment, both the form and content of the assessment can be closely aligned with a teacher's instructional goals. Therefore, the use of performance assessment in the classroom has been seen by some as a promising means of accomplishing a long-standing, elusive goal, namely the integration of instruction and assessment.

Performance assessment has also been employed in the external assessment of student learning outcomes. PA received significant attention from educators and assessment specialists during the latter part of the 1980s and throughout the 1990s. This increased interest in PA occurred as subject matter standards were established and corresponding changes in instructional practice were envisioned. A growing dissatisfaction with selected-response testing (e.g., true/false questions and multiple-choice items) and an awareness of advances in research in cognition and instruction also spawned interest in PA. Constructed-response tasks (e.g., tasks calling for brief or extended explanations or justifications) became increasingly popular as a means of capturing much of what is valued instructionally in a form that could be included in an external assessment of student achievement. In addition, for subjects such as science and mathematics, tasks that involve hands-on use of materials and tools have been developed. The net result of approximately fifteen years of research and development effort is the inclusion of written essays and constructed-response tasks in tests intended to assess achievement in various subject areas, including writing, history, mathematics, and science. A survey of state assessment practices in the mid-1990s found that thirty-four states required writing samples, and ten states incorporated constructed-response tasks into their assessments.

Performance Assessment: Challenges and Opportunities

A variety of technical and feasibility issues have plagued attempts to employ PA on a large scale. Among the technical issues that await satisfactory resolution are concerns about ensuring generalizability and comparability of performance across tasks and concerns about the scoring of complex tasks and the appropriate interpretation of performances. Efforts to use PA have also been limited due to concerns about the relatively high costs of development, administration, and scoring, when compared to more conventional testing. Finally, despite the hopes of advocates of PA regarding the likely benefits of its widespread adoption, some analyses have raised concerns about equity issues and the limited positive impact on classroom teaching of using PA in external testing.

Despite the problems that have prevented widespread adoption of performance assessment, many educators and assessment experts remain enthusiastic about the potential of PA to address many limitations of other forms of assessment. In particular, advances in the cognitive sciences and technology, along with the increasing availability of sophisticated technological tools in educational settings, may provide new opportunities to resolve many of these issues. For example, the costs of development, administration, and scoring may be decreased through the use of new technologies. And generalizability across tasks may be increased through the use of intelligent systems that offer ongoing assessment well integrated with instruction and sensitive to changes in students' understanding and performance, with performance data collected over a long period of time as opposed to one-time, on-demand testing.


AIRASIAN, PETER W. 1991. Classroom Assessment. New York: McGraw-Hill.

BAXTER, GAIL P., and GLASER, ROBERT. 1998. "Investigating the Cognitive Complexity of Science Assessments." Educational Measurement: Issues and Practice 17 (3):37–45.

BENNETT, RANDY E., and WARD, WILLIAM C., eds. 1993. Construction Versus Choice in Cognitive Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates.

BOND, LLOYD A. 1995. "Unintended Consequences of Performance Assessment: Issues of Bias and Fairness." Educational Measurement: Issues and Practice 14 (4):21–24.

BOND, LLOYD A.; BRASKAMP, DAVID; and ROEBER, EDWARD. 1996. The Status Report of the Assessment Programs in the United States. Oak Brook, IL: North Central Regional Educational Laboratory.

BRENNAN, ROBERT L., and JOHNSON, EUGENE G. 1995. "Generalizability of Performance Assessments." Educational Measurement: Issues and Practice 14 (4):9–12, 27.

COLE, NANCY S. 1988. "A Realist's Appraisal of the Prospects for Unifying Instruction and Assessment." Assessment in the Service of Learning: Proceedings of the 1987 ETS Invitational Conference. Princeton, NJ: Educational Testing Service.

DARLING-HAMMOND, LINDA. 1995. "Equity Issues in Performance-Based Assessment." In Equity and Excellence in Educational Testing and Assessment, ed. Michael T. Nettles and Arie L. Nettles. Boston: Kluwer.

FREDERIKSEN, JOHN R., and COLLINS, ALLAN. 1989. "A Systems Approach to Educational Testing." Educational Researcher 18 (9):27–32.

GAO, X. JAMES; SHAVELSON, RICHARD J.; and BAXTER, GAIL P. 1994. "Generalizability of Large-Scale Performance Assessments in Science: Promises and Problems." Applied Measurement in Education 7:323–342.

GLASER, ROBERT, and SILVER, EDWARD A. 1994. "Assessment, Testing, and Instruction: Retrospect and Prospect." In Review of Research in Education, Vol. 20, ed. Linda Darling-Hammond. Washington, DC: American Educational Research Association.

GREEN, BERT F. 1995. "Comparability of Scores from Performance Assessments." Educational Measurement: Issues and Practice 14 (4):13–15, 24.

HEUBERT, JAY, and HAUSER, ROBERT. 1999. High Stakes: Testing for Tracking, Promotion, and Graduation. Washington, DC: National Academy Press.

MESSICK, SAMUEL. 1994. "The Interplay of Evidence and Consequences in the Validation of Performance Assessments." Educational Researcher 23 (1):13–23.

MESSICK, SAMUEL, ed. 1995. "Special Issue: Values and Standards in Performance Assessment: Issues, Findings, and Viewpoints." Educational Measurement: Issues and Practice 14 (4).

PELLEGRINO, JAMES; CHUDOWSKY, NAOMI; and GLASER, ROBERT. 2001. Knowing What Students Know: The Science and Design of Educational Assessment. Washington, DC: National Academy Press.

RECKASE, MARK, ed. 1993. "Special Issue: Performance Assessment." Journal of Educational Measurement 30 (3).

RESNICK, LAUREN B., and RESNICK, DANIEL P. 1992. "Assessing the Thinking Curriculum: New Tools for Educational Reform." In Changing Assessments: Alternative Views of Aptitude, Achievement, and Instruction, ed. Bernard R. Gifford and Mary C. O'Connor. Boston: Kluwer.

SHAVELSON, RICHARD J.; BAXTER, GAIL P.; and GAO, X. JAMES. 1993. "Sampling Variability of Performance Assessments." Journal of Educational Measurement 30:215–232.

SHAVELSON, RICHARD J.; BAXTER, GAIL P.; and PINE, JERRY. 1992. "Performance Assessments: Political Rhetoric and Measurement Reality." Educational Researcher 21 (4):22–27.

SILVER, EDWARD A.; ALACACI, CENGIZ; and STYLIANOU, DESPINA. 2000. "Students' Performance on Extended Constructed-Response Tasks." In Results from the Seventh Mathematics Assessment of the National Assessment of Educational Progress, ed. Edward A. Silver and Patricia A. Kenney. Reston, VA: National Council of Teachers of Mathematics.

SMITH, MARY L. 1991. "Put to the Test: The Effects of External Testing on Teachers." Educational Researcher 20 (5):8–11.

WIGGINS, GRANT. 1989a. "Teaching to the (Authentic) Test." Educational Leadership 46 (7):41–47.

WIGGINS, GRANT. 1989b. "A True Test: Toward More Authentic and Equitable Assessment." Phi Delta Kappan 70:703–713.

WIGGINS, GRANT. 1992. "Creating Tests Worth Taking." Educational Leadership 49 (8):26–33.

WOLF, DENNIE; BIXBY, JANET; GLENN, JOHN, III; and GARDNER, HOWARD. 1991. "To Use Their Minds Well: Investigating New Forms of Student Assessment." In Review of Research in Education, Vol. 17, ed. Gerald Grant. Washington, DC: American Educational Research Association.

