National Assessment of Educational Progress

The primary means of monitoring the status and development of American education, the National Assessment of Educational Progress (NAEP), was conceived in 1963 when Francis Keppel, U.S. Commissioner of Education, appointed a committee to explore options for assessing the condition of education in the United States. The committee, chaired by Ralph Tyler, recommended that an information system be developed based on a battery of psychometric tests.

NAEP's Original Purpose and Design

A number of key features were recommended in the original design of NAEP, several of which were intended to make it substantially different from typical standardized tests of academic achievement. Many, but not all, of these features were incorporated into the first assessments and have persisted throughout NAEP's history. Others have changed in response to policy needs.

With respect to matters of content, each assessment cycle was supposed to target one or more broadly defined subject areas that corresponded to familiar components of school curricula, such as mathematics. For each subject area, panels of citizens would be asked to form consensus groups about appropriate learning objectives at each target age. Test questions or items were to be developed bearing a one-to-one correspondence to particular learning objectives. Thus, from NAEP's beginning, there have been heavy demands for content validity as part of the assessment development process.

Several interesting technical design features were proposed for the assessment program. Of special note was the use of matrix-sampling, a design that distributes large numbers of items broadly across school buildings, districts, and states but limits the number of items given to individual examinees. In essence, the assessment was designed to glean information from hundreds of items, several related to each of many testing objectives, while restricting the amount of time that any student has to spend responding to the assessment. The target period proposed was approximately fifty minutes per examinee. All test items were to be presented by trained personnel rather than by local school personnel in order to maintain uniformly high standards of administration.
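The matrix-sampling idea can be illustrated with a minimal sketch. The pool size, booklet length, and student count below are hypothetical values chosen for illustration, and the simple rotation scheme is an assumption, not NAEP's actual booklet design. The point is only that a large item pool is covered across the whole sample while each examinee sees a small fraction of it.

```python
from itertools import cycle


def build_booklets(item_ids, items_per_booklet):
    """Split the item pool into consecutive booklets of a fixed size."""
    return [
        item_ids[i:i + items_per_booklet]
        for i in range(0, len(item_ids), items_per_booklet)
    ]


def assign_booklets(student_ids, booklets):
    """Rotate booklets across students so each booklet is used about equally."""
    return {student: booklet for student, booklet in zip(student_ids, cycle(booklets))}


# Hypothetical numbers, not taken from any NAEP administration.
items = [f"item_{n:03d}" for n in range(300)]
students = [f"student_{n}" for n in range(1000)]

booklets = build_booklets(items, items_per_booklet=30)
assignment = assign_booklets(students, booklets)

# Every item is administered to someone, yet no student sees more than 30 items.
covered = {item for booklet in assignment.values() for item in booklet}
longest = max(len(booklet) for booklet in assignment.values())
print(len(booklets), len(covered), longest)
```

Under these assumptions each student answers one tenth of the pool, which is how the design keeps individual testing time short while still gathering data on every item.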

The populations of interest for NAEP were to be all U.S. residents at ages 9, 13, and 17, as well as young adults. This would require the selection of private and public schools into the testing sample, as well as selection of examinees at each target age who were not in school. Results would be tabulated and presented by age and by demographic groups within age, but never by state, state subunit, school district, school, or individual. Assessment results would be reported to show the estimated percentage of the population or subpopulation that answered each item and task correctly. Finally, only a subset of the items would be released with each NAEP report. The unreleased items would remain secure, to be administered at a later testing for determining performance changes over time, thereby providing the basis for determining trends in achievement.
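Item-level reporting of this kind amounts to a percent-correct calculation for each item, overall and within each subgroup. The sketch below is illustrative only: the record layout, the `weight` field (standing in for a sampling weight), and the data are all assumptions made for the example, not NAEP's estimation procedure.

```python
def percent_correct(responses, item_id, group=None):
    """Weighted percentage of respondents answering one item correctly.

    Returns None when nobody in the requested group was given the item,
    which is expected under matrix sampling.
    """
    weighted_correct = 0.0
    weighted_total = 0.0
    for record in responses:
        if record["item"] != item_id:
            continue
        if group is not None and record["group"] != group:
            continue
        weighted_total += record["weight"]
        weighted_correct += record["weight"] * record["correct"]
    if weighted_total == 0:
        return None
    return 100.0 * weighted_correct / weighted_total


# Made-up records: each row is one examinee's response to one item.
responses = [
    {"item": "A", "group": "age9", "weight": 1.0, "correct": 1},
    {"item": "A", "group": "age9", "weight": 3.0, "correct": 0},
    {"item": "A", "group": "age13", "weight": 2.0, "correct": 1},
    {"item": "B", "group": "age9", "weight": 1.0, "correct": 1},
]

print(percent_correct(responses, "A"))          # all examinees who saw item A
print(percent_correct(responses, "A", "age9"))  # one age group only
```

Because results are computed per item, a report built this way can show change on individual items over time but offers no summary score for a subject area.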

The agenda and design laid out for NAEP in the mid-1960s reflected the political and social realities of the time. Prominent among these was the resistance of state and local policymakers to a national curriculum; state and local leaders feared federal erosion of their autonomy and voiced concern about pressure for accountability. Several of NAEP's features countered perceptions of the program as a federal testing initiative addressing a nationally prescribed curriculum. Indeed, NAEP's design provided nationally and regionally representative data on the educational condition of American schools, while avoiding any implicit federal standards or state, district, and school comparisons. NAEP was dubbed the "nation's educational barometer." It became operational in 1969 and 1970, and the first assessments were in science, citizenship, and writing.

Pressures for Redesign of NAEP

As federal initiatives during the 1960s and 1970s expanded educational opportunities, they fostered an administrative imperative for assessment data to help gauge the effect on the nation's education system. NAEP's original design could not accommodate the increasing demands for data about educationally important populations and issues. Age-level (rather than grade-level) testing made it difficult to link NAEP results to state and local education policies and school practices. Furthermore, its reporting scheme allowed for measurement of change on individual items, but not on the broad subject areas; monitoring the educational experiences of students in varied racial and ethnic, language, and economic groups was difficult without summary scores. Increasingly, NAEP was asked to provide more information so that government and education officials would have a stronger basis for making judgments about the adequacy of education services; NAEP's constituents were seeking information that, in many respects, conflicted with the basic design of the program.

The first major redesign of NAEP took place in 1984, when responsibility for its development and administration was moved from the Education Commission of the States to the Educational Testing Service. The design for NAEP's second generation changed the procedures for sampling, objective setting, item development, data collection, and analysis. Tests were administered by age and grade groupings. Summary scores were provided for each subject area; scale scores were introduced for reporting purposes. These and other changes afforded the program much greater flexibility in responding to policy demands as they evolved.

Almost concurrently, however, the report A Nation at Risk was issued in 1983. It warned that America's schools and students were performing poorly and spawned a wave of state-level education reforms. As states invested more and more in their education systems, they sought information about the effectiveness of their efforts. State-level policymakers looked to NAEP for guidance on the effectiveness of alternative practices. The National Governors' Association issued a call for state-comparable achievement data, and a new report, The Nation's Report Card, recommended that NAEP be expanded to provide state-level results.

As the program retooled to accommodate this change, participants in a 1989 education summit in Charlottesville, Virginia, set out to expand NAEP even further. President George H. W. Bush and the nation's governors challenged the prevailing assumptions about national expectations for achievement in American schools. They established six national goals for education and specified the subjects and grades in which progress should be measured with respect to national and international frames of reference. By design, these subjects and grades paralleled NAEP's structure. The governors called on educators to hold students to "world-class" standards of knowledge and skill. The governors' commitment to high academic standards included a call for the reporting of NAEP results in relation to rigorous performance standards. They challenged NAEP to describe not only what students currently know and can do, but also what young people should know and be able to do as participants in an education system that holds its students to high standards.

NAEP in the Early Twenty-First Century

The program that took shape during the 1990s is the large and complex NAEP that exists in the early twenty-first century. The NAEP program continues to evolve in response to both policy challenges and results from federally mandated external evaluations. NAEP includes two distinct assessment programs with different instrumentation, sampling, administration, and reporting practices. The two assessments are referred to as trend NAEP and main NAEP.

Trend NAEP is a collection of test items in reading, writing, mathematics, and science that have been administered many times since the 1970s. As the name implies, trend NAEP is designed to document changes in academic performance over time. During the 1990s, trend NAEP was administered in 1990, 1992, 1994, 1996, and 1999. Trend NAEP is administered to nationally representative samples of students aged 9, 13, and 17 following the original NAEP design.

Main NAEP consists of test items that reflect current thinking about what students should know and be able to do in the NAEP subject areas. They are based on contemporary content and skill outlines developed by consensus panels for reading, writing, mathematics, science, U.S. history, world history, geography, civics, the arts, and foreign languages. These content frameworks are periodically reviewed and revised.

Main NAEP is further complicated by having two components, national NAEP and state NAEP. The former assesses nationally representative samples of students in grades 4, 8, and 12. In most but not all subjects, national NAEP is supposed to be administered two, three, or four times during a twelve-year period, to make it possible to examine short-term trends in performance over a decade. State NAEP assessments are administered to state-representative samples of students in states that voluntarily elect to participate in the program. State NAEP uses the same large-scale assessment materials as those used in national NAEP, but is administered only in grades 4 and 8 in reading, writing, mathematics, and science. In contrast to national NAEP, the tests are administered by local school personnel rather than an independent contractor.

One of the most substantial changes in the main NAEP program is the reporting of results relative to performance standards. In each content area, performance standards are defined for three levels of achievement: basic, proficient, and advanced. The percentage of students at a given grade level whose performance is at or above an achievement level standard is reported, as are trends in the percentages over successive administrations of NAEP in a content area. Achievement level reporting is done for both main NAEP and state NAEP and has become one of the most controversial aspects of the NAEP program.
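The "at or above" statistic can be made concrete with a small sketch. The cut scores and scale scores below are hypothetical, and the calculation is deliberately simplified: it assumes one observed score per student and equal weights, which is an illustration rather than a description of how NAEP estimates these percentages.

```python
def percent_at_or_above(scores, cut_scores):
    """Percentage of scores at or above each achievement-level cut score."""
    if not scores:
        raise ValueError("at least one score is required")
    total = len(scores)
    return {
        level: 100.0 * sum(score >= cut for score in scores) / total
        for level, cut in cut_scores.items()
    }


# Hypothetical cut scores and scale scores, for illustration only.
cut_scores = {"basic": 210, "proficient": 250, "advanced": 290}
scores = [180, 210, 240, 250, 270, 290, 300, 230, 260, 200]

results = percent_at_or_above(scores, cut_scores)
print(results)
```

The levels are cumulative: a student counted as advanced is also counted at or above proficient and at or above basic, so the three percentages never sum to a meaningful total.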

NAEP's complex design is mirrored by a complex governance structure. The program is governed by the National Assessment Governing Board (NAGB), appointed by the secretary of education but independent of the department. The board, authorized to set policy for NAEP, is designed to be broadly representative of NAEP's varied audiences. It selects the subject areas to be assessed and ensures that the content and skill frameworks that specify goals for assessment are produced through a national consensus process. In addition, NAGB establishes performance standards for each subject and grade tested, in consultation with its contractor for this task. NAGB also develops guidelines for NAEP reporting. The commissioner of education statistics, who leads the National Center for Education Statistics (NCES) in the U.S. Department of Education, retains responsibility for NAEP operations and technical quality control. NCES procures test development and administration services from cooperating private companies.

Evaluations of NAEP

As part of the process of transforming and expanding NAEP during the 1990s, Congress mandated periodic, independent evaluations of the NAEP program. Two such multiyear evaluations were conducted, the first by the National Academy of Education and the second by the National Academy of Sciences. Both evaluations examined several features of the NAEP program design, including development of the assessment frameworks, the technical quality of the assessments, the validity of the achievement level reporting, and initiation of the state NAEP assessments. The evaluations concluded that there are many laudable aspects of NAEP supporting its label as the "gold standard" for assessment of academic achievement. Among the positives are NAEP's efforts to develop broad, consensus-based content area frameworks, incorporate constructed response tasks and item formats that tap more complex forms of knowledge, use matrix sampling to cover a wide range of curriculum content area topics, and employ powerful statistical methods to analyze the results and develop summary scores. These evaluations also concluded that state NAEP, which had developmental status at the start of the 1990s, served a valuable purpose and should become a regular part of the NAEP program, which it did.

The two evaluations also saw considerable room for improvement in NAEP, in many of the areas mentioned above where strength already existed. Two areas of concern were of particular note. The first was the need to broaden the range of knowledge and cognitive skills that should be incorporated into NAEP's assessment frameworks and included as part of the assessment design. Both evaluations argued that NAEP was not fully taking advantage of advances in the cognitive sciences regarding the nature of knowledge and expertise and that future assessments needed to measure aspects of knowledge that were now deemed to be critical parts of the definition of academic competence and achievement. Suggestions were made for how NAEP might do this by developing a portfolio of assessment methods and approaches.

The second major area of concern was the validity of the achievement level analysis and reporting process. Both evaluations, as well as others that preceded them, were extremely critical of both the process that NAEP was using to determine achievement levels and the outcomes that were reported. It was judged that the entire achievement level approach lacked validity and needed a major conceptual and operational overhaul. As might be expected, this critique met with less than resounding approval by the National Assessment Governing Board, which is responsible for the achievement level–setting process.

Many of the concerns raised in the two major evaluations of NAEP, along with many other reviews of various aspects of the NAEP program, have served as stimuli in an ongoing process of refining, improving, and transforming NAEP. One of NAEP's hallmarks as an assessment program is its capacity to evolve, engage in cutting-edge assessment development work, and provide results of value to many constituencies. It continues to serve its role as "The Nation's Report Card."


ALEXANDER, LAMAR. 1991. America 2000. Washington, DC: U.S. Department of Education.

ALEXANDER, LAMAR, and JAMES, H. THOMAS. 1987. The Nation's Report Card: Improving the Assessment of Student Achievement. Stanford, CA: National Academy of Education.

GLASER, ROBERT; LINN, ROBERT; and BOHRNSTEDT, GEORGE. 1992. Assessing Student Achievement in the States. Stanford, CA: National Academy of Education.

GLASER, ROBERT; LINN, ROBERT; and BOHRNSTEDT, GEORGE. 1993. The Trial State Assessment: Prospects and Realities. Stanford, CA: National Academy of Education.

GLASER, ROBERT; LINN, ROBERT; and BOHRNSTEDT, GEORGE. 1996. Quality and Utility: The 1994 Trial State Assessment in Reading. Stanford, CA: National Academy of Education.

GLASER, ROBERT; LINN, ROBERT; and BOHRNSTEDT, GEORGE. 1997. Assessment in Transition: Monitoring the Nation's Educational Progress. Stanford, CA: National Academy of Education.

JONES, LYLE V. 1996. "A History of the National Assessment of Educational Progress and Some Questions about Its Future." Educational Researcher 25 (6):1–8.

MESSICK, SAMUEL; BEATON, ALBERT; and LORD, FREDERICK. 1983. National Assessment of Educational Progress Reconsidered: A New Design for a New Era. Princeton, NJ: Educational Testing Service.

NATIONAL CENTER FOR EDUCATION STATISTICS. 1974. NAEP General Information Yearbook. Washington, DC: U.S. Department of Education.

NATIONAL COMMISSION ON EXCELLENCE IN EDUCATION. 1983. A Nation at Risk: The Imperative for Educational Reform. Washington, DC: U.S. Government Printing Office.

OFFICE OF TECHNOLOGY ASSESSMENT. 1992. Testing in America's Schools: Asking the Right Questions. Washington, DC: U.S. Government Printing Office.

PELLEGRINO, JAMES W.; JONES, LEE R.; and MITCHELL, KAREN J. 1999. Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress. Washington, DC: National Academy Press.

