STANDARDIZED TESTS AND EDUCATIONAL POLICY
Johanna V. Crighton
STANDARDIZED TESTS AND HIGH-STAKES ASSESSMENT
Lorrie A. Shepard
STATEWIDE TESTING PROGRAMS
TEST PREPARATION PROGRAMS, IMPACT OF
NATIONAL ACHIEVEMENT TESTS, INTERNATIONAL
Ann E. Flanagan
INTERNATIONAL STANDARDS OF TEST DEVELOPMENT
STANDARDIZED TESTS AND EDUCATIONAL POLICY
The term standardized testing was once used to refer to a certain type of multiple-choice or true/false test that could be machine-scored and was therefore thought to be "objective." This type of standardization is no longer considered capable of capturing the full range of skills candidates may possess. In the early twenty-first century it is more useful to speak of standards-based or standards-linked assessment, which seeks to determine to what extent a candidate meets certain specified expectations or standards. The format of the test or examination is less important than how well it elicits from the candidate the kind of performance that shows whether those standards have been met.
The use of standardized testing for admission to higher education is increasing, but is by no means universal. In many countries, school-leaving (exit) examinations are nonstandardized affairs conducted by schools, with or without government guidelines, while university entrance exams are set by each university or each university department, often without any attempt at standardization across or within universities. The functions of certification (completion of secondary school) and selection (for higher or further education) are separate and frequently noncomparable across institutions or over time. In most countries, however, a school-leaving certificate is a necessary but not a sufficient condition for university entrance.
In the United States, states began using high school exit examinations in the late 1970s to ensure that students met minimum state requirements for graduation. In 2001 all states were at some stage of implementing a graduation exam. These are no longer the "minimum competency tests" of the 1970s and 1980s; they are based on curriculum and performance standards developed in all fifty states. All students should be able to demonstrate that they have reached performance standards before they leave secondary school.
Certification examinations based on state standards have long been common in European countries. They may be set centrally by the state and conducted and scored by schools (France); set by the state and conducted and scored by an external or semi-independent agency (the Netherlands, the United Kingdom, Romania, Slovenia); or set, conducted, and scored entirely within the schools themselves (Russian Federation), often in accordance with government guidelines but with no attempt at standardization or comparability. Since the objective is to certify a specified level of learning achieved, these exit examinations are strongly curriculum-based and essentially criterion-referenced, and ideally all candidates should pass. They are thus typically medium- or low-stakes, and failing students have several opportunities to retake the examination. Sometimes weight is given to a student's in-school performance as well as to exam results. In the Netherlands, for example, the weightings are 50-50, giving students scope to show a range of skills not easily measured by examination.
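The 50-50 weighting described for the Netherlands amounts to a simple weighted average. The following sketch is illustrative only: the function name, the 1-to-10 marking scale, and the sample marks are assumptions, not details of any actual examination regulation.

```python
def combined_grade(school_mark: float, exam_mark: float,
                   school_weight: float = 0.5) -> float:
    """Blend an in-school mark with an external exam mark.

    school_weight is the share given to in-school performance;
    the remainder goes to the exam. The default of 0.5 mirrors
    the 50-50 split mentioned in the text.
    """
    if not 0.0 <= school_weight <= 1.0:
        raise ValueError("school_weight must be between 0 and 1")
    return school_weight * school_mark + (1.0 - school_weight) * exam_mark


# Hypothetical student on an assumed 1-to-10 scale:
# strong coursework (7.0) and a weaker exam (6.0).
final_mark = combined_grade(7.0, 6.0)
print(final_mark)  # 6.5
```

Changing `school_weight` shows how much the final result depends on the balance a system chooses between school-based and external assessment.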
In practice, however, when constructing criteria for a criterion-referenced test, norm-referencing is unavoidable. Hidden behind each criterion is norm-referenced data: assumptions about how the average child in that particular age group can be expected to perform. Pure criterion-referenced assessment is rare, and it would be better to think of assessment as being a hybrid of norm- and criterion-referencing. The same is true of setting standards, especially if they have to be reachable by students of varying ability: one has to know something about the norm before one can set a meaningful standard.
By contrast, university entrance examinations aim to select some candidates rather than others and are therefore norm-referenced: the objective is not to determine whether all have reached a set standard, but to select "the best." In most cases, higher education expectations are insufficiently linked to K–12 standards.
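The difference between the two approaches can be made concrete with a small sketch. The candidate names, scores, cut score, and number of places below are hypothetical; the point is only that a criterion-referenced decision compares each candidate with a fixed standard, while a norm-referenced decision compares candidates with one another.

```python
def criterion_referenced(scores: dict[str, float], cut_score: float) -> set[str]:
    """Certify every candidate who reaches the fixed standard."""
    return {name for name, score in scores.items() if score >= cut_score}


def norm_referenced(scores: dict[str, float], places: int) -> set[str]:
    """Select the top-ranked candidates for a limited number of places."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:places])


# Hypothetical candidates and scores.
scores = {"Ana": 82, "Ben": 74, "Cal": 69, "Dee": 91}

certified = criterion_referenced(scores, cut_score=70)  # everyone at or above 70
selected = norm_referenced(scores, places=2)            # only the two highest scorers

print(sorted(certified))  # ['Ana', 'Ben', 'Dee']
print(sorted(selected))   # ['Ana', 'Dee']
```

In the first function the number of successful candidates depends on how many reach the standard; in the second it is fixed in advance, whatever level the candidates reach.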
Entrance exams are typically academic and high-stakes, and opportunities to retake them are limited. Where entrance exams are set by individual university departments rather than by an entire university or group of universities, accountability for selection is limited, and failing students have little or no recourse. University departments are unwilling to relinquish what they see as their autonomy in selecting entrants; moreover, the lack of accountability and the often lucrative system of private tutoring for entrance exams are barriers to a more transparent and equitable process of university selection.
In the United States the noncompulsory SATs administered by the Educational Testing Service (ETS) are the most familiar example. SAT I consists of quantitative and verbal reasoning tests, and for a number of years ETS insisted that these were curriculum-free and could not be studied for. Indeed they used to be called "Scholastic Aptitude Tests," because they were said to measure candidates' aptitude for higher-level studies. The emphasis on predictive validity has since diminished; test formats include a wider range of question types aimed at eliciting more informative student responses; and the link with curriculum and standards is reflected in SAT II, a new series of tests in high school subjects such as English and biology. Fewer U.S. universities and colleges require SAT scores as part of their admission procedure in the early twenty-first century, though many still do.
Certification Combined with Selection
A number of countries (e.g., the United Kingdom, the Netherlands, Slovenia, and Lithuania) combine school-leaving examinations with university entrance examinations. Candidates typically take a national, curriculum-based, high school graduation exam in a range of subjects; the exams are set and marked (scored) by or under the control of a government department or a professional agency external to the schools; and candidates offer their results to universities as the main or sole basis for selection. Students take only one set of examinations, but the question papers and scoring methods are based on known standards and are nationally comparable, so that an "A" gained by a student in one part of the country is comparable to an "A" gained elsewhere. Universities are still entitled to set their own entrance requirements such as requiring high grades in biology, chemistry, and physics for students who wish to study medicine, or accepting lower grades in less popular disciplines or for admittance to less prestigious institutions.
Trends in Educational Policy: National Standards and Competence
Two main trends are evident worldwide. The first is a move towards examinations linked to explicit national (or state) standards, often tacitly aligned with international expectations, such as Organisation for Economic Co-operation and Development (OECD) indicators or the results of multinational assessments such as the Third International Mathematics and Science Study (TIMSS) or similar studies in reading literacy and civics. The second trend is towards a more competence-based approach to education in general, and to assessment in particular: less emphasis on what candidates can remember and more on what they understand and can do.
Standards. The term standards here refers to official, written guidelines that define what a country or state expects its state school students to know and be able to do as a result of their schooling.
In the United States all fifty states now have student testing programs, although the details vary widely. Few states, however, have testing programs that explicitly measure student achievement against state standards, despite claims that they do. (Some states ask external assessors to evaluate the alignment of tests to standards.) Standards are also used for school and teacher accountability purposes; about half the states rate schools primarily on the basis of student test scores or test score gains over time, and decisions to finance, close, take over, or otherwise overhaul chronically low-performing schools can be linked to student results. Much debate centers on whether tests designed for one purpose (measuring student learning) can fairly be used to judge teachers, schools, or education systems.
The same debate is heard in England and Wales, where student performance on national curriculum "key stage" testing at ages seven, eleven, fourteen, and sixteen has led to the publication of "league tables" listing schools in order of their students' performance. A painstaking attempt to arrive at a workable "value-added" formula that would take account of a number of social and educational variables ended in failure. The concept of "value added" involves linking a baseline assessment to subsequent performance: the term refers to relative progress of pupils or how well pupils perform compared to other pupils with similar starting points and background variables. A formula developed to measure these complex relationships was scientifically acceptable but judged too laborious for use by schools. Nevertheless, league tables are popular with parents and the media and remain a feature of standards-based testing in England and Wales.
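One simple way to picture a value-added measure, offered here as an illustration and not as the formula developed in England and Wales, is to predict each pupil's later score from the baseline score and to credit a school with the average amount by which its pupils beat or miss that prediction. The sketch below assumes a straight-line prediction fitted by least squares and uses invented pupil data; a real formula would also have to include background variables.

```python
from statistics import mean


def fit_line(baseline: list[float], outcome: list[float]) -> tuple[float, float]:
    """Least-squares line predicting outcome from baseline: (slope, intercept)."""
    mean_x, mean_y = mean(baseline), mean(outcome)
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, outcome))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def school_value_added(pupils: list[tuple[str, float, float]]) -> dict[str, float]:
    """Average gap between actual and predicted outcome, per school.

    Each pupil is a (school, baseline_score, later_score) tuple.
    """
    slope, intercept = fit_line([p[1] for p in pupils], [p[2] for p in pupils])
    gaps: dict[str, list[float]] = {}
    for school, base, later in pupils:
        predicted = intercept + slope * base
        gaps.setdefault(school, []).append(later - predicted)
    return {school: mean(values) for school, values in gaps.items()}


# Invented data: school A admits pupils with high baseline scores,
# school B admits pupils with low baseline scores.
pupils = [("A", 60, 58), ("A", 80, 78), ("B", 20, 22), ("B", 40, 42)]

value_added = school_value_added(pupils)
raw_average = {s: mean(p[2] for p in pupils if p[0] == s) for s in ("A", "B")}
```

With these invented figures school A has the higher raw average (68 against 32) and would head a league table, yet its value added is slightly negative and school B's is slightly positive, because B's pupils did better than their starting points predicted.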
Most countries in Central and Eastern Europe are likewise engaged in formulating educational standards, but standards still tend to be expressed in terms of content covered and hours on the timetable ("seat time") for each subject rather than student outcomes. When outcomes are mentioned, it is often in unmeasurable terms: "[Candidates] must be familiar with … the essence, purpose, and meaning of human life [and] … the correlation between truth and error" (State Committee for Higher Education, Russian Federation, p. 35).
Competence. The shift from content and "seat-time" standards to specifying desired student achievement expressed in operational terms ("The student will be able to … ") is reflected in new types of performance-based assessment where students show a range of skills as well as knowledge. Portfolios or coursework may be assessed as well as written tests. It has been argued that deconstructing achievement into a list of specified behaviors that can be measured misses the point: that learning is a subtle process that escapes formulas people seek to impose on it. Nevertheless, the realization that it is necessary to focus on outcomes (and not only on input and process) of education is an important step forward.
Apart from these conceptual shifts, many countries are also engaged in practical examinations reform. Seven common policy issues are: (1) changing concepts and techniques of testing (e.g., computer-based interactive testing on demand); (2) a shift to standards- and competence-based tests; (3) changed test formats and question types (e.g., essays rather than multiple-choice); (4) more inclusive target levels of tests; (5) standardization of tests; (6) independent, external administration of tests; and (7) convergence of high school exit exams and university entrance.
Achieving Policy Goals
In terms of monitoring the achievement of education policy goals, standards-linked diagnostic and formative (national, whole-population, or sample-based) assessments at set points during a student's schooling are clearly more useful than scores on high-stakes summative examinations at the end. Trends in annual exam results can still be informative to policy makers, but they come too late for students themselves to improve performance. Thus the key-stage approach used in the United Kingdom provides better data for evidence-based policy making and more helpful information to parents than, for example, the simple numerical scores on SAT tests, which in any case are not systematically fed back to schools or education authorities.
However, the U.K. approach is expensive and labor-intensive. The best compromise might be sample-based periodic national assessments of a small number of key subjects (for policy purposes), plus a summative, curriculum- and standards-linked examination at the end of a major school cycle (for certification and selection).
See also: INTERNATIONAL ASSESSMENTS; STANDARDS FOR STUDENT LEARNING; STANDARDS MOVEMENT IN AMERICAN EDUCATION; TESTING, subentries on INTERNATIONAL STANDARDS OF TEST DEVELOPMENT, NATIONAL ACHIEVEMENT TESTS, INTERNATIONAL.
CRESSWELL, MICHAEL J. 1996. "Defining, Setting and Maintaining Standards in Curriculum-Embedded Examinations: Judgmental and Statistical Approaches." In Assessment: Problems, Developments and Statistical Issues, ed. Harvey Goldstein and Toby Lewis. London: Wiley and Sons.
DORE, RONALD P. 1997. The Diploma Disease: Education, Qualifications and Development. London: George Allen and Unwin.
ECKSTEIN, MAX A., and NOAH, HAROLD J. eds. 1992. Examinations: Comparative and International Studies. Oxford: Pergamon Press.
GREEN, ANDY. 1997. Education, Globalization and the Nation State. Basingstoke, Eng.: Macmillan.
HEYNEMAN, STEPHEN P. 1987. "Uses of Examinations in Developing Countries: Selection, Research and Education Sector Management." International Journal of Educational Development 7 (4):251–263.
HEYNEMAN, STEPHEN P., and FAGERLIND, INGEMAR, eds. 1988. University Examinations and Standardized Testing. Washington, DC: World Bank.
HEYNEMAN, STEPHEN P., and RANSOM, ANGELA. 1990. "Using Examinations to Improve the Quality of Education." Educational Policy 4 (3):177–192.
LITTLE, ANGELA, and WOLF, ALISON, eds. 1996. Assessment in Transition: Learning, Monitoring and Selection in International Perspective. Tarrytown, NY: Pergamon.
SCHOOL CURRICULUM AND ASSESSMENT AUTHORITY (SCAA). 1997. The Value Added National Project: Final Report. London: SCAA Publications.
STATE COMMITTEE FOR HIGHER EDUCATION, RUSSIAN FEDERATION. 1995. State Educational Standards for Higher Professional Education. Moscow: Ministry of Education.
TYMMS, PETER. 2000. Baseline Assessment and Monitoring in Primary Schools: Achievements, Attitudes and Value-added Indicators. London: David Fulton.
UNIVERSITY OF CAMBRIDGE LOCAL EXAMINATIONS SYNDICATE (UCLES). 1998. MENO Higher-Level Thinking Skills Test Battery. Cambridge, Eng.: UCLES Research and Development Division.
WEST, RICHARD, and CRIGHTON, JOHANNA. 1999. "Examination Reform in Central and Eastern Europe: Issues and Trends." Assessment in Education: Principles, Policy and Practice 6 (2):271–289.
NATIONAL GOVERNORS ASSOCIATION. 2002. "High School Exit Exams: Setting High Expectations." <www.nga.org/center/divisions/1,1188,C_ISSUE_BRIEF%5ED_1478,00.html>.
NATIONAL CENTER FOR PUBLIC POLICY AND HIGHER EDUCATION. 2002. "Measuring Up 2000." <http://measuringup2000.highereducation.org/>.
JOHANNA V. CRIGHTON