Teacher Evaluation

Methods



In the decade from 1991 to 2001, a number of developments in public policy and assessment practices significantly altered the landscape for teacher evaluation practices. The single most important shift in the public policy arena has been the emergence of a tidal wave of support for what is loosely called "teacher accountability." What this seems to mean in effect is a growing insistence on measurement of teacher quality and teacher performance in terms of student achievement, which is often poorly defined, crudely measured, and unconnected to what educators regard as significant learning.



There is still little consensus about acceptable ways to meet the very substantial challenges posed by linking measures of student achievement to conclusions about teacher effectiveness. It is therefore significant, and somewhat alarming, that this issue dominates current discourse about teacher evaluation. Neither the effort nor the issue is new, but the heated insistence on student achievement as the single most important criterion for establishing a teacher's effectiveness is new. Simply put, most past efforts to connect student achievement to individual teacher performance have foundered on the following weaknesses:

  • The measurement does not take into account teaching context as a performance variable.
  • The measurement is unreliable, in part because it does not include time as a variable: both the teacher's time with a cohort of students and some model or models of sufficient time to see learning effects in students.
  • The measures used to reflect student achievement are not congruent with best practice and philosophy of instruction in modern education.

The link between teacher performance and student achievement is both so intuitively compelling as a major part of a teacher's performance evaluation and so very difficult to implement that it has never really been systematically achieved in the United States. The pressure to forge such links is immense in the early twenty-first century, and it is critical to the health and vitality of the education workforce that the link be credible and valid. A foundational validity issue is, of course, the quality and integrity of the methods states and districts have developed or adopted to measure student achievement. The teaching workforce has long disdained standardized national tests, the most commonly used assessments in school districts across the United States to represent student achievement, arguing persuasively that actual local and state curricula–and thus instruction–are not adequately aligned (or aligned at all) with the content of these tests. Furthermore, education reformers have almost universally excoriated these tests for two decades as reductive and not representative of the skills and abilities students really need to develop for the new millennium.

An evaluative commentary on the use of student tests for high-stakes accountability decisions was given by incoming American Educational Research Association President Robert Linn (2000), who evaluated fifty years of student testing in the U.S. education system, and the effects of that testing:

I am led to conclude that in most cases the instruments and technology have not been up to the demands that have been placed on them by high-stakes accountability. Assessment systems that are useful monitors lose much of their dependability and credibility for that purpose when high stakes are attached to them. The unintended negative effects of the high-stakes accountability uses often outweigh the intended positive effects. (p. 14)

Given the policy climate in the early twenty-first century, this is a sobering and cautionary conclusion, coming as it does from such a major figure in the measurement community, and one known for his even-handed and judicious treatment of measurement issues. It is clear that the most widely used current measures of student achievement, primarily standardized norm-referenced multiple-choice tests developed and sold off the shelf by commercial test publishers, are useful for many educational purposes, but not valid for school accountability. Indeed, they may be positively misleading at the school level, and certainly a distortion of teaching effectiveness at the individual teacher level. Concerns about the increased dependence on high-stakes testing have prompted a number of carefully worded technical cautions from important policy bodies as well. Although it is possible to imagine a program of student testing that aligns the assessments used to the standards for learning and to the curriculum actually taught, and that employs multiple methods and occasions to evaluate student learning, the investment such a program would demand would increase the cost of student assessment significantly. Furthermore, involving teachers in the conceptual development and interpretation of assessment measures that would be instructionally useful (particularly when those measures may have a direct effect on the teachers' performance evaluation and livelihood) is no closer to the realities of assessment practice than it has ever been; it is, in general, simply not part of the practice of school districts in the United States.

The emphasis on teacher quality has gained considerable momentum from the body of empirical evidence substantiating the linkage between teacher competence and student achievement. The "value-added" research, typified by the work of William Sanders and colleagues (1996; 1997; 1998) reinforces the assumption that the teacher is the most significant factor that affects student achievement. Sanders's work in this area is the best known and, increasingly, most influential among policymakers. In the measurement community, however, independent analyses of Sanders's data and methods have just begun. There appear to be controversial issues associated with both the statistical model Sanders uses and replicability of his findings.

Teacher Evaluation

At the beginning of the twenty-first century there is more teacher testing for various purposes than ever before. Some of this testing serves traditional purposes; for example, for admission into programs of professional preparation in colleges and universities or for licensure. For the first time in the United States there is a high-stakes assessment for purposes of certification, the National Board for Professional Teaching Standards (NBPTS) certification assessments, which are modeled on medical specialty board certification. Finally, there is a growing use of performance assessments of actual teaching for both formative purposes–during a teacher's initial years of practice, or the induction period–and for summative purposes, to grant an initial or more advanced teaching license. Performance-assessment-based licensure has been implemented in Connecticut since 2000 and is being implemented in 2002 in Ohio and in 2003 in Arkansas. In addition, California plans to implement a teaching performance assessment for all of its beginning teachers starting in 2004.

Both the policy climate and the standards movement have had profound effects on teacher testing. States set passing standards on licensing tests, often rigorous, for demonstrations of sufficient skill and knowledge to be licensed. For example, as of the year 2000, thirty-nine states require all licensed teachers to pass a basic skills test (reading, mathematics, and writing), twenty-nine require secondary teachers to pass subject-specific tests in their prospective teaching fields, and thirty-nine require prospective secondary teachers to have a major, minor, or equivalent course credits for a subject-specific license. This means that a number of states require all three hurdles to be cleared before granting a license. In addition, most states require that the teacher's preparation institution recommend the candidate for the license. In every state but New Jersey, however, the state has the power to waive all of these requirements "either by granting licenses to individuals who have not met them or by permitting districts to hire such people" (Edwards, p. 8). And, perhaps most discouraging, only about twenty-five of the fifty states even have accessible records of "the numbers and percentages of teachers who hold various waivers" (Jerald and Boser, p. 44). Thus, reliance on rigorous state testing and preparation requirements to assure the quality of the education workforce is likely to lead to disappointment.

In 2000, thirty-six of the thirty-nine states that require teachers to pass a basic skills test waived that requirement and permitted a teacher to enter a classroom as teacher of record without passing the test. In sixteen states, this waiver can be renewed indefinitely, so long as the hiring school district asserts its inability to find a qualified applicant. Of the twenty-nine states that require secondary teachers to pass subject matter exams–most often only multiple-choice tests, even though more sophisticated tests are available–only New Jersey denies a license and therefore a job to candidates who have not passed the tests. Eleven of these twenty-nine states allow such candidates to remain in the job indefinitely, and all twenty-nine but New Jersey waive the course work completion requirement for secondary teachers if the hiring district claims that it cannot find a more qualified applicant for the position.

Thus, while initial licensing tests have become increasingly sophisticated–they are based on K–12 student and disciplinary standards and offer both multiple-choice and constructed response formats–the requirements for their use are not only widely variable, but also not rigorously enforced.

As of 2001 the NBPTS certification assessments represent the first-ever, widely accepted national recognition of excellence in the teaching profession. The program has grown exponentially since 1994 when eighty-six teachers, the first National Board Certified Teachers, were announced. In 2001 approximately 14,000 candidates in nineteen different fields were assessed; the NBPTS expects that the number of National Board Certified Teachers nationwide will rise from 9,534 (2000) to approximately 15,000. The certification assessment consists of a classroom-based portfolio, including videotapes and student work samples with detailed analytical commentaries by the teacher-candidates, and a computer-delivered written assessment focused primarily on content and pedagogical-content knowledge. The NBPTS assessments have established a number of new benchmarks for teacher evaluation. The assessments themselves are both elaborate and very rigorous; it takes approximately nine months for a teacher to complete the assessment process. Almost universally regarded by candidates as the single most profound learning and professional development experience they have ever had, the assessment process is being widely used as a model for teacher professional development. The scoring process, which requires extensive training of peer teachers, is itself a substantial professional development opportunity.

In addition, the actual technical quality of the scoring has contradicted long-held opinions that complex human judgments sacrifice reliability or consistency for validity, or credibility. The NBPTS scoring reliability is extremely high. The expense of the assessment–$2,300 per candidate in 2002–has been borne largely by states and local governments as part of their support for teacher quality initiatives. That level of public support for a high-stakes, voluntary assessment is unprecedented in education in the United States. In 2001, the NBPTS published the first in a series of validity studies that showed substantive differences between National Board Certified Teachers and non-certified teachers in terms of what actually goes on in the classrooms and in student learning.

The third area of change and innovation in teacher evaluation has taken place in states' provision for mentoring and formative assessment in the initial period of a beginning teacher's career, a period commonly called the induction period. States vary in the nature of the support they provide, with twenty-eight states requiring or providing funds for beginning-teacher induction programs, but only ten states doing both. The most sophisticated induction programs exist in Connecticut (the Connecticut BEST program), California (the CFASST program), and Ohio (the Ohio FIRST program). Each of these programs uses structured portfolio-based learning experiences to guide a new teacher and a mentor through a collaborative first year of practice.

Few states assess the actual teaching performance of new teachers. Twenty-seven states require that the school principal evaluate each new teacher. As of 2000, only four states (Kentucky, Louisiana, Oklahoma, and South Carolina) go beyond this requirement and require that the principal and a team of other educators from outside the school, trained to a common set of criteria, participate in the new teacher's evaluation. As of 2001 Connecticut, New York, and Ohio will all have performance-based licensure tests for beginning teachers at the end of the first or second year of teaching. Connecticut requires a subject-specific teaching portfolio; New York requires a videotape of teaching; Ohio will use an observation-based licensing assessment developed by Educational Testing Service called Praxis III. In 2002 Arkansas will begin using Praxis III as well; by 2004 California will make its work-sample-based Teaching Performance Assessment operational for initial licensure of all California teachers.

BIBLIOGRAPHY

BOND, LLOYD; SMITH, TRACY; BAKER, WANDA K.; and HATTIE, JOHN A. 2000. The Certification System of the National Board for Professional Teaching Standards: A Construct and Consequential Validity Study. Washington, DC: National Board for Professional Teaching Standards.

EDWARDS, VIRGINIA B. 2000. "Quality Counts 2000." Education Week 19 (18) [entire issue].

EDWARDS, VIRGINIA B., ed. 2000. "Who Should Teach? The States Decide." Education Week 19 (18):8–9.

ELMORE, RICHARD F., and ROTHMAN, ROBERT, eds. 1999. Testing, Teaching, and Learning: A Guide for States and School Districts. Washington, DC: National Academy Press.

FEUER, MICHAEL J., et al., eds. 1999. Uncommon Measures: Equivalence and Linkage Among Educational Tests. Washington, DC: National Academy Press.

HEUBERT, JAY P., and HAUSER, ROBERT M., eds. 1999. High Stakes: Testing for Tracking, Promotion, and Graduation. Washington, DC: National Academy Press.

JERALD, CRAIG D., and BOSER, ULRICH. 2000. "Setting Policies for New Teachers." Education Week 19 (18):44–47.

KORETZ, DANIEL, et al. 2001. New Work on the Evaluation of High-Stakes Testing Programs. Symposium conducted at the National Council on Measurement in Education's Annual Meeting, Seattle, WA.

LINN, ROBERT L. 2000. "Assessments and Accountability." Educational Researcher 29:4–16.

MADAUS, GEORGE E., and O'DWYER, LAURA M. 1999. "A Short History of Performance Assessment: Lessons Learned." Phi Delta Kappan 80:688–695.

MILLMAN, JASON, ed. 1997. Grading Teachers, Grading Schools: Is Student Achievement a Valid Evaluation Measure? Thousand Oaks, CA: Corwin.

SANDERS, WILLIAM L., and HORN, SANDRA P. 1998. "Research Findings from the Tennessee Value-Added Assessment System (TVAAS) Database: Implications for Educational Evaluation and Research." Journal of Personnel Evaluation in Education 12:247–256.

SANDERS, WILLIAM L., and RIVERS, JUNE C. 1996. Cumulative and Residual Effects of Teachers on Future Student Academic Achievement. Knoxville, TN: University of Tennessee Value-Added Research and Assessment Center.

WRIGHT, S. PAUL; HORN, SANDRA P.; and SANDERS, WILLIAM L. 1997. "Teacher and Classroom Context Effects on Student Achievement: Implications for Teacher Evaluation." Journal of Personnel Evaluation in Education 11:57–67.

INTERNET RESOURCES

AMERICAN EDUCATIONAL RESEARCH ASSOCIATION. 2000. "AERA Position Statement Concerning High-Stakes Testing in PreK–12 Education." <www.aera.net/about/policy/stakes.htm>.

BARTON, PAUL. 1999. "Too Much Testing of the Wrong Kind: Too Little of the Right Kind in K–12 Education." <www.ets.org/research/pic>.

MARI A. PEARLMAN
