None of The Above: A New Approach To Testing and Assessment
Fourteen Educators 4 Excellence teachers came together to make recommendations from the classroom on ways to improve standardized testing.
The team studied areas where assessment should be improved, as well as where it is working and should be sustained.
Based on relevant research and their own experience as educators, the teachers generated recommendations to improve testing in four main areas: design, culture, teaching and accountability.
A NEW APPROACH TO TESTING AND ASSESSMENT
August 2014

"Teachers have a vital perspective on testing and assessment. As front-line observers, we experience how state assessments work with our specific student populations. As a result, we have valuable insight on how to use testing in schools."
SURAJ GOPAL, ninth-grade STEM special education teacher, Hudson High School of Learning Technologies

CONTENTS
Executive Summary
Introduction
Design: Improve the accuracy of standardized assessments
Culture: Create and maintain a positive testing environment in schools
Teaching: Use data to improve instruction
Accountability: Include data in critical decisions
Conclusion
Teacher Policy Team Process and Methodology
Notes
Teacher Policy Team and Acknowledgements

PREFACE

Standardized testing can be deeply beneficial to students, teachers, and schools by providing an important measure of progress, as well as meaningful feedback about areas of success and areas of growth. As teachers, we know the costs and benefits of assessments. This leaves us between two sides of an often-heated debate, but this is where the evidence leads us. In short, tests have value, so let's take advantage of them. Here is how:

DESIGN: IMPROVE THE ACCURACY OF STANDARDIZED ASSESSMENTS

A large body of research shows that well-designed standardized tests can provide valuable information about students' knowledge and teachers' performance. In fact, such tests are often predictive of long-term life outcomes. It is essential to ensure that all standardized tests are well-designed and that feedback from teachers is solicited during all stages of the testing process. A common concern is that the accuracy of assessments is undermined by excessive teaching to the test, which does not contribute to meaningful learning. However, there is little evidence that test preparation produces significantly higher test scores when tests are well-designed and focused on higher-order skills.
Teachers and principals should be strongly discouraged from teaching to the test, because it neither raises test scores nor results in genuine learning. Computer-adaptive testing is an important tool for improving the accuracy of assessments. Such tests do a better job than traditional assessments of measuring both high- and low-achieving students, and should be made widely available for adoption. Finally, ensuring the quality of state-created tests is an iterative process. The vast majority of state test items should be released publicly so that stakeholders, such as teachers and parents, can offer feedback on the exams.

CULTURE: CREATE AND MAINTAIN A POSITIVE TESTING ENVIRONMENT IN SCHOOLS

In some schools, the negative culture surrounding standardized testing is pervasive, undermining the value of assessments and harming teachers' morale and students' motivation. A truly pernicious culture can lead to cheating. As educators, we must work within our schools to create a positive culture that recognizes the value of testing for learning and growth. Best practices should be instituted to deter, detect, and investigate potential instances of cheating. Policymakers must address the negative impact of excessive testing by getting an accurate measure of time spent on assessment and eliminating unnecessary tests. Moreover, the use of alternate assessments, including holistic, portfolio-based exams, should be studied to determine whether they are compatible with data-driven improvement and accountability.

TEACHING: USE DATA TO IMPROVE INSTRUCTION

The data from standardized tests can serve as an important tool for teachers and administrators. Research suggests that both teachers and schools benefit from thoughtful use of data. Data-driven instruction can be improved in a variety of ways, including: ongoing professional development for teachers; a dedicated data specialist in each school; and data that is returned to teachers in a timely, disaggregated, and accessible manner.
ACCOUNTABILITY: INCLUDE DATA IN CRITICAL DECISIONS

Because test scores are important reflections of student learning, assessment data should be a part of consequential decisions. In fact, there is a large body of literature showing the benefits of using tests as part of a multiple-measure accountability framework. However, tests should never be the sole basis for any high-stakes decision. For example, the current system of denying graduation to any student who does not pass all Regents exams is misguided and should be revised to incorporate multiple measures. Furthermore, when connecting student test scores to teacher performance, special care must be taken to isolate the effect of teachers and exclude the multitude of factors outside teachers' control that affect student performance. Teachers of traditionally non-tested subjects should be evaluated using growth measures or student learning objectives on assessments that are designed with significant input from educators.

CONCLUSION

We believe in the value of standardized assessments when they are used carefully. They can be a critical tool for teachers and students alike, and we would be unwise to discard them. At the same time, policymakers, administrators, and teachers must invest the time, money, reflection, and work necessary to realize the value of assessments. Throughout our team's research, a positive culture of assessments and data-driven instruction was a key recurring theme for school success. That culture starts with each of us, in our own classrooms and buildings, and will only happen if teachers are invested as active participants in the process of shaping changes to testing and assessment.

Trevor Baisden, founding fifth-grade ELA and history lead teacher, Success Academy Bronx 2 Middle School

EXECUTIVE SUMMARY

MAKING GOOD USE OF STANDARDIZED TESTS
- DESIGN: Improve the accuracy of standardized assessments
- CULTURE: Create and maintain a positive testing environment in schools
- TEACHING: Use data to improve instruction
- ACCOUNTABILITY: Include data in critical decisions

INTRODUCTION

Standardized assessments have increasingly become a part of life for schools across the country. Since No Child Left Behind became law in 2001, there has been growing attention to measuring districts', schools', and students' progress, with a particular focus on historically disadvantaged students. Critics of this trend suggest doing away with standardized tests entirely, while many proponents argue that we simply need to stay the course. As a team of 14 teachers committed to elevating our profession and ensuring students succeed, our response is none of the above. We are unified in the belief that testing has significant value, with the understanding that the way tests are currently designed and used must be improved. In this paper, we lay out a new vision for testing and assessment, beginning with the design of assessments and ending with the important decisions that test results should inform.

In New York, testing has dominated the conversation about the implementation of new teacher evaluation programs and the Common Core State Standards. We find ourselves firmly in the middle between those who would do away with testing altogether and those who do not acknowledge any flaws in the current system. But we are comfortable in the rational middle: comfortable with the view that as educators we can benefit from the information these tests provide. We are comfortable with the idea that our students' growth on tests can be one part of our evaluations, while using that same data to inform our teaching decisions. Finally, we believe that a standard measure can be critical in ensuring equality in education.
We believe that disaggregated assessment data shines a light on populations of students who are not getting the education they deserve. We all have a part to play in changing the substance and the culture of testing. None of the Above has something for everyone: teachers and principals, state and district administrators, elected officials and policymakers. In June 2014, in response to concerns about the role of standardized tests in teacher evaluation, the New York State legislature passed a so-called safety net that removes the impact of state assessments on teachers with the lowest evaluation ratings for two years. Let us say no to the all-or-nothing approaches, and make the most of this time to get these tests right.

"Teachers see the impact that testing and assessment have on our practice and our students. Teachers know firsthand what is best for our students and our practice. It's important for us to have a voice in the testing and assessment debate because it has a direct impact on the daily actions of teachers and students."
Christine Montera, social studies teacher, East Bronx Academy for the Future

KEY TAKEAWAYS FROM RESEARCH AND EXPERIENCE

At the core of the debate on testing is a critical question: Are standardized assessments reflective of students' learning? Test opponents, on one end of the spectrum, claim they are not indicative of student learning or achievement. 1 At the other extreme are those who argue that a single assessment on a single day is the only measure that we should use to make high-stakes decisions. 2 As classroom teachers, we think the truth falls somewhere in between. There is abundant research showing that standardized tests are meaningful. Such assessments can predict with moderate accuracy individuals' first-year college GPA, 3
cumulative college GPA, 4 post-college income, 5 and success in graduate school. 6 Aggregate international test scores are also predictive of the economic prosperity of countries. 7 Additionally, teachers whose students' standardized test scores grow produce an increase in those students' adult incomes and rates of college attendance. 8
This research shows that standardized tests are able to capture important information about what is happening in our classrooms. Standardized tests, however, are not the be-all and end-all; they do not measure everything that matters. There are many students who do not test well and end up leading happy, successful lives. Research indicates that certain subjective evaluations of teachers are only modestly correlated with their students' test-based success, 9
suggesting what many teachers know: that tests cannot measure the full value of an educator. Indeed, the teachers who have the greatest positive effects on students' social and behavioral skills are not always the ones who produce the highest test score gains. 10 This is why past E4E-New York papers on teacher evaluation 11 and Common Core implementation 12 have insisted on multi-measure evaluation and decision-making for teachers and students. There are other limitations to standardized tests, which we will discuss later in this paper, but, in short, tests are meaningful but don't measure everything.

IMPROVE THE ACCURACY OF STANDARDIZED ASSESSMENTS

SUMMARY OF RECOMMENDATIONS

- When designing tests, follow best practices such as ensuring alignment to standards, testing higher-order thinking, and actively soliciting teacher input.
- Prioritize higher-order instruction, and eliminate excessive test preparation that does not contribute to meaningful learning.
- Use computer-adaptive assessments, which improve tests' accuracy by measuring the growth of low- and high-performing students.
- Release the vast majority of state test items publicly after the assessment window has closed so that all stakeholders can monitor the quality of the exams.

RECOMMENDATION: WHEN DESIGNING TESTS, FOLLOW BEST PRACTICES SUCH AS ENSURING ALIGNMENT TO STANDARDS, TESTING HIGHER-ORDER THINKING, AND ACTIVELY SOLICITING TEACHER INPUT.

All tests are not created equal. Anecdotally, as teachers, all of us have experience with assessments that were poorly written or were not aligned with the academic standards. We also all have experience with many well-designed tests that were fair assessments of our students' learning and our teaching, and that gave us important data that we were able to use to improve our instruction. We were heartened to learn about the process that New York State test questions (technically called items) go through before they are ever used on an official exam. It takes a full two years for each item to be approved through a process that includes extensive field testing, statistical validation, and input from a committee of teachers. 13 It is disconcerting, however, that even after such a thorough process, there are still concerns from educators about the quality of these tests. 14 We are glad the New York State Education Department uses a committee of teachers to validate testing items. The opportunity to join such a committee should be widely disseminated so that as many teachers as possible have the chance to share their voice. We also believe that there should be a formal system for soliciting and receiving teacher commentary so that all educators can share feedback after a test has been given.
We recommend that the State Education Department send a survey to all teachers who administered tests to gather feedback on positive and negative aspects of the assessments.

"Improving tests will only be effective with the active participation of teachers in testing design at the district and state level. Our ability to share insights from the classroom, as well as the cultural and socioeconomic backgrounds of our students, will undoubtedly help create high-quality assessments."
Blackfoot U-Ahk, fourth- and fifth-grade teacher of students with severe emotional disabilities, Coy L. Cox School

DESIGNING QUALITY ASSESSMENTS

When designing all tests, the following practices must be followed:
- Classroom teachers need to provide input throughout the process, from the creation of the tests to feedback after the tests are given. This feedback must be taken into account and meaningfully acted upon.
- Tests must be aligned to standards and assess higher-order thinking skills. 15
- The diversity of students' backgrounds, including differences in geography, socioeconomic status, racial identity, disability status, etc., must be considered in test development in order to avoid potential bias.
- Test items should be worded to make sure each item measures the specific standard being assessed, as opposed to students' ability to understand a tricky question.
- The amount of time given for assessments and the number of assessments given in a single day need to be age-appropriate.

RECOMMENDATION: PRIORITIZE HIGHER-ORDER INSTRUCTION AND ELIMINATE EXCESSIVE TEST PREP.

One of the most serious critiques of standardized assessments is that excessive teaching to the test can effectively negate the validity of an exam, as students learn how to score well without learning meaningful skills or content. Teaching to the test, or "drill and kill," tends to take valuable time away from rich, higher-order instruction.
No teacher gets into the profession for this kind of mechanized work, and it undermines teachers' and students' love of school. But contrary to the notion that tests can be gamed by excessive preparation, research suggests that the best way to prepare for most standardized assessments is through challenging, authentic work focused on content and skills. 16 One study that examined students' preparation for the ACT found that improvements from [an ACT pre-test] to the ACT "are smaller the more time teachers spend on test preparation in their classes and the more they use test preparation materials. Moreover, the focus on testing strategies and practice diverts students' and teachers' efforts from what really matters: deep analytic work in academic classes." 17 In other words, at least for well-designed assessments, excessive test preparation may actually lead to worse results. This aligns with our experience, as well as recent statements from education leaders. As New York City Schools Chancellor Carmen Fariña said, "If we do good teaching, that's the best test prep." 18 Similarly, New York State Education Commissioner John King stated, "The best preparation for testing is good teaching." 19 We agree. Since there is scant evidence that excessive teaching to the test will lead to higher assessment results, teachers and principals need to be shown this research. When educators realize that test prep is counterproductive, more time will be spent on authentic teaching and learning.

RECOMMENDATION: USE COMPUTER-ADAPTIVE ASSESSMENTS.

One valid concern about traditional tests is that they cannot adequately capture the growth of students who are significantly above or below grade level. The good news is that technology offers a solution to this problem: computer-adaptive testing adjusts question difficulty based on students' demonstrated skill level.
This sort of assessment, which is already in relatively wide use, including by the Graduate Record Examinations (GRE) 20 and the Graduate Management Admission Test (GMAT), 21 would help teachers get a better sense of students' growth from year to year. 22 Similarly, computer-adaptive tests give more accurate information to students and parents. We therefore strongly support the use of computer-adaptive testing whenever available, and encourage investment in this alternative where it does not exist. Questions have been raised regarding whether computer-adaptive testing will lead to low expectations for struggling students. 23 We understand these concerns, but ultimately disagree: We are not aware of evidence that educators will lower expectations for their students simply because tests focus on academic growth. If, for example, data show that a certain school's students are not making progress, efforts can be made to help those students and ensure that teachers are held accountable. In that sense, more accurate data will help rather than hinder the improvement and accountability process. Moreover, there is no clear alternative: students who are far behind or far ahead need a meaningful gauge of their progress, and computer-adaptive tests provide this.

"Computer-adaptive testing is absolutely crucial because many of my students are far behind and would benefit from a test scaled to their abilities."
Rachael Beseda, first-grade special education teacher, Global Community Charter School

That being said, it is important that computer-adaptive assessments give all students a fair opportunity to engage with grade-level content. All tests should begin with grade-level questions, and only move down once it becomes clear that students are not at grade level. Furthermore, such tests should attempt to push all students to demonstrate higher-order thinking skills.
For example, a student reading below grade level can still be given the chance to show the same skills as her grade-level peers, but do so with a less-challenging text.

RECOMMENDATION: RELEASE THE VAST MAJORITY OF STATE TEST ITEMS PUBLICLY AFTER THE ASSESSMENT WINDOW HAS CLOSED.

All tests, especially those used for making high-stakes decisions, need to undergo careful scrutiny both before and after administration. We believe there is a healthy process in place to ensure quality in the creation of New York State exams. At the same time, it has been frustrating for many educators that state tests prohibit teachers and students from discussing the contents of the exam. 24 Right now, with low public confidence in tests, 25 the state needs to allocate funds to significantly increase the transparency of state assessments, 26 except for field test items, which, by design, cannot be publicly released. These funds will allow for the printing of additional forms of state assessments, giving the state the ability to field test more items and decreasing the need to reuse (and thus keep hidden from public view) previous items. This will allow for the elimination of the widely criticized 27 stand-alone field tests. Increased transparency will let educators, parents, and students give feedback on state tests, which is particularly important as the Common Core standards are being implemented. It also ensures that teachers and students have a better understanding of what to expect on future exams. We believe that this will not only improve the assessments themselves by holding test designers and the New York State Education Department accountable to the public, but will also help restore public trust in the exams.

"Schools' and teachers' emphasis should always be on high-quality, rigorous instruction. Both research and experience suggest that this is the best method for preparing for well-designed assessments."
Vivett Hemans, English and language arts teacher, Eagle Academy for Young Men of Southeast Queens

WHAT IS COMPUTER-ADAPTIVE TESTING?

Computer-adaptive assessments start all students at the same level, in this case, at their grade levels. However, questions on the test become progressively harder as the test-taker gets more questions right, or progressively easier as the test-taker gets more questions wrong. That does not mean that if a student gets the first few questions wrong, the remainder of the test will be below grade level. Instead, the test continuously adapts based on the student's responses. For example, if a student gets the first few questions wrong but the next several questions right, the difficulty level will begin increasing as more correct answers are given. This process allows assessments to meet students where they are in order to get an accurate measure of their learning and growth.

ADDITIONAL BENEFITS OF TESTING

Our paper is organized around the two main benefits of standardized assessments: using them for improvement and as a factor in important decisions. However, we would be remiss if we did not discuss some smaller but important additional benefits of testing.

Assessments provide evidence of achievement and opportunity gaps. Using both the NAEP and state tests mandated by No Child Left Behind, policymakers and concerned citizens have quantitative evidence of the inequities that persist in our country. Testing not only shows that this is the case, but also helps quantify the gap and determine whether it is expanding, contracting, or staying constant. While qualitative evidence is also important in this regard, test scores can provide the hard data necessary to bring light to these shameful inequities.

Standardized tests are important to prepare students for success in adult life. Not only must college-bound students take the SAT or ACT, but all those who aspire to graduate school must take additional exams.
Potential lawyers must take the LSAT and the bar exam; would-be doctors must do well on the MCAT and board exams. The list goes on and includes most professions. That is not to say that the purpose of K-12 education should be to prepare students for assessments, but we would be doing a disservice if we limited students' exposure to the types of high-stakes tests they will need to do well on later in life.

There is some evidence that assessments do not simply measure learning, but actually enhance it. A variety of studies 28 have found that students retain information better after being tested on it. At this point, it is not clear that this research applies to standardized tests, but it is a potential value that points to the necessity of aligning standards, what is taught in class, to what is tested.

CULTURE: CREATE AND MAINTAIN A POSITIVE TESTING ENVIRONMENT IN SCHOOLS

KEY TAKEAWAYS FROM RESEARCH AND EXPERIENCE

Many of our experiences suggest that in too many instances, the culture of testing and assessment in New York has turned toxic. No doubt this is not the case in all schools, but for too many of us, testing has become something to be feared and avoided. It does not have to be that way. The negative culture of testing that permeates some schools must change. We believe that part of this shift has to come from us as teachers: We should be focusing on the value that assessments have to offer. We cannot be surprised that a pessimistic culture exists in schools if the adults in those buildings have counterproductive attitudes about testing. Teachers cannot solve this problem alone, however. We need principals to do their part by setting a positive building-wide tone about assessments. Moreover, as discussed earlier, we need principals to communicate clearly to teachers that excessive test prep will not raise test scores.
Currently, though, it is often principals who mandate that teachers engage in this counterproductive practice, feeding a negative cycle that harms student engagement. As we will discuss further in a subsequent section, teachers also need to be given the tools to use test results to improve instruction. When teachers are supplied with what we need to make tests valuable, our outlook will change for the better. Moreover, part of the anxiety that surrounds testing comes from the feeling that a single test can determine our students' futures. A commitment to using multiple measures for all high-stakes decisions, another topic we will elaborate on in a later section, will go a long way toward eliminating this fear.

SUMMARY OF RECOMMENDATIONS

- Measure time spent, by both students and teachers, on testing, and eliminate unnecessary and redundant exams.
- Implement best practices, such as administering tests in controlled environments and monitoring for test irregularities, to prevent and detect cheating.
- Create or expand pilot programs of schools using nontraditional tests to determine whether they lead to positive results for students and can be used to evaluate and support teachers and schools.

Finally, accountability must be paired with support throughout the year. What if teachers and students did not feel that low test scores would lead to punishments or poor ratings, but instead that they would lead to increased support and resources? To be clear, we do believe in accountability, but accountability should always go hand-in-hand with support and resources. Tests should be instructive, as well as evaluative. It is outside the scope of this paper to address what such support should look like specifically, but this should be a core tenet of any accountability system.

RECOMMENDATION: MEASURE TIME SPENT, BY BOTH STUDENTS AND TEACHERS, ON TESTING AND ELIMINATE UNNECESSARY AND REDUNDANT EXAMS.

One cause of the general frustration directed at standardized tests is the widespread feeling that there are simply too many of them. We certainly feel that way. As we have elaborated, we believe there is value in assessment, but any such value must be weighed against the time and effort invested in testing. The first and most important step must be to accurately gauge how much time is being spent on testing. We were glad that New York State Governor Andrew Cuomo's Common Core Implementation Panel attempted to address the underlying problem by recommending a 2 percent limit on school time spent on local and state assessments combined, and a 2 percent limit on test prep. 29
These suggested changes were subsequently implemented in the State Budget. 30
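To get a rough sense of what a flat 2 percent cap means in hours, the short sketch below works through the arithmetic. The 180-day school year and 6.5-hour instructional day are our own illustrative assumptions for this example, not figures from the Budget language:

```python
# Illustrative only: the length of the school year and of the
# instructional day are assumed, not taken from the State Budget.
DAYS_PER_YEAR = 180   # assumed school days per year
HOURS_PER_DAY = 6.5   # assumed instructional hours per day
CAP = 0.02            # the 2 percent limit

instructional_hours = DAYS_PER_YEAR * HOURS_PER_DAY   # 1170.0 hours
assessment_cap_hours = CAP * instructional_hours      # cap on local and state assessments
test_prep_cap_hours = CAP * instructional_hours       # separate cap on test prep

print(f"Assessment cap: {assessment_cap_hours:.1f} hours per year")
print(f"Test prep cap:  {test_prep_cap_hours:.1f} hours per year")
```

Under these assumptions, each cap works out to roughly 23 hours per year, regardless of grade level, which illustrates how blunt a single flat percentage is.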
The goal here is laudable, but we are skeptical of an arbitrary percentage that does not vary by grade. That is why we need a genuine figure for just how much time and money are spent on testing. This should include time teachers spend preparing, administering, and grading assessments; money spent developing the tests; time spent by students taking tests (including field tests); and instructional time lost on days when tests are administered. We think the state took a step in the right direction by requiring an audit of assessments to make sure districts are not giving unnecessary assessments based on the assumption that they are mandated by the state. 31 It is important that this audit is prioritized so that excessive testing is reduced as soon as possible. Once these two audits are complete, districts can make smart decisions, with the input of teachers, about which tests are worthwhile and which are not.

RECOMMENDATION: IMPLEMENT BEST PRACTICES TO PREVENT AND DETECT CHEATING.

Though the vast majority of educators regularly administer assessments with honesty and fidelity, an extreme outgrowth of a counterproductive school culture manifests itself in cheating scandals, which have occurred throughout the country. 32
Some have taken these cheating scandals to mean that standardized tests should be eliminated, but this makes no more sense than cancelling final exams because a handful of students tried to cheat on them. Instead, we should institute best practices, based on a U.S. Department of Education symposium on test integrity, 33 to ensure that cheating rarely happens, and to detect and investigate it when it does. In order to PREVENT CHEATING, 34
the state, districts, and schools must:
- Develop and disseminate a standard definition of cheating.
- Train principals and teachers to administer exams.
- Keep testing windows short.
- Administer tests in controlled environments.
- Establish and monitor a chain of custody for testing materials.
- Store and score test materials off-site.

In order to DETECT EVIDENCE OF CHEATING, 35 the state, districts, and schools must:
- Monitor test results for irregularities as part of the testing process.
- Ensure that proctors look for evidence of irregularities during assessment administration.
- Use advanced analytic techniques, such as erasure analysis, to check for irregularities.

In order to INVESTIGATE CHEATING, 36 the state, districts, and schools must:
- Establish procedures for conducting an investigation if one is necessary.
- Create standards that will trigger an investigation.
- Provide whistleblower protections.
- Use trained personnel to conduct the investigation.
- Make the investigation as transparent as possible.
- Make use of sanctions when wrongdoing is found.

In sum, these best practices, created by experts in the field, will help stop cheating in the first place, while ensuring a fair process if testing irregularities are found. We emphasize, though, that an ounce of prevention is worth a pound of cure here: a healthy testing culture will go a long way toward eliminating this problem.

RECOMMENDATION: CREATE OR EXPAND PILOT PROGRAMS OF SCHOOLS USING NONTRADITIONAL TESTS.

One serious problem with traditional standardized tests, which often include multiple-choice questions, is that it can be difficult to continually engage students in such exams. For students and teachers, so-called bubble tests have become a chore that must be endured. As discussed earlier, we believe that schools have an important role in changing this culture. At the same time, alternatives to traditional assessments should be explored and tested for their effectiveness.
The New York Performance Standards Consortium is a group of 28 schools that have used performance assessments in place of traditional high-stakes tests. 37 The Consortium schools boast impressive results, showing their students graduate high school at higher rates than other demographically similar New York City students. 38 Policymakers should embrace a pilot program for portfolio assessment in order to see whether this type of assessment can work.

"I think that project-based learning and inquiry-based work are things I don't do nearly enough. I rely on more traditional assessments, and teachers need to think of ways to cater to all students' needs and strengths in terms of assessment."
Charlotte Steel, seventh-grade math teacher, Booker T. Washington M.S. 54
But the fact that these schools produce strong graduation rates does not mean that performance assessments are the cause. Moreover, legitimate questions have been raised regarding the ability to fairly and efficiently use performance assessments to evaluate teachers and assess student learning. 39
We therefore propose an expanded pilot program that allows more schools to enter into the Performance Standards Consortium, while also determining whether such assessments are compatible with data-driven improvement and accountability. We recommend opening up an application for schools interested in joining the program, and conducting a lottery in order to randomly accept half of the eligible applicant schools into the pilot. Under this approach, schools that adopt the performance assessment model can be evaluated against similar schools that do not. If this system gets positive results for teachers and students, it should be expanded to even more city schools.

TEACHING: USE DATA TO IMPROVE INSTRUCTION

KEY TAKEAWAYS FROM RESEARCH AND EXPERIENCE

Research is clear that assessment data can be used as a tool for teachers and schools to improve. 40
It has been found, for example, that schools that make thoughtful use of data often produce significant gains in student achievement. 41 Research also suggests that access to data can increase the quantity and quality of conversations that educators have with colleagues, parents, and students. 42 Data can enhance collaboration among educators 43 and can improve teachers' instruction. 44 There is also evidence that the most successful charter schools make use of data-driven improvement and instruction. 45 Overall, data can and should be used to help schools and teachers improve. 46

Unfortunately, this is not always happening. One recent study found that a new data system introduced in Cincinnati Public Schools was rarely used by educators and did not lead to observable student gains. 47 A pilot program in Pennsylvania produced similar results. 48 The key, then, is to give teachers the support we need to make good use of testing data.

SUMMARY OF RECOMMENDATIONS
- Offer high-quality training throughout the year for teachers on how to improve instruction using assessment data.
- Provide each school with a teacher who serves as a data specialist.
- Ensure that teachers and administrators receive timely, detailed, and disaggregated data in a transparent, accessible format.

RECOMMENDATION: OFFER HIGH-QUALITY TRAINING THROUGHOUT THE YEAR FOR TEACHERS ON HOW TO IMPROVE INSTRUCTION USING ASSESSMENT DATA.

Teachers and administrators need more training on how to use data effectively. The New York City teachers contract recently put in place more time for professional development. 49 Some of that time should be dedicated to high-quality training on understanding and using student data. It is worth noting that while we support school-based creation of professional development, this may be an area in which schools need outside support and expertise to design appropriate programs.
RECOMMENDATION: PROVIDE EACH SCHOOL WITH A TEACHER WHO SERVES AS A DATA SPECIALIST.

Teachers need continuous support in using data systems. We need more than a one-time training. We propose that at least one teacher in each school receive the designation of data specialist. This role should come with extensive training, as well as the responsibility of supporting and working with staff to use data and integrate this information into their regular assessment of, and feedback for, their students. Additionally, data specialists should receive compensation for this role, either monetary or in the form of a lighter class load. A final benefit is that this position could potentially serve as an additional rung on a teacher career ladder, a concept that past E4E Teacher Policy Teams have endorsed. 50

RECOMMENDATION: ENSURE THAT TEACHERS AND ADMINISTRATORS RECEIVE TIMELY, DETAILED, AND DISAGGREGATED DATA IN A TRANSPARENT, ACCESSIBLE FORMAT.

To make full use of assessments, teachers and administrators need timely, detailed, and disaggregated data in order to tailor their instruction to address their students' needs. The current system does not supply educators with sufficiently detailed feedback on these exams. Compounding this problem is the fact that the results do not come back until the summer, and thus teachers often cannot act on the data. A high priority must be placed on giving educators actionable, disaggregated, and timely results from standardized assessments. Teachers also need access to a high-quality, easily navigable interface in which we can access all relevant data. Georgia, in particular, is a state that has been highlighted for its success in making data accessible and easy to use for teachers, 51 and New York should follow suit.

"It is particularly important that teachers receive thorough and useful training in data-driven instruction. Unless the results of assessments are used to move teaching and learning forward, they serve little value."
Michelle Kniffin, ninth- to 12th-grade math teacher, High School of Telecommunication Arts and Technology

COMMON CORE ASSESSMENTS CONSORTIUM

As the Common Core State Standards are implemented across the country, new testing consortia aligned to the new standards are being rolled out. There are two testing groups: the Smarter Balanced Assessment Consortium (SBAC), 52 which has been adopted at least in part by 20 states, 53 and the Partnership for Assessment of Readiness for College and Careers (PARCC), 54 which has been adopted by 14 states and the District of Columbia. 55 Field tests took place in the spring of 2014, 56 and the full assessments will be available for use beginning in the 2014-2015 school year. New York State has adopted PARCC, 57 but has not yet determined when the new exams will be rolled out. 58 Below, we discuss aspects of PARCC and how they align with our recommendations:

We are encouraged that PARCC assessments appear to test higher-order thinking skills. Although it is too early to determine for sure, the sample questions 59 leave us optimistic that rigorous skills will be tested, and low-level multiple-choice tests will be deprioritized.

It is very important that PARCC continuously involve teachers in the creation and revision of the exams. PARCC has already shown evidence of having engaged teachers throughout this process, and we are pleased to see such a clear commitment to teacher input. 60 Moreover, we recommend that PARCC distribute surveys to teachers at the end of each year to garner feedback on the year's assessments.

Although PARCC tests will be completed using computers, they will not be computer adaptive, 61
with the important exception of optional diagnostic exams. It is disappointing that this valuable technology will not be utilized for the summative assessments, as PARCC is missing an opportunity to get accurate growth measures of high- and low-achieving students. Although a PARCC frequently asked questions document 62 claims that the assessments will measure the full range of student performance, including the performance of high- and low-achieving students, it is not clear how they will manage to do so. We urge PARCC to consider moving to computer-adaptive assessments, particularly in light of the fact that SBAC will be utilizing this technology. 63
An advantage of computer-based assessments is that cheating will be more difficult, since school staff will not handle or transport physical testing materials. 64 However, new threats to testing security, such as access to the Internet, may exist, and PARCC, in partnership with schools and districts, must ensure teachers and school leaders are prepared to administer the tests fairly and monitor for irregularities.

An additional advantage of using computer-based assessments is timely feedback to schools, teachers, and students. For many questions (ones that have clear right or wrong answers), the data should be available almost immediately. For others (performance tasks, essays, or any items that require manual grading), the turnaround will understandably be longer. However, we are glad that PARCC has stated that its goal is to have data from the performance-based assessments returned before the end of the school year. 65 It is crucial that PARCC ensures that teachers receive timely, disaggregated, and user-friendly data.

As we have argued, transparency is a necessary aspect for all important exams, in part to ensure that the public is given an opportunity to offer feedback on the content and quality of assessments, and in part to ensure public trust in such assessments. So far, we are encouraged that PARCC has already released sample tests 66 and plans to release 40 percent of test items each year. We hope the commitment to transparency continues and expands as full-scale tests are implemented.

ACCOUNTABILITY: INCLUDE DATA IN CRITICAL DECISIONS

KEY TAKEAWAYS FROM RESEARCH AND EXPERIENCE

There is now abundant evidence that using test score growth as part of a multiple-measure evaluation and accountability system can benefit students. Multiple peer-reviewed studies 67, 68, 69
have found that students benefit when adults are held accountable for results. 70 There is also research showing that teacher evaluation that considers evidence of student learning can be beneficial to students. 71 Finally, and most importantly, evidence suggests that, when designed and implemented well, accountability systems can impact school quality in a way that leads to long-term positive effects on students' adult incomes. 72 All that being said, the current way that test scores are used to make important decisions needs to be improved to ensure they are fair to students, teachers, and schools.

SUMMARY OF RECOMMENDATIONS
- Isolate the effects of teachers and schools to ensure that those serving at-risk student populations are not penalized by out-of-school factors.
- Evaluate teachers of non-tested subjects based on authentic assessments, developed and validated by teachers, using growth measures or student learning objectives.
- Make high-stakes decisions based on multiple sources and multiple years of evidence.

RECOMMENDATION: ISOLATE THE EFFECTS OF TEACHERS AND SCHOOLS TO ENSURE THAT THOSE SERVING AT-RISK STUDENT POPULATIONS ARE NOT PENALIZED BY OUT-OF-SCHOOL FACTORS.

One of the most difficult, but most important, aspects of using student test score growth in an evaluation system is isolating the effects of schools and teachers. After all, many factors, including poverty and parental involvement, affect a given student's achievement, and only a fraction can be attributed to his or her teachers or school. Indeed, only about one-fifth to one-quarter of student test scores are explained by the quality of their schools, and of that, about one-half to two-thirds are the result of the students' individual teachers. 73 In other words, individual teachers account for only roughly 10 to 17 percent of the variation in student test scores. We are not saying that teachers and schools do not matter. But we also cannot blame those same teachers and schools for all the factors that can contribute to low student achievement.
If we simply look at absolute test scores, as often happens, 74 with no accounting for growth or student background, the schools and teachers working with our most challenging students will be unfairly penalized. Moreover, some struggling schools and teachers who work with high-achieving students will be overlooked. 75 With the use of value-added modeling, 76 we can go a long way toward isolating teachers' and schools' effects by controlling for students' prior test scores, as well as other factors outside teachers' control.

WHAT IS VALUE ADDED?

Value added is a statistical method that attempts to isolate teachers' influence on their students' test score growth. Value-added models can take into account a variety of variables that affect students' performance, including prior achievement, socioeconomic status, disability status, special education status, attendance, disciplinary record, and class size. 77 Although some critics of value-added measures correctly point out that teachers' ratings can vary from year to year, 78 others respond that this can be ameliorated through multiple years of data, and that similar variance exists in performance metrics of other professions. 79 Value-added scores are particularly reliable for teachers at the extremes of the distribution. 80 Research also suggests that teachers' value-added scores predict their effects on students' long-term outcomes such as income and college attendance. 81

[Figure: Factors Contributing to Student Achievement. Out-of-school factors account for roughly 60 percent, in-school factors for roughly 20 percent, and unexplained variation for the remaining 20 percent; at least half of the in-school effect is based on students' individual teachers. Source: Di Carlo, M. (2010, July 14). Teachers Matter, But So Do Words. Shanker Blog. Retrieved from http://shankerblog.org/?p=74. Note that these percentages are approximations.]
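To make the mechanics concrete, here is a minimal sketch of a simple value-added calculation in Python with NumPy. This is our own illustrative example, not the model New York or any vendor actually uses: it controls only for prior scores, and the data, function names, and teacher labels are hypothetical.

```python
# Illustrative value-added sketch. Hypothetical data and names; real models
# include many more covariates, multiple years of data, and shrinkage.
import numpy as np

def value_added(prior, current, teacher_ids):
    """Regress current scores on prior scores, then average each teacher's
    residuals: the growth their students showed beyond what prior
    achievement predicted."""
    X = np.column_stack([np.ones_like(prior), prior])   # intercept + prior score
    coef, *_ = np.linalg.lstsq(X, current, rcond=None)  # ordinary least squares
    residuals = current - X @ coef                      # unexplained growth
    return {t: residuals[teacher_ids == t].mean()
            for t in np.unique(teacher_ids)}

# Toy data: teacher B's students gain more than their prior scores predict.
prior    = np.array([50., 60., 70., 50., 60., 70.])
current  = np.array([52., 62., 72., 58., 68., 78.])
teachers = np.array(["A", "A", "A", "B", "B", "B"])
va = value_added(prior, current, teachers)
```

On this toy data the fitted line is current = prior + 5, so teacher A's students average 3 points below prediction and teacher B's 3 points above. A production model would add the covariates listed above, pool several years of data, and shrink estimates for teachers with few students.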
EXAMPLE: TWO-STEP VALUE-ADDED MODEL

In recent years, as New York has started using a student growth model to evaluate teachers, concerns have been raised about the extent to which it fairly accounts for factors outside of educators' and schools' control. 82 A report on the subject found evidence that the 2012-2013 New York State growth measure may have been partially biased against some teachers and principals who serve certain student populations. 83 With New York State likely to use value-added scores as 25 percent of teacher evaluation in the 2014-2015 school year, 84 now is the time to consider the ideal model.

We recommend an approach that more fully accounts for factors outside teachers' and schools' control. This method, known as a two-step value-added model, or proportionality, is designed to make apples-to-apples comparisons. 85 In other words, this model eliminates any correlation between teachers' and schools' value-added scores and the student populations they teach; it guarantees that educators of, for example, students in poverty or students with disabilities will not receive disproportionately low ratings. This will address the concern that student achievement measures penalize teachers and schools who serve certain student populations. It will also ensure that evaluation measures will not exacerbate persistent inequities in those schools: high-poverty schools will have a tougher time recruiting and retaining teachers if those educators face a higher chance of a low evaluation score. We recognize that genuine inequalities persist between and within our schools, 86 and that correlations between teacher effectiveness scores and student populations likely reflect some genuine differences in teacher quality.
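The two-step idea can be sketched the same way. The following is purely illustrative (the single poverty control, the data, and all names are our own simplifications of the model described in the research 85): step one purges the correlation between scores and a student demographic variable; step two runs an ordinary value-added calculation on the purged scores, so the resulting ratings are uncorrelated with that demographic by construction.

```python
# Illustrative two-step ("proportional") value-added sketch.
# Hypothetical data and names; the published models are more elaborate.
import numpy as np

def residualize(y, controls):
    """Remove the part of y explained by the control variables."""
    X = np.column_stack([np.ones(len(y)), controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def two_step_value_added(prior, current, poverty, teacher_ids):
    prior_r = residualize(prior, poverty)       # step 1: purge demographics
    current_r = residualize(current, poverty)
    X = np.column_stack([np.ones(len(prior_r)), prior_r])
    coef, *_ = np.linalg.lstsq(X, current_r, rcond=None)
    resid = current_r - X @ coef                # step 2: usual value-added
    return {t: resid[teacher_ids == t].mean()
            for t in np.unique(teacher_ids)}

# Toy data: teacher B's class is all high-poverty; an out-of-school factor
# depresses those scores by 5 points, but within each group both teachers
# produce identical growth.
prior    = np.array([50., 60., 70., 50., 60., 70.])
current  = np.array([60., 70., 80., 55., 65., 75.])
poverty  = np.array([0., 0., 0., 1., 1., 1.])
teachers = np.array(["A", "A", "A", "B", "B", "B"])
va = two_step_value_added(prior, current, poverty, teachers)
```

On this toy data, a one-step model (regressing current on prior only) would rate teacher B 2.5 points below average and teacher A 2.5 points above, because B's class is uniformly depressed by the poverty-linked factor; the two-step model rates both teachers identically, since their within-group growth is the same.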
But our goal in an evaluation system is not just to get an accurate picture of teacher quality, but also to design a system that provides useful information to support teacher and school improvement, while helping districts and principals make retention and dismissal decisions. We are convinced that the two-step model does just that. 87

COMPARING DIFFERENT GROWTH MEASURES

The following graphs show three different ways of measuring schools' student achievement growth: the median student growth percentile (SGP), a one-step value-added model, and a two-step value-added model. In each panel, the x-axis is a measure of school poverty, while the y-axis is a measure of school effectiveness based on the given growth measure. The shaded areas are scatter plots showing the range of schools' scores. The line shows the correlation between schools' level of poverty and their level of effectiveness. Note that these are examples based on schools in Missouri, so representations of New York schools may vary in certain ways. Sample for all growth measures is 1,846 schools.

[Figure: three panels plotting median SGP, one-step value-added scores, and two-step value-added scores (in standard deviations) against percent of students eligible for free or reduced-price lunch. Source: Ehlert, M., Koedel, C., Parsons, E., et al. Selecting Growth Measures for School and Teacher Evaluations: Should Proportionality Matter? National Center for Analysis of Longitudinal Data in Education Research. Retrieved from http://www.caldercenter.org/publications/upload/wp-80-updated-v3.pdf]

RECOMMENDATION: EVALUATE TEACHERS OF NON-TESTED SUBJECTS BASED ON AUTHENTIC ASSESSMENTS, DEVELOPED AND VALIDATED BY TEACHERS, USING GROWTH MEASURES OR STUDENT LEARNING OBJECTIVES.

Many educators do not teach in grades or subjects that have annual state tests, and therefore cannot be evaluated using value-added measures. In order to comply with the new evaluation law, some teachers are being rated based on students or subjects they do not teach; for example, in some cases gym teachers are being rated on English scores. 88 This practice must stop, because it violates a core tenet 89 of any accountability system: Teachers should not be held accountable for outcomes outside of our control. 90
We are glad that the New York City teachers contract will move accountability in this direction. 91 Districts need to invest in authentic performance measures for teachers in non-tested subjects, particularly ones like music, art, and physical education. In many cases, these performance assessments may be combined with more traditional written tests. Results should not only be considered in individual teachers' evaluations, but in school evaluations as well. The creation of standardized performance assessments for these subjects has been experimented with, 92 though the evidence is limited on how successful such programs have been. In all non-tested subjects, evaluations should be based on student learning objectives 93 or measures of student growth that ensure fair comparisons are being made across classrooms.

Our top priority is to ensure that any such assessments are designed by and with teachers, and are validated by teachers. Educators should have a hand in the design, the administration, and the revision of these assessments. This is absolutely essential. When teachers are involved in the creation of exams, the tests are more likely to reflect what is being taught in the classroom.

EXAMPLE: HIGH SCHOOL GRADUATION EXAMS

The current requirement that all New York State students pass a series of exams in order to graduate high school is an example of a policy that fails to consider multiple measures. Under the current system, students will only receive a high school diploma if they pass five state-mandated Regents exams. 98 (Students with disabilities or IEPs have some limited additional options.) This policy is designed to create high expectations for students, an admirable goal, but it ends up harming some of them. Anecdotal 99 and empirical 100, 101, 102, 103 evidence show that high school graduation exams have little or no positive effects and significant negative consequences for students who fail such tests.
There is even alarming research showing that mandated graduation exams can lead to increased incarceration rates. 104 With this evidence in mind, we take the position that high school graduation exams should never be the sole basis for denying students their diplomas. It is appropriate for such tests to be part of a multiple-measure graduation system, but not as inflexible roadblocks for students trying to graduate. It is outside our scope to discuss what precisely such a system should look like, but we will note that holistic multi-measure graduation models exist and should be studied. 105
RECOMMENDATION: MAKE HIGH-STAKES DECISIONS BASED ON MULTIPLE SOURCES AND MULTIPLE YEARS OF EVIDENCE.

We believe in the value of test scores to inform and evaluate students, teachers, principals, and schools, but we also are convinced that a single test score should not be the sole basis for any high-stakes decision. A broad array of theory and evidence suggests that multiple measures are always preferable in high-stakes circumstances. 94 We are encouraged, then, that New York City, like all districts and states that have adopted the new wave of teacher evaluation, 95 has used a multiple-measure system, with student growth as one factor among others. 96 Similarly, we are glad that the New York City Department of Education recently adopted a multiple-measure system for student promotion and retention decisions. 97 We think the city and state have done a good job ensuring that important decisions are based on multiple sources of evidence. Nevertheless, there is room for improvement.

"Using multiple measures for high-stakes decisions is particularly important to me and my students because so many ELLs often struggle on tests but are bright, capable students."
Maura N. Henry, sixth- to 12th-grade English as a Second Language teacher, The Young Women's Leadership School of Astoria

UNIQUE STUDENT POPULATIONS

One important aspect of assessment that is not discussed enough is the effect on unique populations of students, including those receiving special education, students with disabilities, English-language learners, and gifted and talented students. A thorough discussion of issues surrounding testing with each of these student populations is beyond the scope of this paper. However, we were very cognizant of these students while crafting our recommendations. Here, we highlight and elaborate on how specific components of our recommendations affect these students.

In the design of tests, the needs of unique populations of students must be carefully considered.
First and foremost, teachers of a variety of student populations should be represented on the panel of educators who design and review assessments. Particular care must be given in writing test items to ensure that certain students are not disadvantaged. For example, math tests should not, in most cases, include idioms that English-language learners might not be familiar with, since such a question would not measure those students' mathematical ability.

As we have previously articulated, we believe in the value of computer-adaptive testing. These assessments will benefit unique student populations, specifically those who are low- and high-achieving, by gauging their growth accurately. This needs to be a high priority. If we want students and teachers to believe in the value of the assessments, we need to make them useful to all students. Computer-adaptive tests will significantly help in this regard.

Our recommendation regarding the use of multiple measures in making high-stakes decisions, specifically graduation decisions, will have a positive effect on unique populations of students. 106 English-language learners and special education students have long graduated at lower rates than other students. The move to a multiple-measure system will not solve this problem, but it will give all students multiple avenues to demonstrate their knowledge of the content necessary to graduate.

"When teacher input is sought out and reflected in assessments and their implementation, tests will become an effective tool to accurately gauge student achievement and growth, as well as an empowering tool for the teachers to improve their teaching practices."
IRIS WON, ninth- to 12th-grade mathematics and technology teacher, Renaissance High School for Musical Theater & Technology

As teachers, this is our vision for making full use of standardized assessments: for taking advantage of a powerful tool that requires careful execution.
Tests can be a force for good, and we would be unwise to throw them out of our toolbox. At the same time, they cannot be our only tool. We cannot use a hammer when a wrench is necessary, and we will usually need both.

Improving how tests are used is a shared responsibility. As teachers, we must do our part: administer tests with fidelity, use data to improve when it is available, and advocate for better assessments when necessary. But policymakers must also step up: they must provide us with the support we need, and they must make wise decisions about how often tests are administered and how results are used. This will take time, money, reflection, and a lot of work. Let's get started.

KEY RESEARCH TAKEAWAYS AND OVERVIEW OF RECOMMENDATIONS

DESIGN: IMPROVE THE ACCURACY OF STANDARDIZED ASSESSMENTS

KEY TAKEAWAYS
- Tests are useful, though imperfect, measures of students' learning and teachers' effectiveness.
- The accuracy of tests is directly related to test quality: well-designed assessments provide important information, but poorly designed tests have little to no use.

RECOMMENDATIONS
- When designing tests, follow best practices such as ensuring alignment to standards, testing higher-order thinking, and actively soliciting teacher input.
- Prioritize higher-order instruction, and eliminate excessive test preparation that does not contribute to meaningful learning.
- Use computer-adaptive assessments, which improve tests' accuracy by measuring the growth of low- and high-performing students.
- Release the vast majority of state test items publicly after the assessment window has closed so that all stakeholders can monitor the quality of the exams.
ACCOUNTABILITY: INCLUDE DATA IN CRITICAL DECISIONS

KEY TAKEAWAYS
- Student achievement is a useful measure that should be a part of a multi-measure evaluation framework that holds teachers and schools accountable for student performance.
- Holding schools and teachers accountable for students' performance produces positive results.

RECOMMENDATIONS
- Isolate the effects of teachers and schools to ensure that those serving at-risk student populations are not penalized by out-of-school factors.
- Make high-stakes decisions based on multiple sources and multiple years of evidence.
- Evaluate teachers of non-tested subjects based on authentic assessments, developed and validated by teachers, using growth measures or student learning objectives.

CULTURE: CREATE AND MAINTAIN A POSITIVE TESTING ENVIRONMENT IN SCHOOLS

KEY TAKEAWAYS
- The toxic culture of testing that pervades some schools undermines the value of assessments and harms teachers' morale.
- A positive culture begins with viewing assessments as opportunities for growth, and also requires policymakers to create an environment, through support and thoughtful decision-making, that encourages a healthy culture.

RECOMMENDATIONS
- Measure time spent, by both students and teachers, on testing, and eliminate unnecessary and redundant exams.
- Implement best practices, such as administering tests in controlled environments and monitoring for test irregularities, to prevent and detect cheating.
- Create or expand pilot programs of schools using nontraditional tests to determine whether they lead to positive results for students, and can be used to evaluate and support teachers and schools.

TEACHING: USE DATA TO IMPROVE INSTRUCTION

KEY TAKEAWAYS
- When used properly, assessment data is valuable for improving teachers' practice, and provides helpful information to administrators, parents, and students.
- Teachers and administrators need more support in using data to inform their practice and ensure it is meaningful.

RECOMMENDATIONS
- Offer high-quality training throughout the year for teachers on how to improve instruction using assessment data.
- Provide each school with a teacher who serves as a data specialist.
- Ensure that teachers and administrators receive timely, detailed, and disaggregated data in a transparent, accessible format.

TEACHER POLICY TEAM PROCESS AND METHODOLOGY

IDENTIFYING E4E'S POLICY FOCUS

E4E surveyed members and held focus groups with E4E-NY members to determine the most important policy issues from teachers' perspective.

OUR PROCESS

We met for eight weeks to review research on different facets of testing and assessment, particularly as they relate to New York City and State. We considered evidence from different perspectives, held small and large group discussions, and regularly challenged each other's thinking. We ended up with four main categories under which we elaborate upon specific recommendations.
Retrieved from http://educationnext.org/education-and-economic-growth/
8. Chetty, R., Friedman, J.N., Rockoff, J.E. (2011). The Long-Term Impact of Teachers: Teacher Value-Added and Student Outcomes in Adulthood. American Economic Review. Retrieved from http://obs.rc.fas.harvard.edu/chetty/value_added.html
9. Master, J. (2014, June). Staffing for Success. Education Evaluation and Policy Analysis. Retrieved from http://epa.sagepub.com/content/36/2/207.abstract?rss=1
10. Jennings, J.L., DiPrete, T.A. (2009, March 15). Teacher Effects on Social/Behavioral Skills in Early Elementary School. Retrieved from http://www.columbia.edu/~tad61/Jennings%20and%20DiPrete_3_15_2009_Final.pdf
11. Adland, J., Braslow, D., Brosbe, R., et al. (2011, Spring). Beyond Satisfactory: A New Teacher Evaluation System for New York. Retrieved from http://educators4excellence.s3.amazonaws.com/8/3f/b/1362/E4E_Evaluation_Paper_Final.pdf
12. Barraclough, N., Farnum, C., Loeb, M., et al. (2014, Spring). A Path Forward: Recommendations from the classroom for effectively implementing the Common Core. Retrieved from http://educators4excellence.s3.amazonaws.com/8/0b/a/2258/03.24.14_TAT_CCSS_Memo.pdf
13. New York State Department of Education. (2014, July 9). New York State Education Department Test Development Process. Retrieved from http://www.p12.nysed.gov/assessment/teacher/home.html#process
14. See, for example: Phillips, E. (2014, April 9). We Need to Talk About the Test: A Problem With the Common Core. The New York Times. Retrieved from http://www.nytimes.com/2014/04/10/opinion/the-problem-with-the-common-core.html; and Hartocollis, A. (2012, April 20). When Pineapple Races Hare, Students Lose, Critics of Standardized Tests Say.
The New York Times. Retrieved from http://www.nytimes.com/2012/04/21/nyregion/standardized-testing-is-blamed-for-question-about-a-sleeveless-pineapple.html?pagewanted=all
15. King, F.J., Goodson, L., Rohani, F. Higher Order Thinking Skills. Center for Advancement of Learning and Assessment. Retrieved from http://www.cala.fsu.edu/files/higher_order_thinking_skills.pdf
16. Newmann, F.M., Bryk, A.S., Nagaoka, J. (2001, January). Authentic Intellectual Work and Standardized Tests: Conflict or Coexistence? Retrieved from http://ccsr.uchicago.edu/publications/authentic-intellectual-work-and-standardized-tests-conflict-or-coexistence
17. UChicagoNews. (2008, May 27). Intensive ACT test prep during class leads to lower scores; students don't connect grades, study habits to exam scores. Retrieved from http://news.uchicago.edu/article/2008/05/27/intensive-act-test-prep-during-class-leads-lower-scores-students-don-t-connect-gr
18. Rafter, D. (2014, January 2). De Blasio picks a schools chancellor. Queens Chronicle. Retrieved from http://www.qchron.com/editions/queenswide/de-blasio-picks-a-schools-chancellor/article_687e9c54-a168-54a6-9df7-ebed13034cc2.html
19. Spector, J. (2014, March 24). John King on upcoming Common Core tests: "The best preparation for testing is good teaching." Politics on the Hudson. Retrieved from http://polhudson.lohudblogs.com/2014/03/24/john-king-upcoming-common-core-tests-best-preparation-testing-good-teaching/
20. Graduate Record Examinations. How the test is scored. Retrieved from https://www.ets.org/gre/revised_general/scores/how/
21. Graduate Management Admission Test. (2010, January 13). The CAT in the GMAT.
Retrieved from http://www.mba.com/us/the-gmat-blog-hub/the-official-gmat-blog/2010/jan/the-cat-in-the-gmat.aspx
22. Smarter Balanced Assessment Consortium. Computer Adaptive Testing. Retrieved from http://www.smarterbalanced.org/wordpress/wp-content/uploads/2011/12/Smarter-Balanced-CAT.pdf
23. Brown, E. (2014, March 2). D.C. mulling over Common Core test switch. The Washington Post. Retrieved from http://www.washingtonpost.com/local/education/dc-mulling-over-common-core-test-switch/2014/03/02/29478710-a0b3-11e3-a050-dc3322a94fa7_story.html?wprss=rss_education
24. Strauss, V. (2014, April 25). AFT asks Pearson to stop gag order barring educators from talking about tests. The Washington Post. Retrieved from http://www.washingtonpost.com/blogs/answer-sheet/wp/2014/04/25/aft-asks-pearson-to-stop-gag-order-barring-educators-from-talking-about-tests/
25. Times Union. (2014). Times Union/Siena College Poll [Data File]. Retrieved from http://www.timesunion.com/7dayarchive/item/Times-Union-Siena-College-education-poll-30096.php
26. We say this with the understanding that it may not be possible for 100% of all items to be released publicly. We are comfortable with a small number of items, no more than 10%, being held from public view to ensure comparability of tests from year to year.
27. McIntire, M.E. (2014, June 11). As Pearson's annual field testing ends, some want them never to start again. Chalkbeat. Retrieved from http://ny.chalkbeat.org/2014/06/11/as-pearsons-annual-field-testing-ends-some-want-them-never-to-start-again/#.U8Le9FNyjec
28. Roediger, H.L., Karpicke, J.D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science.
Retrieved from http://learninglab.psych.purdue.edu/downloads/2006_Roediger_Karpicke_PsychSci.pdf
29. Litow, S.S., Flanagan, J., Nolan, C., et al. (2014, March). Putting Students First: Common Core Implementation Panel Recommendation Report to Governor Andrew M. Cuomo. Retrieved from http://www.governor.ny.gov/sites/default/files/Common_Core_Implementation_Panel_3-10-14.pdf
30. S. 6356D (2013). Retrieved from http://open.nysenate.gov/legislation/bill/A8556d-2013
31. Ibid.
32. Resmovits, J. (2011, August 8). Schools Caught Cheating in Atlanta, Around the Country. The Huffington Post. Retrieved from http://www.huffingtonpost.com/2011/08/08/atlanta-schools-cheating-scandal-ripples-across-country_n_919509.html
33. Alpert, T., Amrein-Beardsley, A., Bruce, W., et al. (2013). Testing Integrity Symposium: Issues and Recommendations for Best Practice. Symposium conducted at meeting of U.S. Department of Education.
34. Alpert et al. (2013)
35. Alpert et al. (2013)
36. Alpert et al. (2013)
37. New York Performance Standards Consortium. Retrieved from http://performanceassessment.org/index.html
38. Educating for the 21st Century: Data Report on the New York Performance Standards Consortium. Retrieved from http://www.nyclu.org/files/releases/testing_consortium_report.pdf
39. Mathews, J. (2004, Summer). Portfolio Assessment: Can it be used to hold schools accountable? Education Next, 4(3). Retrieved from http://educationnext.org/portfolio-assessment/
40. Wayman, J.C. (2005). Involving Teachers in Data-Driven Decision Making: Using Computer Data Systems to Support Teacher Inquiry and Reflection. Journal of Education for Students Placed at Risk, 10(3), 295-308.
Retrieved from http://myclass.nl.edu/tie/tie533/teacherdatause.pdf
41. Wayman. (2005)
42. Light, D., Honey, M., Heinze, J. (2005, January). Linking Data and Learning: The Grow Network Study. Center for Children and Technology. Retrieved from http://cct.edc.org/publications/linking-data-and-learning-grow-network-study
43. Chen, E., Heritage, M., Lee, J. (2005). Identifying and Monitoring Students' Learning Needs with Technology. Journal of Education for Students Placed at Risk, 10(3), 309-322. Retrieved from http://www.tandfonline.com/doi/abs/10.1207/s15327671espr1003_6#.U4ijD1Nyjec
44. Datnow, A., Park, V., Wohlstetter, P. (2007). Achieving with Data: How high-performing school systems use data to improve instruction for elementary students. Retrieved from http://www.newschools.org/files/AchievingWithData.pdf
45. Fryer, R.G. (2012, September). Learning from the Successes and Failures of Charter Schools. Retrieved from http://scholar.harvard.edu/files/fryer/files/hamilton_project_paper_2012.pdf
46. Data Quality Campaign. (2012, January). Retrieved from http://www.dataqualitycampaign.org/files/1357_DQC-TE-primer.pdf
47. Tyler, J.H. (2013). If You Build It, Will They Come? Teachers' Online Use of Student Performance Data. Education Finance and Policy, 8(2), 168-207. http://www.mitpressjournals.org/doi/abs/10.1162/EDFP_a_00089#.U4intlNyjec
48. McCaffrey, D.F., Hamilton, L.S. (2007). Value-Added Assessment in Practice: Lessons from the Pennsylvania Value-Added Assessment System Pilot Project. Retrieved from http://www.rand.org/content/dam/rand/pubs/technical_reports/2007/RAND_TR506.sum.pdf
49. United Federation of Teachers.
Repurposed workday. Retrieved from http://www.uft.org/proposed-contract/repurposed-workday
50. Consentino, L., D'Amico, J., Fazio, C., et al. (2014, Spring). A Passing Grade: Teachers Evaluate the NYC Contract. Retrieved from http://www.educators4excellence.org/nycontract/report
51. Data Quality Campaign. (2014, February). Teacher Data Literacy: It's About Time. Retrieved from http://www.dataqualitycampaign.org/files/DQC-Data%20Literacy%20Brief.pdf
52. Smarter Balanced Assessment Consortium. Retrieved from http://www.smarterbalanced.org/
53. Smarter Balanced Assessment Consortium. Member States. Retrieved from http://www.smarterbalanced.org/about/member-states/
54. PARCC. PARCC Online. Retrieved from http://www.parcconline.org/
55. PARCC. PARCC States. Retrieved from http://www.parcconline.org/parcc-states
56. Gewertz, C. (2014, March 21). Field-testing Set to Begin on Common Core Exams. Education Week. Retrieved from http://www.edweek.org/ew/articles/2014/03/21/26fieldtests_ep.h33.html
57. PARCC. New York. Retrieved from https://www.parcconline.org/new-york
58. Ed Week. (2014, May 19). The National K-12 Testing Landscape. Retrieved from http://www.edweek.org/ew/section/multimedia/map-the-national-k-12-testing-landscape.html
59. PARCC. PARCC Task Prototypes and Sample Questions. Retrieved from http://www.parcconline.org/samples/item-task-prototypes
60. PARCC. Item Development. Retrieved from http://www.parcconline.org/assessment-development
61. Brown, E. (2014, March 2). D.C. Mulling Over Common Core Test Switch. The Washington Post.
Retrieved from http://www.washingtonpost.com/local/education/dc-mulling-over-common-core-test-switch/2014/03/02/29478710-a0b3-11e3-a050-dc3322a94fa7_story.html?wprss=rss_education
62. PARCC. (2013, August). PARCC Fact Sheet and FAQs. Retrieved from http://www.parcconline.org/sites/parcc/files/PARCCFactSheetandFAQsBackgrounder_FINAL.pdf
63. Smarter Balanced Assessment Consortium. Computer Adaptive Testing. Retrieved from http://www.smarterbalanced.org/smarter-balanced-assessments/computer-adaptive-testing/
64. Alpert et al. (2013)
65. PARCC. (2013, August). PARCC Fact Sheet and FAQs. Retrieved from http://www.parcconline.org/sites/parcc/files/PARCCFactSheetandFAQsBackgrounder_FINAL.pdf
66. PARCC. Practice Tests. Retrieved from http://www.parcconline.org/practice-tests
67. Hanushek, E.A., Raymond, M.E. (2005). Does School Accountability Lead to Improved Student Performance? Journal of Policy Analysis and Management, 24(2), 297-327. Retrieved from http://hanushek.stanford.edu/sites/default/files/publications/hanushek%2Braymond.2005%20jpam%2024-2.pdf
68. Chiang, H. (2009, October). How accountability pressure on failing schools affects student achievement. Journal of Public Economics, 93(9-10), 1045-57. Retrieved from http://www.sciencedirect.com/science/article/pii/S0047272709000693
69. Rouse, C.E., Hannaway, J., Goldhaber, D., et al. (2013, May). Feeling the Florida Heat? How Low-Performing Schools Respond to Voucher and Accountability Pressure. American Economic Journal: Economic Policy, 5(2), 251-81. Retrieved from http://www.aeaweb.org/articles.
php?doi=10.1257/pol.5.2.251
70. All of these studies measure scores based on assessments other than the state exam, so cheating, gaming, or test prep cannot explain these results.
71. Rockoff, J.E., Staiger, D.O., Kane, T.J., et al. (2010, July). Information and Employee Evaluation: Evidence from a Randomized Intervention in Public Schools. National Bureau of Economic Research Working Paper No. 16240. Retrieved from http://www.nber.org/papers/w16240
72. Deming, D.J., Cohodes, S., Jennings, J., et al. (2013, September). School Accountability, Postsecondary Attainment and Earnings. National Bureau of Economic Research Working Paper No. 19444. Retrieved from http://www.nber.org/papers/w19444
73. Di Carlo, M. (2010, July 14). Teachers Matter, But So Do Words. (Weblog). Retrieved from http://shankerblog.org/?p=74
74. Di Carlo, M. (2012, February 2). The Perilous Conflation of Student and School Performance. (Weblog). Retrieved from http://shankerblog.org/?p=4980
75. Di Carlo, M. (2013, October 3). Are There Low-Performing Schools With High-Performing Students? (Weblog). Retrieved from http://shankerblog.org/?p=8887
76. Value-Added Modeling 101. (2012, September). RAND Education. Retrieved from www.rand.org/education/projects/measuring-teacher-effectiveness/value-added-modeling.html
77. McCaffrey, D. (2012, October 15). Do Value-Added Methods Level the Playing Field for Teachers? Carnegie Knowledge Network. Retrieved from http://www.carnegieknowledgenetwork.org/briefs/value-added/level-playing-field/
78. Baker, E., Barton, P., et al. (2010, August 27). Problems with the use of student test scores to evaluate teachers. Economic Policy Institute.
Retrieved from http://www.epi.org/publication/bp278/
79. Glazerman, S., Loeb, S., et al. (2010, November 17). Evaluating Teachers: The Important Role of Value-Added. Brown Center on Education Policy at Brookings. Retrieved from http://www.brookings.edu/~/media/research/files/reports/2010/11/17%20evaluating%20teachers/1117_evaluating_teachers.pdf
80. Di Carlo, M. (2010, December 7). The War on Error. (Weblog). Retrieved from http://shankerblog.org/?p=1383
81. Chetty, R., Friedman, J., Rockoff, J. (2011, December). The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood. National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w17699
82. Stern, G. (2013, October 15). N.Y.'s Teacher Evaluations Faulted in Study. The Journal News. Retrieved from http://archive.lohud.com/article/20131015/NEWS/310150042/N-Y-s-teacher-evaluations-faulted-study
83. Lower Hudson Council of School Superintendents. (2013, October). Review and Analysis of the New York State Growth Model. Retrieved from http://www.lhcss.org/positionpapers/nysgrowthmodel.pdf
84. Decker, G. (2013, June 18). State to Use Value-Added Growth Model without Calling It That. Chalkbeat. Retrieved from http://ny.chalkbeat.org/2013/06/18/state-to-use-a-value-added-growth-model-without-calling-it-that/#.U61JpVNyjec
85. Ehlert, M., Koedel, C., Parsons, E., et al. Selecting Growth Measures for School and Teacher Evaluations: Should Proportionality Matter? National Center for Analysis of Longitudinal Data in Education Research.
Retrieved from http://www.caldercenter.org/publications/upload/wp-80-updated-v3.pdf
86. Lankford, H., Loeb, S., Wyckoff, J. (2002, March). Teacher Sorting and the Plight of Urban Schools. Education Evaluation and Policy Analysis. Retrieved from http://epa.sagepub.com/content/24/1/37.short
87. Koedel, C. (2014, May 27). The Proportionality Principle in Teacher Evaluation. Shanker Blog. Retrieved from http://shankerblog.org/?p=9924
88. Cramer, P., Decker, G. (2013, September 16). Instead of Telling Teachers Apart, New Evals Lump Some Together. Chalkbeat. Retrieved from http://ny.chalkbeat.org/2013/09/16/instead-of-telling-teachers-apart-new-evals-lump-some-together/#.U4-VpFNyjec
89. Di Carlo, M. (2012, May 29). We Should Only Hold Schools Accountable for Outcomes They Can Control. (Weblog). Retrieved from http://shankerblog.org/?p=5959
90. We distinguish this practice from evaluation systems that have school-wide rating components, meaning that all teachers in a school are judged by a school's overall components. This practice has several pros and cons; in this paper, we do not take a position on it.
91. Decker, G. (2014, May 14). Appeal Process in New Evaluation Plan Shifts Weight from Student Scores for Some. Chalkbeat. Retrieved from http://ny.chalkbeat.org/2014/05/14/appeal-process-in-new-evaluation-plan-shifts-weight-from-student-scores-for-some/#.U4-ViVNyjec
92. Goldstein, D. (2012, June 13). No More Ditching Gym Class. Slate. Retrieved from http://www.slate.com/articles/double_x/doublex/2012/06/standardized_tests_for_the_arts_is_that_a_good_idea_.html
93. EngageNY. Overview of Student Learning Objectives.
Retrieved from http://www.engageny.org/sites/default/files/resource/attachments/overview_of_student_learning_objectives.pdf
94. For one example, among many others, of this argument, see: https://www.aft.org/pdfs/teachers/devmultiplemeasures.pdf
95. Worrell, C. (2013, October 25). In Teacher Evaluations, Student Data and Multiple Measures Show Progress. Data Quality Campaign. Retrieved from http://www.dataqualitycampaign.org/blog/2013/10/in-teacher-evaluations-student-data-show-progress/
96. New York City Department of Education. NY State Policy Context: Education Law 3012-c. Retrieved from http://schools.nyc.gov/Offices/advance/Background/Policy+Context/default.htm
97. New York City Department of Education. (2014, April 9). Chancellor Fariña Announces New Promotion Policy for Students in Grades 3-8. Retrieved from http://schools.nyc.gov/Offices/mediarelations/NewsandSpeeches/2013-2014Chancellor+Fari%C3%B1a+Announces+New+Promotion+Policy+for+Students+in+Grades+3-8.htm
98. New York State Department of Education. (2013, June). Diploma/Credential Requirements. Retrieved from http://www.p12.nysed.gov/ciai/gradreq/diploma-credential-summary.pdf
99. Wall, P. (2013, November 14). Tougher Diploma Rules Leave Some Students in Graduation Limbo. Chalkbeat. Retrieved from http://ny.chalkbeat.org/2013/11/14/tougher-diploma-rules-leave-some-students-in-graduation-limbo/#.U44Jm1Nyjec
100. Jacob, B. (2001, June). Getting Tough: The Impact of High School Graduation Exams. Educational Evaluation and Policy Analysis. Retrieved from http://epa.sagepub.com/content/23/2/99.short
101. Marchant, G., Paulson, S. (2005, January). The Relationship of High School Graduation Exams to Graduation Rates and SAT Scores.
Education Policy Analysis Archives. Retrieved from http://files.eric.ed.gov/fulltext/EJ846516.pdf
102. Grodsky, E., Warren, J., Kalogrides, D. (2009, May). State High School Exit Examinations and NAEP Long-Term Trends in Reading and Mathematics, 1971-2004. Education Policy. Retrieved from http://epx.sagepub.com/content/early/2008/06/13/0895904808320678.abstract
103. Reardon, S., Arshan, N., Atteberry, A., Kurlaender, M. (2010, December). Effects of Failing a High School Exit Exam on Course Taking, Achievement, Persistence, and Graduation. Education Evaluation and Policy Analysis. Retrieved from http://epa.sagepub.com/content/32/4/498
104. Baker, O., Lang, K. (2013, June). The Effect of High School Exit Exams on Graduation, Employment, Wages, and Incarceration. National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w19182
105. Darling-Hammond, L., Rustique-Forrester, E., Pecheone, R. (2005). Multiple Measures Approaches to High School Graduation. The School Redesign Network at Stanford University. Retrieved from https://edpolicy.stanford.edu/sites/default/files/publications/multiple-measures-approaches-high-school-graduation.pdf
106. New York State Education Department. (2014). Graduation Rate Data.
Retrieved from http://www.p12.nysed.gov/irs/pressRelease/20140623/home.html

THE 2014 EDUCATORS 4 EXCELLENCE NEW YORK TEACHER POLICY TEAM ON TESTING AND ASSESSMENT

Trevor Baisden, Founding Fifth-Grade ELA and History Lead Teacher, Success Academy Bronx 2 Middle School
Elizabeth Barrett-Zahn, Kindergarten to Fifth-Grade Science Facilitator, Columbus Elementary School, New Rochelle
Rachael Beseda, First-Grade Special Education Teacher, Global Community Charter School
Ezekiel Cruz, Ninth- to 12th-Grade Social Studies Teacher, Manhattan Bridges High School
Suraj Gopal, Ninth-Grade STEM Special Education Teacher, Hudson High School of Learning Technologies
Vivett Hemans, English and Language Arts Teacher, Eagle Academy for Young Men of Southeast Queens
Maura N. Henry, Sixth- to 12th-Grade ESL Teacher, The Young Women's Leadership School of Astoria
Michelle Kniffin, Ninth- to 12th-Grade Math Teacher, High School of Telecommunication Arts and Technology
Jason Koo, Math Teacher, Albert Einstein Junior High School I.S. 131
Christine Montera, Social Studies Teacher, East Bronx Academy for the Future
Liliana Ruiz, Sixth- to Eighth-Grade Special Education Teacher, Bea Fuller Rodgers School I.S. 528
Charlotte Steel, Seventh-Grade Math Teacher, Booker T. Washington M.S. 54
Blackfoot U-Ahk, Fourth- and Fifth-Grade Teacher of Students with Severe Emotional Disabilities, Coy L. Cox School P.369k
Iris Won, Ninth- to 12th-Grade Mathematics and Technology Teacher, Renaissance High School for Musical Theater & Technology

This report, its graphics, and its figures were designed by Kristin Girvin Redman and Tracy Harris at Cricket Design Works in Madison, Wisconsin. The text face is Bembo Regular, designed by Stanley Morison in 1929. The typefaces used for headers, subheaders, and pull quotes are Futura Bold, designed by Paul Renner, and Museo Slab, designed by Jos Buivenga. Figure labels are set in Futura Regular, and figure callouts are set in Museo Slab.
For far too long, education policy has been created without a critical voice at the table: the voice of classroom teachers. Educators 4 Excellence (E4E), a teacher-led organization, is changing this dynamic by placing the voices of teachers at the forefront of the conversations that shape our classrooms and careers. E4E has a quickly growing national network of educators united by our Declaration of Teachers' Principles and Beliefs. E4E members can learn about education policy and research, network with like-minded peers and policymakers, and take action by advocating for teacher-created policies that lift student achievement and the teaching profession. Learn more at Educators4Excellence.org.