This article was written by guest author Terry Salinger of American Institutes for Research. Terry is an AIR Institute Fellow and chief scientist for literacy research. For over 15 years at AIR, she has conducted research focused on interventions for struggling readers of all ages; teacher professional knowledge of instructional practice in reading; and measurement of literacy skills through both standardized and informal assessments.
In simple terms, evaluations like the one conducted by the American Institutes for Research (AIR) as part of CLI’s i3 grant compare schools that receive a specific program (the treatment) with schools that receive business-as-usual services. In the CLI study, all kindergarten to grade 2 teachers in study schools received the same business-as-usual services from their schools and districts, with the treatment teachers also receiving CLI resources and services. AIR researchers collected data from CLI and comparison schools to determine whether CLI services appeared to have a positive impact on teachers or students.
And indeed, data from the second year of implementation showed several statistically significant differences between CLI and comparison schools on variables such as teacher practice and kindergarteners’ overall reading achievement as measured by a valid and reliable test. These were important findings and very affirming of CLI’s promise for improving teaching and learning.
…CLI’s i3 work has shown statistically significant results while meeting the standards of high quality research, which in today’s education climate is difficult to achieve.
Terry Salinger – AIR
The findings were also important because evaluations of early reading interventions, as in other content areas, rarely find significant differences between treatment and comparison conditions. A 2003 meta-analysis found only nine studies out of over 1,300 that met standards for high-quality, rigorous research, and this shortage of well-designed studies makes it difficult to generalize how strong an impact professional development can really have on teacher and student outcomes. Studies that didn’t meet the standards for rigor may have had positive results, but their findings are not reliable because of flaws in their research design or in their approach to measuring outcomes.
Criteria for high-quality research are relatively easy to understand. The first is randomly assigning a large number of schools (or other groups) to treatment or control conditions, with the groups equivalent on important characteristics such as size, demographics, achievement, language, or socioeconomic status. Doing this provides as much assurance as possible that introducing a program into the treatment schools will indeed be a change to the status quo. The second is selecting or creating valid and reliable assessments or other tools to measure the outcomes of particular interest, such as student achievement or teacher practice. The third is the duration of the evaluation: new educational programs or interventions must be given enough time to allow changes to happen.
But even the best-designed study cannot stop the inevitable and often rapid changes in schools and districts. Teacher, student, and administrator mobility is high, and when a program is intended to have a cumulative impact over time, high mobility results in an experience drain that threatens outcomes. There’s also a program-clutter issue: schools confronting poor student performance on state reading tests often search for whatever is new or special or extra, rather than focusing on the slow and steady process of building internal capacity among teachers. These other programs can cloud the story that evaluation data tell about treatment and comparison schools, making it difficult to determine whether the focal program has produced real change.
Other challenges involve program dosage and fidelity. Most developers set a threshold amount of exposure to a program needed for effectiveness, such as a certain number of hours of professional development or coaching. They also specify instructional or other practices that must be followed with fidelity to achieve desired outcomes. Evaluators and program developers hope that practitioners respect dosage and fidelity recommendations, but they aren’t inside every treatment classroom giving teachers daily reminders about coaches’ visits or instructional best practices. Low levels of dosage or fidelity reduce the likelihood that an intervention will show its full potential.
Finally, there’s the reality that the business-as-usual PD and training or overall instructional procedures in study districts may be strong in and of themselves. In this case, all teachers are gaining the support they need to improve their skills, and the value of the intervention that treatment teachers receive must be substantial to affect the outcomes of interest.
Schools are places of constant flux. Teachers and administrators know this and accept the messiness as routine. The interplay of these factors doesn’t go away when an evaluation starts. Evaluators like to assume that the messiness will be equally distributed across treatment and comparison schools, but evaluators and the people who read their reports need to recognize how these dynamic factors of school life may have influenced results.
Unlike the bulk of early literacy evaluations, which don’t meet rigorous research standards or achieve significant results, the evaluations AIR has performed of CLI’s i3 work have shown statistically significant results while meeting the standards of high-quality research, which in today’s education climate is difficult to achieve.