Screening for Reading Problems in an RTI Framework

With the increased use of Response to Intervention (RTI) as a framework for the early identification and prevention of reading problems comes a growing interest in improving how students are screened for potential reading difficulties. The primary purpose of screening in an RTI framework is to identify those students who, without further intervention, are likely to develop reading problems later. Screening generally involves administering brief assessments that are strongly predictive of the condition we are attempting to identify and that typically result in classification into one of two groups: (a) those who are at risk for developing the condition (in this case, poor reading outcomes) and (b) those not at risk.


A screening instrument is judged primarily by its ability to accurately categorize students into these two groups. However, no screen will be 100% accurate. Therefore, the goal is to minimize the number of misclassified cases (StatSoft, Inc., 2007). Correctly classified students are either true positives (TPs; those correctly identified as at risk) or true negatives (TNs; those correctly identified as not at risk). Misclassified cases are either false positives (FPs; those students identified as at risk who later perform satisfactorily on reading outcomes) or false negatives (FNs; those students not identified by the screen as at risk but who later perform poorly on reading outcomes). Screening results in the categorization of students into one of these four groups (see Figure 1). From this 2 × 2 table, we can calculate several statistics that provide an overall indication of a screening instrument's utility.


The first statistic is the classification accuracy. Classification accuracy is simply the total number of correctly classified cases (TP + TN) divided by the total number of students screened (TP + TN + FP + FN). In the example in Figure 1, 65 of 85 students have been correctly classified, resulting in 76% classification accuracy. The next useful statistic is sensitivity. Sensitivity is the proportion of students at risk who are correctly identified as such by the screen. It is calculated by dividing the number of true positives by the total number of students at risk (TP + FN). In Figure 1, 15 of 25 students at risk have been identified, resulting in 60% sensitivity. Finally, a screening instrument's specificity tells us the percentage of students not at risk who are correctly identified by the process. It is calculated by dividing the number of true negatives by the total number of students not at risk (TN + FP). In Figure 1, 50 of 60 students have been correctly identified as not at risk, or 83% specificity.


Figure 1: A 2 × 2 table of screening results
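The three statistics described above can be sketched in a few lines of Python. This is only an illustration: the class name ScreeningTable and its method names are assumptions made for this example, and the four cell counts are the ones implied by the Figure 1 totals quoted in the text (15 of 25 students at risk identified, 50 of 60 students not at risk identified).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Cell counts of a 2 x 2 screening table (hypothetical helper class)."""
    tp: int  # at risk, flagged by the screen
    fn: int  # at risk, missed by the screen
    fp: int  # not at risk, flagged by the screen
    tn: int  # not at risk, not flagged by the screen

    def classification_accuracy(self) -> float:
        # (TP + TN) / (TP + TN + FP + FN)
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def sensitivity(self) -> float:
        # TP / (TP + FN): share of students at risk who are identified
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # TN / (TN + FP): share of students not at risk who are identified
        return self.tn / (self.tn + self.fp)


# Counts implied by the Figure 1 example in the text.
figure_1 = ScreeningTable(tp=15, fn=10, fp=10, tn=50)

print(f"accuracy:    {figure_1.classification_accuracy():.0%}")  # 76%
print(f"sensitivity: {figure_1.sensitivity():.0%}")              # 60%
print(f"specificity: {figure_1.specificity():.0%}")              # 83%
```

The printed values match the 76%, 60%, and 83% reported for the Figure 1 example.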


In an RTI framework, the goal of screening for reading problems is to have very few false negatives by using instruments that yield true-positive rates approaching 100% (Compton, Fuchs, Fuchs, & Bryant, 2006; Jenkins, 2003; Jenkins & Johnson, 2008). In other words, we want a screening instrument to identify all or nearly all of the students who are at risk. This must be balanced, however, by maintaining a manageable number of false positives. Errors will always occur during a screening process, but there is little consensus on what acceptable levels of accuracy and error are. Most practitioners would agree that minimizing false negatives is paramount. Students who are at risk for poor reading outcomes who do not receive intervention early on may continue to develop reading problems that later become intractable. In reading, this has been termed the "Matthew effect" (Stanovich, 1986), and it is precisely this phenomenon that an RTI process can prevent through the focus on early identification and intervention.


However, overidentification of students at risk presents a significant challenge for schools. False positives accrue a cost that is difficult to discern. Most practitioners would argue that little harm is done to the student who receives an intervention that was not absolutely necessary. However, a recent meta-analysis of research on reading interventions for students in grades K–3 indicates that moderate to large gains in reading achievement were achieved when teacher-to-student ratios were no more than 1:5, and most interventions used groupings of 1:1 or 1:3 (Scammacca, Vaughn, Roberts, Wanzek, & Torgesen, 2007). Identifying too many false positives may reduce the efficacy of intervention efforts by forcing intervention program ratios to greatly exceed these numbers.
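A rough sketch of this arithmetic, using the Figure 1 counts, is shown below. The number of available intervention groups (three) is purely a hypothetical assumption chosen for illustration, as is the function name students_per_group; neither comes from the studies cited.

```python
def students_per_group(flagged_students: int, available_groups: int) -> float:
    """Average number of students per intervention group."""
    if available_groups <= 0:
        raise ValueError("available_groups must be positive")
    return flagged_students / available_groups


TRUE_POSITIVES = 15    # from the Figure 1 example
FALSE_POSITIVES = 10   # from the Figure 1 example
AVAILABLE_GROUPS = 3   # hypothetical: groups the school is able to staff

# Serving only the students who are truly at risk keeps groups at 1:5.
print(students_per_group(TRUE_POSITIVES, AVAILABLE_GROUPS))

# Serving everyone the screen flagged pushes the ratio past 1:8.
print(round(students_per_group(TRUE_POSITIVES + FALSE_POSITIVES, AVAILABLE_GROUPS), 2))
```

Under these assumptions, the ten false positives alone move the groups from the 1:5 ceiling noted in the meta-analysis to roughly 1:8.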


What Factors Have an Impact on Accuracy?


Numerous factors make accurate screening for reading problems challenging. These include the amount of time between screening and outcome measures, the complexity of the reading construct, and the reliance on brief measures to predict these complex outcomes. Each of these is discussed in turn.

Time Between Screening and Outcome. In an RTI model, screening predicts a future state: Will the student be successful on the end-of-year reading test? This gap in time presents challenges for screening procedures because high predictive validity is much more difficult to obtain than high concurrent validity. Screening usually takes place at the beginning of the school year, and it attempts to predict outcomes on assessments given at the end of the school year. In the time between screening and outcome measures, instruction, absences, maturation, and other variables affect each student differently and at different rates, which makes prediction very difficult. Screening closer in time to the outcome measure can make a screening process more accurate, but if we do not identify students at risk early enough, there may not be sufficient time for intervention efforts to succeed, because students at risk for reading problems face the "tyranny of time" (Kame'enui, 1998).


One approach to improving screening accuracy has been to limit the amount of time between the prediction and outcome by linking progress-monitoring (PM) benchmarks to state outcomes, using a sequential rearward benchmarking approach (Hintze & Silberglitt, 2005). This allows a PM measure to serve as a predictor for a later PM measure and so on, until the criterion measure is administered. For example, a fall PM measure predicts performance on a winter PM measure, which predicts performance on an early spring PM measure, which is linked to the state assessment. The benchmarking procedure provides a helpful guide for monitoring progress and adjusting intervention plans based on individual student growth, and it sets useful interim goals for predicting student outcomes. However, the research demonstrates that even with the close proximity of the spring PM measure to the state outcome measure, classification accuracy does not approach the optimal levels for an RTI framework—sensitivity was 79% and specificity was 76% (Hintze & Silberglitt, 2005).

The Complex Nature of Reading. Reading is a complex construct that is composed of many different components, the most familiar of which are outlined by the National Reading Panel as the "Big 5"—phonemic awareness, decoding, vocabulary, comprehension, and fluency (National Institute of Child Health and Human Development, 2000). Individual differences in important aspects of reading ability mean that students with reading difficulties may have different strengths and weaknesses in one or more of the Big 5 components. More specifically, students may require targeted instruction in specific aspects of reading (e.g., vocabulary knowledge) to make optimal progress in reading achievement. However, many screening instruments assess only one component of reading and therefore may not identify students with difficulties in other aspects of reading. Initial screening results require a thorough follow-up assessment of students' reading ability to inform intervention development and implementation.


Research indicates that the use of a combination of measures that assess different components can improve the accuracy of a screening process (Compton et al., 2006; Johnson, Jenkins, Petscher, & Catts, in press; O'Connor & Jenkins, 1999). For example, Johnson et al. (in press) found that including a vocabulary assessment with an oral reading fluency (ORF) assessment significantly increased classification accuracy when predicting performance on a third-grade state assessment. Although the combination of measures improves classification accuracy in a prediction model, in practice, it is difficult to discern the combination of cut scores and patterns of performance that would identify a student as at risk for not passing the criterion measure. For one student, an ORF score of 100 might be an important predictor of criterion performance, but for another student, it is not critical because a high score on another measure compensates for the low ORF score.

Reliance on Brief Measures to Predict Reading Outcomes. Screening is typically characterized by the administration of brief measures that are easy to implement reliably. A typical screening task for reading includes the use of ORF. Research on ORF shows that while it is highly correlated with overall reading comprehension, the correlation is not perfect (Fuchs, Fuchs, Hosp, & Jenkins, 2001). In research that examines what factors account for individual differences in reading ability, ORF accounts for a significant portion, but not all, of the variation in performance. In earlier grades, word identification fluency (WIF) is also a strong predictor of early reading ability, but again, not a perfect one (Fuchs, Fuchs, & Compton, 2004; Schatschneider, 2006). This means that screening instruments that rely on single measures such as ORF or WIF will provide fairly good information about student performance, but alone they cannot provide information at the levels needed for an effective RTI process (>90% sensitivity; >80% specificity).
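As a small illustration, the sketch below checks reported accuracy figures against the levels named above. The function name meets_rti_targets and the constant names are assumptions made for this example; the inputs are the Figure 1 example and the spring PM results reported by Hintze and Silberglitt (2005), both quoted earlier in this article.

```python
# Levels named in the text for an effective RTI process.
MIN_SENSITIVITY = 0.90
MIN_SPECIFICITY = 0.80


def meets_rti_targets(sensitivity: float, specificity: float) -> bool:
    """True only if both values exceed the levels named in the text."""
    return sensitivity > MIN_SENSITIVITY and specificity > MIN_SPECIFICITY


# Figure 1 example: 60% sensitivity, about 83% specificity.
print(meets_rti_targets(15 / 25, 50 / 60))  # False: sensitivity is too low

# Spring PM measure in Hintze & Silberglitt (2005): 79% and 76%.
print(meets_rti_targets(0.79, 0.76))        # False: both fall short
```

In both cases the screen falls short, which is the point of the paragraph above: a single brief measure can be a good predictor and still miss the levels an RTI process needs.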


Achieving high accuracy levels from a single, brief screening instrument is difficult, yet current school-based models of RTI make achieving high accuracy critical. Many schools move a student straight from screening to intervention placement. This approach to screening and intervention has been described as a "direct route" (DR) model (Jenkins, Hudson, & Johnson, 2007; Jenkins & Johnson, 2008). The DR approach to screening and intervention is expedient, but it requires even greater accuracy levels from a screening instrument since there is no further assessment to correct initial classification errors (Johnson, Humphrey, & Mclenna, 2008). The DR approach based on single screening measures may compromise classification accuracy and, more importantly, may compromise the investigation of the student's specific difficulties so that interventions aligned with student needs can be provided.


What Screening Approaches Result in Higher Accuracy?


Four main approaches to improving the accuracy of screening procedures have been described in the literature. These include the following:


  1. Using an assessment battery comprising various component skills (Compton et al., 2006; Davis, Lindo, & Compton, 2007; Johnson et al., in press; O'Connor & Jenkins, 1999). When multiple measures are used to classify students as at risk for reading problems, the accuracy of the classification improves significantly. In some studies (e.g., O'Connor & Jenkins, 1999), the sensitivity levels obtained through this process have reached nearly 100%.
  2. Following initial screening results with PM. Studies (e.g., Compton et al., 2006) have shown that using PM to follow students whom screening initially identified as needing intervention improves accuracy beyond that of the initial measure alone.
  3. Relying on a "multiple gate-keeping" process, where students identified initially as at risk are further assessed to determine the full nature and extent of their reading difficulties. This is the process used by the Texas Primary Reading Inventory (TPRI) developed by Foorman and colleagues (Foorman et al., 1998).
  4. Using dynamic assessment procedures, in which student response to instruction during the assessment is measured (Fuchs et al., 2007). Dynamic assessment is not yet widely used, and its processes and instruments are still being researched, but it shows strong promise for use within an RTI framework.
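The logic of a multiple gate-keeping process (the third approach above) can be sketched roughly as follows. This is not the TPRI procedure or any published decision rule: the score scales, the cut scores, the student records, and every name in the code are hypothetical and exist only to show how a second gate can clear students whom the first gate flagged.

```python
from typing import List, NamedTuple, Optional


class Student(NamedTuple):
    name: str
    screen_score: int               # brief universal screen (hypothetical scale)
    follow_up_score: Optional[int]  # fuller assessment, given only if flagged


SCREEN_CUT = 40     # hypothetical cut score for the first gate
FOLLOW_UP_CUT = 25  # hypothetical cut score for the second gate


def first_gate(students: List[Student]) -> List[Student]:
    """Flag every student who scores below the screening cut score."""
    return [s for s in students if s.screen_score < SCREEN_CUT]


def second_gate(flagged: List[Student]) -> List[Student]:
    """Keep only flagged students whose follow-up score is also low."""
    return [
        s for s in flagged
        if s.follow_up_score is not None and s.follow_up_score < FOLLOW_UP_CUT
    ]


roster = [
    Student("Student A", 30, 20),    # flagged at both gates
    Student("Student B", 35, 40),    # flagged by the screen, cleared at gate two
    Student("Student C", 60, None),  # never flagged, so no follow-up is given
]

at_risk = second_gate(first_gate(roster))
print([s.name for s in at_risk])  # ['Student A']
```

In this toy roster, Student B would have been a false positive under a single-gate process; the follow-up assessment removes that student from the intervention list.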

What Tools Are Available for Screening for Reading Problems?


In this overview, we've summarized and discussed the challenges of screening for reading problems in an RTI framework. In three forthcoming related articles, we will review existing screening measures targeted at different grade levels: P–K, Grades 1–3, and Grades 4–12. While the reviews are not exhaustive, they are meant to serve as guides to help practitioners understand the benefits and limitations of a variety of instruments so that practitioners may become more critical consumers of screening measures as they move forward with RTI implementation.




Compton, D. L., Fuchs, D., Fuchs, L. S., & Bryant, J. D. (2006). Selecting at-risk readers in first grade for early intervention: A two-year longitudinal study of decision rules and procedures. Journal of Educational Psychology, 98, 394–409.


Davis, G. N., Lindo, E. J., & Compton, D. (2007). Children at-risk for reading failure: Constructing an early screening measure. Teaching Exceptional Children, 39 (5), 32–39.


Foorman, B. R., Francis, D. J., Fletcher, J. M., Schatschneider, C., & Mehta, P. (1998). The role of instruction in learning to read: Preventing reading failure in at-risk children. Journal of Educational Psychology, 90, 37–55.


Fuchs, L. S., Fuchs, D., & Compton, D. L. (2004). Monitoring early reading development in first grade: Word identification fluency versus nonsense word fluency. Exceptional Children, 71, 7–21.


Fuchs, D., Fuchs, L. S., Compton, D. L., Bouton, B., Caffrey, E., & Hill, L. (2007). Dynamic assessment as responsiveness to intervention. Teaching Exceptional Children, 39 (5), 58–63.


Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical and historical analysis. Scientific Studies of Reading, 5 (3), 239–256.


Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high stakes testing. School Psychology Review, 34 (3), 372–386.


Jenkins, J. R. (2003, December). Candidate measures for screening at-risk students. Paper presented at the National Research Center on Learning Disabilities Responsiveness-to-Intervention Symposium, Kansas City, MO. Retrieved April 3, 2006.


Jenkins, J. R., Hudson, R. F., & Johnson, E. S. (2007). Screening for service delivery in an RTI framework: Candidate measures. School Psychology Review, 36, 582–599.


Jenkins, J. R., & Johnson, E. S. (2008). Universal screening for reading problems: Why and how should we do this? Retrieved April 16, 2008.


Johnson, E. S., Humphrey, M., & Mclenna, R. (2008, Spring). How should we screen for reading problems? Academic Exchange Quarterly, 12 (1), 105–109.


Johnson, E. S., Jenkins, J. R., Petscher, Y., & Catts, H. W. (in press). How can we improve the accuracy of screening instruments? Learning Disabilities Research and Practice.


Kame'enui, E. J. (1998). The rhetoric of all, the reality of some, and the unmistakable smell of mortality. In J. Osborn & F. Lehr (Eds.), Literacy for all: Issues in teaching and learning (pp. 319–338). New York: Guilford.


National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel: Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Reports of the subgroups. Washington, DC: U.S. Government Printing Office.


O'Connor, R. E., & Jenkins, J. R. (1999). Prediction of reading disabilities in kindergarten and first grade. Scientific Studies of Reading, 3 (2), 159–197.


Scammacca, N., Vaughn, S., Roberts, G., Wanzek, J., & Torgesen, J. K. (2007). Extensive reading interventions in grades K–3: From research to practice. Portsmouth, NH: RMC Research Corporation, Center on Instruction.


Schatschneider, C. (2006). Reading difficulties: Classification and issues of prediction. Paper presented at the Pacific Coast Regional Conference, San Diego, CA.


Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21, 360–407.


StatSoft, Inc. (2007). Electronic statistics textbook. Tulsa, OK: StatSoft. Retrieved August 1, 2008.
