A data-driven framework for identifying high school students at risk of not graduating on time

Published in Bloomberg Data for Good Exchange Conference, 2015

Some students, for a variety of factors, struggle to complete high school on time. To address this problem, school districts across the US use intervention programs to help struggling students get back on track academically. Yet in order to best apply those programs, schools need to identify off-track students as early as possible and enroll them in the most appropriate intervention. Unfortunately, identifying and prioritizing students in need of intervention remains a challenging task. This paper describes work that builds on current systems by using advanced data science methods to produce an extensible and scalable predictive framework for providing partner US public school districts with individual early warning indicator systems. Our framework employs machine learning techniques to identify struggling students and describe features that are useful for this task, evaluating these techniques using metrics important to school administrators. By doing so, our framework, developed with the common need of several school districts in mind, provides a common set of tools for identifying struggling students and the factors associated with their struggles. Further, by integrating data from disparate districts into a common system, our framework enables cross-district analyses to investigate common early warning indicators not just within a single school or district, but across the US and beyond.