Gradient Boosting Public Data Modeling for the Policy Planning in Education




Yu, June




Teacher retention rates and student learning gain rates in U.S. public school systems plummeted during the COVID-19 pandemic, erasing years of improvement. In this research, we collect, integrate, and analyze all available public data in a data science pipeline to determine whether public data can inform and help address the factors behind teacher attrition and learning loss. To our knowledge, this is the first study to use public data to address the post-COVID educational policy crisis from a data science perspective. To this end, we have developed an end-to-end, large-scale educational data modeling pipeline that (i) integrates, cleans, and analyzes educational data; (ii) implements automated attribute importance analysis to draw meaningful conclusions; and (iii) develops a suite of interpretable teacher attrition and learning loss prediction models utilizing all data points and attributes. We demonstrate a novel data-driven approach to discovering insights from a large collection of heterogeneous public data sources and to offering policymakers an actionable understanding of (1) the recruitment and retention of public school teachers and (2) the identification and prevention of learning loss tendencies in public schools.
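As an illustration of steps (ii) and (iii), the following is a minimal sketch of gradient boosting with a gain-based attribute importance score. It is not the thesis's implementation, which this record does not describe. It assumes squared-error loss and one-split regression trees (stumps), uses only the Python standard library, and runs on synthetic data. The attribute names `attr_a`, `attr_b`, and `attr_c` and the helper functions `fit_stump`, `fit_gbm`, and `predict` are hypothetical.

```python
import random

def fit_stump(X, residuals):
    """Return the best one-split regression stump as
    (gain, feature, threshold, left_value, right_value), or None."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    best = None
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no valid threshold between equal values
            right_sum = total - left_sum
            # Reduction in squared error from splitting at "x <= lo".
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, lo, left_sum / k, right_sum / (n - k))
    return best

def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Fit boosted stumps; each round fits the current residuals, which are
    the negative gradient of the (half) squared-error loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, gains = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        gains[j] += gain  # accumulate split gain per attribute
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    total_gain = sum(gains) or 1.0
    importance = [g / total_gain for g in gains]  # normalized to sum to 1
    return {"base": base, "stumps": stumps, "importance": importance}

def predict(model, row):
    """Sum the base value and the shrunken output of every stump."""
    return model["base"] + sum(
        left if row[j] <= thr else right
        for j, thr, left, right in model["stumps"]
    )

# Synthetic data (an assumption for illustration only): the target depends
# strongly on attr_a, weakly on attr_b, and not at all on attr_c.
random.seed(0)
names = ["attr_a", "attr_b", "attr_c"]
X = [[random.random() for _ in names] for _ in range(200)]
y = [3.0 * a + 0.5 * b + random.gauss(0.0, 0.1) for a, b, _ in X]

model = fit_gbm(X, y)
for name, score in sorted(zip(names, model["importance"]), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

The ranking printed at the end is the kind of output an attribute importance analysis produces: attributes that account for more of the error reduction across boosting rounds receive higher scores. On real, heterogeneous public data, an established gradient boosting library would likely be a better choice than this toy implementation.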



educational data science, teacher attrition, teacher retention, learning loss, predictive modeling, education policy, tabular data, machine learning, gradient boosting


Yu, J. (2022). Gradient boosting public data modeling for the policy planning in education (Unpublished thesis). Texas State University, San Marcos, Texas.

