Interpretable AI techniques reveal that a K-12 online learning semester divides into three distinct stages, each with its own predictors of student risk, enabling targeted interventions at optimal points in the term.
Objective: This study aimed to address three key research gaps in K-12 online education early warning systems: identifying optimal prediction timing based on changes in course requirements, comparing early warning models using appropriate evaluation indicators, and interpreting complex predictive models with interpretable AI techniques to identify at-risk student types and enable personalized interventions.
Methods: The researchers analyzed data from 16,011 K-12 online school students, encompassing 1,491,497 behavior logs and 172,703 forum posting logs from 162 courses. The study employed a multi-step approach:
- Behavioral data were aggregated weekly and categorized into eight variables: assignment submission, quiz/test taking, grade checking, discussion participation, posting frequency, reply frequency, post word count, and reply word count.
- A variational autoencoder and time series segmentation were used to detect changes in learning patterns resulting from changes in course requirements.
- The semester was divided into three stages based on the detected learning-pattern transition points.
- Complex ensemble machine learning classifiers (Random Forest, Gradient Boosting, LightGBM, and XGBoost) were used to predict at-risk students at each stage.
- Interpretable AI techniques, particularly SHAP (SHapley Additive exPlanations), were used to explain the predictive models and identify types of at-risk students.
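To make the classification and explanation steps concrete, the sketch below trains an XGBoost classifier on the eight weekly behavior variables and explains it with SHAP. This is a minimal illustration, not the authors' pipeline: the column names, the at_risk label, the train/test split, and the hyperparameters are all assumptions.

```python
import pandas as pd
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Hypothetical column names for the eight weekly behavior variables.
FEATURES = [
    "assignment_submission", "quiz_test_taking", "grade_checking",
    "discussion_participation", "posting_frequency", "reply_frequency",
    "post_word_count", "reply_word_count",
]

def train_and_explain(df: pd.DataFrame):
    """Train an at-risk classifier for one stage and compute SHAP values.

    Assumes `df` has one row per student, with the eight aggregated behavior
    variables and a binary `at_risk` label (1 = at risk).
    """
    X, y = df[FEATURES], df["at_risk"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # The study compared several ensemble classifiers; XGBoost performed best.
    model = xgb.XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
    model.fit(X_tr, y_tr)

    # Recall on the at-risk class is the evaluation indicator reported per stage.
    recall = recall_score(y_te, model.predict(X_te))

    # SHAP attributes each prediction to the behavior features; averaging
    # absolute values per column yields a stage-level importance ranking.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_te)  # shape: (n_students, n_features)
    return model, recall, X_te, shap_values
```

Fitting one such model per stage, with recall on the at-risk class as the evaluation indicator, would mirror the staged early-warning design; ranking features by mean absolute SHAP value would give the per-stage importance comparisons reported in the findings.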
Key Findings:
- The semester was naturally divided into three distinct learning stages: Stage 1 (weeks 1-5), Stage 2 (weeks 6-12), and Stage 3 (weeks 13-18), with each stage having different key behavioral predictors reflecting changing course requirements.
- XGBoost was the best-performing algorithm across all stages, with recall rates of 71% in Stage 1, 75% in Stage 2, and 81% in Stage 3, demonstrating that early warning can be most effectively conducted at the end of each stage.
- A student's at-risk probability from a previous stage was the strongest predictor of performance in subsequent stages, suggesting the persistence of learning patterns over time.
- Five types of at-risk students were identified at each stage: high-engaged at-risk (high activity but missing key behaviors), low-engaged at-risk (minimal participation), testing at-risk (only checking grades and taking tests), low-interaction at-risk (minimal discussion participation), and un-persistent at-risk (inconsistent learning patterns); one illustrative SHAP-based grouping is sketched after this list.
- Key behavioral predictors shifted across stages: in Stage 1, assignment submission and discussion reply word counts were most important; in Stage 2, grade checking became most important; and in Stage 3, assignment submission, grade checking, and quiz taking were critical.
- Stage 2 emerged as the most critical intervention point, as it could identify 75% of at-risk students and represented students' last opportunity to reverse negative learning trajectories.
- Low-engaged students tended to maintain their disengagement across all stages, while other at-risk types showed more strategy adjustments and potential for improvement.
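As a companion to the earlier sketch, one plausible way to move from SHAP output to at-risk types is to cluster the flagged students by their per-feature SHAP profiles and inspect each cluster's dominant behaviors. This is an assumption for illustration, not the authors' documented procedure; the choice of k-means and of five clusters is purely hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def group_at_risk_students(model, X_te: pd.DataFrame, shap_values: np.ndarray, k: int = 5):
    """Cluster model-flagged students by their SHAP profiles (illustrative only)."""
    flagged = model.predict(X_te) == 1   # students the model marks as at risk
    profiles = shap_values[flagged]      # per-feature attributions for those students
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)

    # Mean SHAP profile per cluster: a cluster dominated by low assignment
    # submission despite frequent posting would resemble the "high-engaged
    # at-risk" type described above.
    summary = (
        pd.DataFrame(profiles, columns=X_te.columns)
        .assign(cluster=labels)
        .groupby("cluster")
        .mean()
    )
    return labels, summary
```

In practice, the resulting clusters would still need to be interpreted and labeled by domain experts before being mapped onto types such as "high-engaged at-risk" or "testing at-risk".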
Implications: This research demonstrates that complex machine learning models, when properly interpreted, can provide valuable insights into K-12 online learning patterns and risk factors. The findings suggest that early warning systems should consider both the timing of predictions (aligned with course design) and the specific types of at-risk behaviors when designing interventions. The identification of five distinct at-risk types enables more personalized support strategies tailored to students' specific learning challenges. By incorporating interpretable AI techniques, the study bridges the gap between complex predictive algorithms and actionable educational insights, making the early warning system more transparent and useful for teachers, administrators, and parents in the K-12 online learning environment.
Limitations: The study does not evaluate the effectiveness of interventions based on the early warning predictions and at-risk type identification. While the analysis identifies optimal prediction points and at-risk categories, it does not test whether interventions targeting these specific patterns actually improve student outcomes. Additionally, the research is limited to behavioral data and does not incorporate other potentially important factors, such as prior academic performance, demographics, or psychological characteristics, that might influence online learning success.
Future Directions: The researchers suggest developing and testing personalized interventions based on the identified at-risk types, observing whether these interventions can influence students' learning strategies and guide them to better align with course requirements. They mention that they are currently working with state-level K-12 online schools to implement these early warning models and collaborate with teachers on personalized interventions. Future studies should focus on evaluating the effectiveness of these early warning systems and interventions, particularly in improving students' engagement levels and learning outcomes.
Title and Authors: "Interpretable AI techniques unveil the factors and types of at-risk early warning: a case study in K-12 online learning" by Jui-Long Hung, Kerry Rice, and Mingyan Zhang.
Published On: March 31, 2025
Published By: Data Technologies and Applications (Emerald Publishing Limited)