Students primarily use generative AI for creative content generation and language support. Emerging applications in personal development, such as career guidance and emotional support, have so far received little academic attention.
Objective: The main goal of this study was to analyze comprehensively how K-12 students and teachers actually use generative AI (GenAI) in real-world educational settings, by systematically categorizing their interactions by both content theme and task type. The researchers aimed to bridge the gap between theoretical potential and classroom practice by examining over 17,000 anonymized messages. Their purpose was to identify usage patterns and emerging applications, and to give educators, policymakers, and researchers evidence-based insights into authentic GenAI usage in educational contexts.
Methods: The study employed a novel hierarchical topic modeling approach using state-of-the-art large language models (LLMs) to analyze interaction data from an 11-month pilot program (August 2024 to June 2025) across multiple schools in a German-speaking European country. The dataset comprised 17,294 individual messages from 1,339 conversations across various subjects, including mathematics, literature, and science. Participants accessed ChatGPT-4o through specialized educational software rather than the standard web interface. The methodology used a two-dimensional categorization. Content analysis focused on the 400 most common nouns to identify discussion topics. Task analysis examined the initial prompts of the 1,014 unique conversations that remained after duplicates were removed. The researchers used LLMs (ChatGPT 4.5 and o3-pro) for topic modeling with explicit instructions to structure topics hierarchically, an approach that proved superior to classical computational methods such as Latent Dirichlet Allocation (LDA). All data were anonymized to protect student privacy and translated into English using ChatGPT for presentation purposes.
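The two preprocessing steps described above are counting the most frequent nouns and deduplicating initial prompts. They can be illustrated with the minimal sketch below, which is not the authors' code. It rests on several assumptions. Each conversation is a hypothetical list of (role, text) pairs. Noun tagging has already been done by some part-of-speech tagger that emits the tag "NOUN". "Duplicate" means identical text after lowercasing and whitespace normalization. The study's actual data format and deduplication criteria may differ. The function names `top_nouns` and `unique_initial_prompts` are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

# Assumed (hypothetical) input shapes:
#   tagged_messages: one list of (token, pos_tag) pairs per message, already POS-tagged
#   conversations:   one list of (role, text) pairs per conversation, in order


def top_nouns(tagged_messages: Iterable[list[tuple[str, str]]], n: int = 400) -> list[tuple[str, int]]:
    """Count lowercased noun tokens across all messages and return the n most common."""
    counts: Counter[str] = Counter()
    for tokens in tagged_messages:
        for word, pos in tokens:
            if pos == "NOUN":
                counts[word.lower()] += 1
    return counts.most_common(n)


def unique_initial_prompts(conversations: Iterable[list[tuple[str, str]]]) -> list[str]:
    """Return the first user message of each conversation, keeping only the first copy of duplicates."""
    seen: set[str] = set()
    prompts: list[str] = []
    for messages in conversations:
        # First message sent by the user; conversations without one are skipped.
        first_user = next((text for role, text in messages if role == "user"), None)
        if first_user is None:
            continue
        key = " ".join(first_user.lower().split())  # normalize case and whitespace
        if key not in seen:
            seen.add(key)
            prompts.append(first_user)
    return prompts


if __name__ == "__main__":
    conversations = [
        [("user", "Write a poem about autumn"), ("assistant", "...")],
        [("user", "write a poem  about autumn"), ("assistant", "...")],
        [("user", "Explain photosynthesis"), ("assistant", "...")],
    ]
    print(unique_initial_prompts(conversations))
    # ['Write a poem about autumn', 'Explain photosynthesis']

    tagged = [[("Poem", "NOUN"), ("write", "VERB")], [("poem", "NOUN"), ("autumn", "NOUN")]]
    print(top_nouns(tagged, n=2))
    # [('poem', 2), ('autumn', 1)]
```

In this sketch, the resulting noun list and deduplicated prompts would then be passed to an LLM for hierarchical categorization. That step is not shown, because the paper's prompts are not reproduced here.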
Key Findings: The analysis revealed seven major content categories: Education & Learning, Nature & Environment, Family & Relationships, Arts & Entertainment, Career & Professional Development, Health & Wellbeing, and Culture & Society. Task categorization identified eight primary areas: Creative Content Generation (including poetry, storytelling, and songwriting), Language Learning & Grammar Support, Lesson & Curriculum Planning, Assessment & Feedback Support, Reflective & Experiential Writing, Factual & Explanatory Support, Visual & Multimedia Content Generation, and Motivational & Emotional Support. Most conversations (91%) contained 30 or fewer messages, and students used GenAI almost exclusively during school hours. Some categories, such as Education & Learning and Arts & Entertainment, align with well-studied areas. Others, such as Career & Professional Development, Family & Relationships, and Health & Wellbeing, represent novel applications with potentially significant impact. Creative tasks were particularly prominent, highlighting GenAI's role in fostering creativity and imaginative thinking. The research also indicated that modern LLMs clearly outperformed classical topic modeling approaches, producing better hierarchical categorizations and results more closely aligned with human interpretation.
Implications: This research provides crucial empirical evidence for the field of AI in education by revealing authentic usage patterns rather than hypothetical scenarios. The findings support educators in understanding student interests and aligning instructional resources accordingly, while informing curriculum designers about areas requiring greater support or enrichment. The content-based categorization enables identification of topics students engage with most frequently, providing insights into genuine interests. The task categorization confirms existing GenAI applications while revealing novel, emerging uses not yet documented in academic literature. The study highlights GenAI's potential to support not only academic learning but also personal development and social-emotional growth, particularly in areas like career guidance, health awareness, and emotional support. For policymakers, the research underscores the need for clear guidelines regarding GenAI use in sensitive personal topics and the importance of ensuring equitable access to these tools. The findings also demonstrate the methodological value of using LLMs for topic modeling, providing researchers with more effective tools for analyzing large text collections.
Limitations: The study acknowledges several important constraints that affect generalizability and interpretation. The dataset, while substantial, comes from schools within a single German-speaking European country under one school authority, which limits geographic and cultural diversity. The focus on classroom interactions during school hours excludes potential out-of-school usage patterns that might reveal different applications or disparities in access. Anonymization was ethically necessary, yet it did not make the raw data safe to release: even anonymized messages remain vulnerable to sophisticated deanonymization. As a result, the raw data could not be provided for reproducibility and follow-up research. The study also lacks quantitative assessment metrics and coding reliability measures that would enhance scientific rigor. The translation from German to English was validated on a sample, but it may still have introduced subtle changes in meaning or eliminated regional linguistic characteristics. Finally, the research does not address long-term educational outcomes or the sustained impact of GenAI usage on learning achievement. It focuses instead on usage patterns and categorization.
Future Directions: The researchers recommend several critical areas for future investigation to build upon these foundational findings. Similar studies should be conducted in other countries and educational systems to uncover additional use cases and validate findings across different cultural and linguistic contexts. Future research should incorporate quantitative assessment metrics and longitudinal studies to examine the educational effectiveness and learning outcomes associated with different GenAI applications. Studies should investigate the development of robust digital literacy and critical thinking skills to help students evaluate AI-generated content, particularly given concerns about hallucinations and misinformation. Research is needed on ethical guidelines and responsible usage protocols, especially for sensitive applications involving health, emotional support, and career guidance. Future work should explore the equity implications of GenAI access, examining how disparities in out-of-school access might exacerbate educational inequalities. The methodological insights about LLM-based topic modeling should be further validated with larger datasets and different domains. Finally, research should investigate age-specific usage patterns by partitioning K-12 data into different developmental groups to understand how GenAI interaction patterns evolve with student maturity.
Title and Authors: "Thematic and Task-Based Categorization of K-12 GenAI Usages with Hierarchical Topic Modeling" by Johannes Schneider, Béatrice S. Hasler, Michaela Varrone, Fabian Hoya, Thomas Schroffenegger, Dana-Kristin Mah, and Karl Peböck.
Published On: 2025
Published By: International Conference on Computer-Human Interaction Research and Applications (CHIRA), arXiv preprint arXiv:2508.09997v1