Generative AI is transforming K–12 learning by enabling personalized, dialogic, and creative educational experiences, yet success depends critically on aligning learning goals with appropriate activities and human-AI role configurations, and on careful attention to risks including hallucinations, over-reliance, and equity concerns.
Objective: This systematic review examined how generative AI (GenAI) is being used in K–12 education (ages 3–18) by synthesizing 84 empirical studies published between 2020 and 2025. The researchers used Biggs's 3P model (Presage–Process–Product) combined with CIMO logic (Context–Intervention–Mechanism–Outcome) to systematically map learning objectives, pedagogical activities, AI role paradigms, and learning outcomes. The study aimed to move beyond tool-centric or macro-level analyses by providing a theoretically grounded, classroom-operational account of how GenAI reshapes teaching and learning across diverse subject areas and settings.
Methods: The researchers conducted a PRISMA-guided systematic review, searching the Scopus and Web of Science databases in May–July 2025. They included peer-reviewed empirical studies involving learners aged 3–18 who used GenAI tools (e.g., ChatGPT, DALL·E, and other LLM-based applications) in formal or informal learning contexts. After removing duplicates and screening 605 records, they retained 84 studies that scored ≥3 on the Mixed Methods Appraisal Tool (MMAT) quality assessment.
The analytic framework integrated the 3P model with CIMO logic. Presage mapped learning objectives using Bloom's taxonomy and 21st-century skills frameworks, later incorporating generative learning theory and situated learning perspectives. Process examined learning activities (initially using UNICEF's framework, later adopting activity theory) and AI roles (starting from Ouyang and Jiao's three paradigms of AI-directed, AI-supported, and AI-empowered learning, and enriching them with distributed cognition theory). Product analyzed outcomes, reframing the traditional OECD knowledge-skills-attitudes structure through constructivist and sociocultural lenses as Epistemic, Practice, and Affective/Identity outcomes.
Two researchers independently coded all studies using this scheme, achieving over 85% initial inter-rater agreement before reaching full consensus through discussion. The coding captured learning objectives, activity patterns, human-AI role configurations, and both opportunities and risks in outcomes.
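For readers unfamiliar with how such agreement figures are derived, the sketch below shows a standard way of quantifying two coders' initial agreement before consensus discussion: raw percent agreement and chance-corrected agreement (Cohen's kappa). It is purely illustrative, not from the paper; the paper reports only the >85% figure, and the example category labels are hypothetical.

```python
# Illustrative sketch (not from the paper): computing inter-rater
# agreement over two coders' nominal codes. Labels are hypothetical.
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of items both coders labeled identically."""
    assert len(coder_a) == len(coder_b)
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for chance, given nominal codes."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    # Expected chance agreement from each coder's marginal label frequencies.
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical AI-role codes assigned independently by two coders.
a = ["directed", "supported", "supported", "empowered", "directed", "supported"]
b = ["directed", "supported", "empowered", "empowered", "directed", "supported"]
print(percent_agreement(a, b))  # 5/6 ≈ 0.83
print(cohens_kappa(a, b))       # 0.75
```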
Key Findings:
Learning Objectives (Presage): Seven recurring objectives emerged across the corpus. Language and literacy enhancement (26 studies) focused on deep textual understanding, scaffolded support, generation-revision cycles, and equity for learners with dysgraphia or from minority language backgrounds. STEM inquiry and practice emphasized understanding abstract concepts, developing computational thinking, and authentic problem-solving. Additional objectives included creativity and artistic expression, social-emotional skills and collaboration, motivation and affect regulation, feedback literacy and self-regulated learning, and AI literacy with ethical reasoning. Notably, objectives reflected situated and generative views of learning rather than traditional "basics-then-application" sequencing.
Learning Activities (Process): Five dominant activity patterns were identified. AI-facilitated dialogic tutoring involved multi-turn question-answer exchanges with immediate feedback across subjects. Generative iterative co-creation engaged students as creators who repeatedly refined text, images, or multimedia with AI through prompt adjustments and teacher scaffolding. Project-based problem-solving featured sustained inquiry cycles including problem definition, research design, debugging, and collaborative argumentation. Simulation and game-based learning situated learners in immersive, narrative-rich virtual worlds with AI as game characters or design partners. Formative feedback and adaptive scaffolding provided immediate, personalized feedback with tiered difficulty adjustment, creating practice-feedback-revision cycles.
AI Roles (Process): The review documented a clear evolution in AI roles. In AI-directed configurations, GenAI orchestrated instruction through cue-response-feedback chains, acting as grader, just-in-time tutor, or materials planner, with students primarily reacting to system prompts while teachers maintained quality oversight. In AI-supported configurations, learning activities were accomplished through coordination of humans, AI, and artifacts, with AI serving as cognitive scaffold, co-creator, or reflective evaluator—providing dynamic support that faded as competence grew. In AI-empowered configurations (fewer studies), learner agency was central, with GenAI functioning as an augmentation to human intelligence within negotiated partnerships; students actively set prompts and controlled generation while AI continuously adapted to individual needs, and teachers facilitated rather than directed.
The distribution of role paradigms across the five activity patterns varied:
AI-facilitated dialogic tutoring: 10 AI-directed, 15 AI-supported, 1 AI-empowered;
Generative iterative co-creation: 11 AI-directed, 7 AI-supported, 11 AI-empowered;
Project-based problem-solving: 2 AI-directed, 6 AI-supported, 6 AI-empowered;
Formative feedback and adaptive scaffolding: 10 AI-directed, 6 AI-supported, 1 AI-empowered;
Simulation and game-based learning: 4 AI-directed, 1 AI-supported, 1 AI-empowered.
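As a quick arithmetic check, the counts above can be tallied per role paradigm. The minimal sketch below transcribes the review's numbers; note that the column totals (92) exceed the 84 included studies, presumably because some studies are coded under more than one activity pattern.

```python
# Tally (illustrative) of the role-paradigm counts listed above.
# Tuples are (AI-directed, AI-supported, AI-empowered) per activity.
counts = {
    "dialogic tutoring":             (10, 15, 1),
    "generative co-creation":        (11, 7, 11),
    "project-based problem-solving": (2, 6, 6),
    "formative feedback":            (10, 6, 1),
    "simulation/game-based":         (4, 1, 1),
}

# Transpose the value tuples and sum each role-paradigm column.
directed, supported, empowered = (sum(col) for col in zip(*counts.values()))
print(directed, supported, empowered)  # 37 35 20 (total 92 > 84 studies)
```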
Learning Outcomes (Product): Affective and identity outcomes were most frequently reported (60+ studies), followed by practice outcomes (55+ studies) and epistemic outcomes (50+ studies). Each domain showed both opportunities and risks.
Epistemic outcomes (knowledge construction and reasoning) showed opportunities including conceptual depth across STEM and humanities, enhanced critical thinking, and provenance-aware sense-making through high-quality concept maps, analogies, and personalized feedback. Risks included surface interaction when pedagogy was weak, overtrust leading to misinformation acceptance, and cognitive overload from insufficient explanatory depth. Studies emphasized that hallucinations could be either hazards or teachable moments depending on instructional design.
Practice outcomes (situated participation in authentic activities) demonstrated improved practice quality in writing, problem-solving, and experimental design, fostering of generative practices beyond "right answers," and enhanced social collaboration among peers and families. Risks involved shallow engagement creating illusions of participation, over-reliance weakening originality (with some traditional instruction outperforming AI-integrated courses), and hindered development of core competencies when use was decoupled from authentic community practices.
Affective and identity outcomes (motivation, self-understanding, agency) showed sustained engagement with higher enthusiasm and satisfaction, increased self-efficacy particularly in programming and mathematics, and stronger agency as learners shifted from passive consumers to active creators. Risks included novelty effects that faded over time, opaque or inconsistent feedback causing confusion and frustration, and ethical/trust concerns around integrity, age-appropriateness, and technical limitations.
Implications: The findings advance both theory and practice in AI education research. Theoretically, the study demonstrates that learning objectives shape activity patterns and role allocation, which in turn determine outcomes—a dynamic systems view where agency is reallocated across students, teachers/caregivers, and AI through feedback loops rather than vested in any single actor. The review documents a shift from AI-as-tool toward human-AI ensembles involving co-regulation and distributed cognition, though most cases have not yet reached full hybrid intelligence.
The reframing of outcomes as Epistemic, Practice, and Affective/Identity—grounded in constructivist and sociocultural learning theories—clarifies GenAI-specific risks within theoretical context: hallucinations threaten epistemic outcomes by undermining knowledge construction; illusions of participation hinder practice outcomes when students use AI as mere answer generators; and technical/ethical limitations impact affective well-being and trust.
For practice, the researchers propose three key recommendations: (1) Teacher professional development must focus on designing tasks that foster authentic disciplinary practice and critical epistemic inquiry, and on orchestrating human-AI collaboration that avoids superficial engagement, rather than on technical skills alone. (2) Feedback literacy should center on a calibrate-critique-act cycle, treating AI output as testable hypotheses rather than final truths. (3) Ethical guardrails remain essential, including age-appropriate privacy protections, routine audits for bias and hallucination, and clear integrity protocols.
The study offers a goal-activity-role alignment heuristic for instructional design: different learning goals determine appropriate activities and AI participation levels. For example, language/literacy and STEM objectives most often employ dialogic tutoring, generative co-creation, or project-based problem-solving with AI-supported/empowered roles, producing practice and epistemic outcomes. Feedback literacy objectives typically link to formative feedback with AI-directed roles, promoting affective/identity outcomes. Motivation, social-emotional, and creativity objectives usually involve simulation/game-based learning with diverse AI roles producing outcomes across all three categories.
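One way to make the heuristic concrete is to encode it as a lookup table an instructional designer might consult. The sketch below is my own paraphrase of the mappings described above, not an artifact from the paper; the key names and function are hypothetical.

```python
# Illustrative encoding (not from the paper) of the goal-activity-role
# alignment heuristic as a lookup table. Entries paraphrase the
# mappings summarized in the review.
ALIGNMENT = {
    "language_literacy_and_stem": {
        "activities": ["dialogic tutoring", "generative co-creation",
                       "project-based problem-solving"],
        "ai_roles": ["AI-supported", "AI-empowered"],
        "outcomes": ["practice", "epistemic"],
    },
    "feedback_literacy": {
        "activities": ["formative feedback and adaptive scaffolding"],
        "ai_roles": ["AI-directed"],
        "outcomes": ["affective/identity"],
    },
    "motivation_sel_creativity": {
        "activities": ["simulation and game-based learning"],
        "ai_roles": ["AI-directed", "AI-supported", "AI-empowered"],
        "outcomes": ["epistemic", "practice", "affective/identity"],
    },
}

def suggest(goal: str) -> dict:
    """Return the activity patterns, AI roles, and expected outcome
    domains aligned with a given learning goal."""
    return ALIGNMENT[goal]

print(suggest("feedback_literacy")["ai_roles"])  # ['AI-directed']
```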
Limitations: The review acknowledges several important constraints. All 84 studies involved convenience sampling from specific contexts, and most focused on short-term interventions rather than longitudinal impact, making it difficult to assess sustained effects on cognitive, skill, and attitudinal development. Few studies systematically assessed algorithmic bias or privacy risks, potentially underestimating negative outcomes. The review did not distinguish between formal and informal learning environments or among different age groups (3–18), which may mask moderating effects of context and developmental level. The reliance on peer-reviewed empirical studies from 2020–2025 means rapidly evolving GenAI capabilities may not be fully captured.
Research corpus limitations include that the AIEd literature remains skewed toward higher education, with rigorous K–12 empirical studies remaining relatively sparse. Most AI research in K–12 still privileges traditional AI (e.g., Intelligent Tutoring Systems) over GenAI, and many studies predated public release of powerful large language models. Data were drawn exclusively from Scopus and Web of Science databases using English-language publications, potentially missing relevant work in other languages or databases.
Future Directions: The researchers outline several critical research priorities. Longitudinal and cross-cultural studies are needed to trace how teachers' and students' perceptions and practices with AI evolve over extended periods, especially as classroom experience increases, and to assess whether findings hold across different educational systems, cultural contexts, and developmental stages (elementary vs. middle vs. high school).
Content-specific research should investigate how AI supports unique disciplinary practices—for example, AI-assisted mathematical problem-solving versus AI-enhanced writing feedback versus AI-powered historical analysis—to provide more tailored guidance for tool development and professional learning suited to each subject area's distinct epistemologies and pedagogies.
Microprocess studies using co-design approaches with all stakeholders (students, teachers, parents, developers) are needed to identify specific scaffolding techniques that foster co-regulated human-AI partnerships and to understand the critical dynamics of when AI should provide support versus hand off to peers and teachers. This includes examining how feedback loops actually operate in practice and how agency negotiation unfolds in real time.
Assessment paradigm development is crucial—creating and validating new frameworks capable of capturing complex, process-oriented learning identified in this review, particularly epistemic outcomes and shifts in learner identity, moving beyond traditional knowledge-acquisition metrics.
Equity and access research should systematically examine how GenAI integration affects different student populations, including learners with disabilities, English language learners, students from low-resource schools, and those from diverse cultural backgrounds, to ensure tools reduce rather than exacerbate existing educational disparities.
Ethical and governance studies need to rigorously assess algorithmic bias, privacy risks, and age-appropriateness across developmental stages, informing policy frameworks for responsible K–12 AI adoption.
Title and Authors: "A Systematic Review of Generative AI in K–12: Mapping Goals, Activities, Roles, and Outcomes via the 3P Model" by Xiaoling Lin and Hao Tan (School of Design, Hunan University, Changsha, China).
Published On: Received August 14, 2025; Revised September 13, 2025; Accepted September 23, 2025; Published September 25, 2025.
Published By: Systems, Volume 13, Issue 10, Article 840 (2025). Published by MDPI, Basel, Switzerland. DOI: https://doi.org/10.3390/systems13100840. This is an open access article distributed under the Creative Commons Attribution (CC BY) license.