Detecting AI-Generated Pseudocode in High School Online Programming Courses Using an Explainable Approach
Despite extensive research on code plagiarism detection in higher education and for programming languages such as Java and Python, little work has focused on K-12 settings, particularly for pseudocode. This study addresses this gap by building explainable machine learning models for pseudocode plagiarism detection in online programming education. To achieve this, we construct a comprehensive dataset comprising 7,838 pseudocode submissions from 2,578 high school students enrolled in an online programming foundations course, along with 6,300 pseudocode samples generated by three versions of ChatGPT. Using this dataset, we develop an explainable model that detects AI-generated pseudocode across various assessments. The model not only identifies AI-generated content but also provides insight into its predictions at both the student and problem levels, deepening our understanding of AI-generated pseudocode in K-12 education. Furthermore, we analyze SHAP values and key model features to pinpoint student submissions that closely resemble AI-generated pseudocode, supplementing these findings with human evaluation. This research offers implications for developing robust educational technologies and methodologies that uphold academic integrity in online programming courses.
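To illustrate the kind of SHAP-based explanation workflow the abstract refers to, the sketch below trains a binary classifier on hypothetical stylometric features of pseudocode submissions and computes per-prediction SHAP values. The feature names, model choice, and synthetic labels are illustrative assumptions for the sketch, not the paper's actual dataset, feature set, or pipeline.

```python
# Minimal sketch (not the authors' pipeline): classify submissions as
# AI-generated vs. student-written and explain predictions with SHAP.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature table: each row is one pseudocode submission.
# Feature names are placeholders, not the paper's feature set.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "avg_line_length":   rng.normal(30, 8, n),
    "keyword_ratio":     rng.uniform(0.05, 0.4, n),
    "indentation_depth": rng.integers(1, 6, n),
    "comment_density":   rng.uniform(0.0, 0.3, n),
})
y = rng.integers(0, 2, n)  # 1 = AI-generated, 0 = student-written (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# TreeExplainer yields per-feature SHAP values for each prediction; these
# could then be aggregated per student or per problem for analysis.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print("SHAP value array shape:", np.shape(shap_values))
```

In practice, the per-submission SHAP values would be inspected alongside the raw pseudocode to understand which features push a prediction toward the AI-generated class, which is the general idea behind the student- and problem-level insights described above.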