Leveraging LLM for Detecting and Explaining LLM-generated Code in Python Programming Courses
As large language models (LLMs) have become more advanced, generating code to solve exercises in programming courses has become significantly easier. However, this convenience raises concerns about over-reliance on these tools, which may keep students from developing independent coding skills. To address this concern, we introduce an LLM-based detector that not only detects LLM-generated code but also explains the reasons for its judgments. These reasons provide insight into the characteristics of LLM-generated code and make the detection process more transparent. We evaluate the detector in an introductory Python programming course, where it achieves over 99% accuracy. Additionally, instructors manually reviewed the reasons provided by the detector and judged 64.7% of the reasons for classifying code as LLM-generated to be appropriate. These reasons can also serve as feedback, helping students improve their coding skills by understanding the characteristics of expert-level LLM-generated code.
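To make the idea of a detector that returns both a judgment and its reasons more concrete, the following is a minimal sketch, not the authors' implementation. The prompt wording, the JSON reply format, the function names (`build_prompt`, `parse_verdict`, `detect`), and the `ask_llm` callable are all assumptions introduced for illustration; the abstract does not specify any of them. A stub stands in for a real model so the example runs without network access.

```python
import json
from typing import Callable

# Hypothetical instruction text; the abstract does not specify the real prompt.
PROMPT_HEADER = (
    "You are reviewing a student's Python submission.\n"
    "Decide whether the code was written by an LLM or by a human.\n"
    'Reply with JSON only, e.g. {"label": "llm", "reasons": ["..."]}.\n'
    "The label must be either \"llm\" or \"human\".\n\n"
    "Submission:\n"
)

VALID_LABELS = {"llm", "human"}


def build_prompt(code: str) -> str:
    """Attach the submission to the (assumed) instruction text."""
    return PROMPT_HEADER + code


def parse_verdict(raw: str) -> dict:
    """Parse the model's JSON reply into a label and a list of reasons."""
    data = json.loads(raw)
    label = data["label"]
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    reasons = [str(r) for r in data.get("reasons", [])]
    return {"label": label, "reasons": reasons}


def detect(code: str, ask_llm: Callable[[str], str]) -> dict:
    """Classify one submission; ask_llm is any function mapping prompt to reply."""
    return parse_verdict(ask_llm(build_prompt(code)))


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call (an assumption for this sketch)."""
    return json.dumps(
        {"label": "llm", "reasons": ["Uses type hints and docstrings throughout"]}
    )


result = detect("def add(a: int, b: int) -> int:\n    return a + b\n", fake_llm)
print(result["label"], result["reasons"])
```

Keeping the reasons as a separate list in the parsed result is one way to let instructors review them independently of the label, or pass them on to students as feedback.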