Midterm Exam Outliers Efficiently Highlight Potential Cheaters on Programming Assignments
The ubiquitous use of online tools, contractors, and homework sites has made plagiarism a pressing concern in computer science education, and with the introduction of ChatGPT the threat is greater than ever. Many cheating-detection tools, such as similarity checkers and style-anomaly checkers, help instructors decide whether a student has plagiarized. However, these tools often do not scale to large classes. For example, similarity tools can produce high rates of suspected cheating and use an instructor's time inefficiently, especially in the early weeks of CS courses, when programs are small and student solutions can be very similar. We developed a new tool that uses outlier detection to filter possible cheaters based on their lab scores throughout the course and their midterm exam scores. Instructors can then manually analyze a manageable number of students even in large classes. We evaluated our automated tool on two large course offerings of CS1 (177 students in total) and compared its results to a manual analysis performed by an experienced CS1 instructor. The tool identified 11 students in the first offering (Winter 2019) and 12 students in the second offering (Spring 2023). With an average precision of 83%, our tool produces a list of concerning students with high precision. This helps instructors allocate their time efficiently and pursue cheating early in the term, addressing issues before they escalate.
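The abstract does not specify the exact outlier criterion, so the following is only a minimal sketch of the general idea: flag students whose lab scores are far higher than their midterm exam score suggests. The z-score rule, the threshold, and all student data below are illustrative assumptions, not the paper's actual method or results.

```python
# Illustrative sketch of score-gap outlier detection.
# Assumption: each student has an average lab score and a midterm score,
# both on a 0-100 scale. Flagging by z-score of the (lab - midterm) gap
# is an assumed criterion, not necessarily the tool's exact rule.

from statistics import mean, pstdev

def flag_outliers(scores, z_threshold=2.0):
    """Flag students whose lab average exceeds their midterm score
    by an unusually large margin relative to the class.

    scores: dict mapping student id -> (lab_average, midterm_score)
    Returns a list of flagged student ids.
    """
    gaps = {sid: lab - exam for sid, (lab, exam) in scores.items()}
    mu = mean(gaps.values())
    sigma = pstdev(gaps.values())
    if sigma == 0:
        return []  # all gaps identical; no outliers
    return [sid for sid, g in gaps.items() if (g - mu) / sigma > z_threshold]

# Hypothetical class: student "s3" aces the labs but fails the midterm.
scores = {
    "s1": (85, 80), "s2": (70, 72), "s3": (98, 40),
    "s4": (60, 65), "s5": (88, 84), "s6": (75, 70),
}
print(flag_outliers(scores))  # → ['s3']
```

Ranking by gap size rather than a hard threshold would serve the same goal of producing a short, reviewable list for the instructor.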