Investigating Autograder Usage in the Post-Pandemic and LLM Era
This work investigates the impact of Large Language Models (LLMs) and the COVID-19 pandemic on student behavior with autograder systems in three programming-heavy courses. We examine whether the release of LLMs like ChatGPT and GitHub Copilot, along with post-pandemic effects, has modified student interactions with autograders. Using five years of student submission data, covering more than 4,500 students and over 420,000 submissions, we analyze trends in submission behavior before and after these events. Our methodology involves tracking submission patterns, focusing on submission timing, frequency, and scores.
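As a rough illustration of the kind of per-student metrics such an analysis involves, the sketch below computes submission frequency, best score, and timing relative to deadlines from a hypothetical submissions table; the file names, column names, and deadline table are illustrative assumptions, not the study's actual data pipeline.

```python
# Minimal sketch of per-student submission metrics (timing, frequency, score).
# The CSV layout, column names, and deadline table are assumptions for
# illustration only.
import pandas as pd

submissions = pd.read_csv("submissions.csv", parse_dates=["submitted_at"])
deadlines = pd.read_csv("deadlines.csv", parse_dates=["deadline"])  # assignment_id, deadline

df = submissions.merge(deadlines, on="assignment_id")
df["hours_before_deadline"] = (
    (df["deadline"] - df["submitted_at"]).dt.total_seconds() / 3600
)
df["year"] = df["submitted_at"].dt.year

# Per student and assignment: how often they submitted, their best score,
# and how early their first and last submissions landed before the deadline.
metrics = (
    df.groupby(["year", "student_id", "assignment_id"])
      .agg(
          num_submissions=("score", "size"),
          best_score=("score", "max"),
          first_submission_lead_h=("hours_before_deadline", "max"),
          last_submission_lead_h=("hours_before_deadline", "min"),
      )
      .reset_index()
)

# Cohort-level means make pre-/post-ChatGPT and pre-/post-pandemic comparisons straightforward.
print(metrics.groupby("year").mean(numeric_only=True))
```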
Contrary to expectations, our findings reveal that these submission metrics remain relatively consistent in the post-ChatGPT and post-pandemic era. Despite yearly fluctuations, no significant shift in student behavior is attributable to these changes. Students continue to rely on a combination of manual debugging and autograder feedback, with no noticeable change in their problem-solving approach.
These findings highlight the resilience of the educational practices in these courses and suggest that integrating LLMs into the mid-level CS curriculum may not necessitate the significant paradigm shift previously envisioned. Future work should extend these analyses to courses with different structures to determine whether these results generalize; if they do not, the specific course characteristics that account for the resilience to ChatGPT and the pandemic observed here should be identified.