
Code reading and comprehension skills are essential for novices learning to program, and explain-in-plain-English (EiPE) tasks are a well-established approach to assessing these skills. However, manually grading EiPE responses is time-consuming, which has limited their use in practice. To address this, we explore an approach in which students explain code samples to a large language model (LLM), which generates code based on their explanations. The generated code is then evaluated against test suites and shown to students along with the test results. We are interested in understanding how this automated formative feedback guides students’ subsequent prompts towards solving EiPE tasks. We analyzed 177 unique attempts on four EiPE exercises from 21 students, examining what kinds of mistakes they made and how they fixed them. We found that when students made mistakes, they identified and corrected them either by combining the LLM-generated code with the test case results, or by switching from describing the purpose of the code to describing the sample code line by line until the LLM-generated code exactly matched the obfuscated sample code. Our findings suggest grounds for both optimism and caution in using LLMs for unmonitored formative feedback. We identified false positives and false negatives, cases where variable naming helped students, and signs of students directly reciting the code. For most students, this approach is an efficient way to demonstrate and assess code comprehension skills. However, we also found evidence of misconceptions being reinforced, suggesting the need for further work to identify and guide such students more effectively.
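
To make the workflow concrete, below is a minimal sketch of the feedback loop: the student's plain-English explanation is passed to an LLM, the generated code is run against an instructor-written test suite, and both the code and the pass/fail results are shown back to the student. The function names, the placeholder LLM call, and the example exercise here are assumptions for illustration only, not the system's actual implementation.

# Minimal sketch of the feedback loop, assuming a hypothetical LLM call
# and an illustrative "sum of even numbers" exercise.

def generate_code_from_explanation(explanation: str) -> str:
    """Placeholder for the LLM call that turns a student's plain-English
    explanation into Python source (e.g., via a chat-completion API)."""
    # A real implementation would send `explanation` to an LLM here.
    return (
        "def target(nums):\n"
        "    return sum(n for n in nums if n % 2 == 0)\n"
    )

def run_test_suite(source: str, tests: list[tuple[list[int], int]]) -> list[bool]:
    """Execute the generated code and check it against instructor test cases."""
    namespace: dict = {}
    exec(source, namespace)          # defines the generated function
    func = namespace["target"]
    return [func(args) == expected for args, expected in tests]

if __name__ == "__main__":
    explanation = "The function returns the sum of the even numbers in a list."
    generated = generate_code_from_explanation(explanation)
    # Hypothetical test suite for this exercise: (input, expected output) pairs.
    results = run_test_suite(generated, [([1, 2, 3, 4], 6), ([5, 7], 0)])
    # The student would see both the generated code and the pass/fail results.
    print(generated)
    print(results)

In this sketch, a failed test case or visibly wrong generated code is the signal that prompts the student to revise their explanation and try again.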