Evaluating Language Models for Generating and Judging Programming Feedback
The emergence of large language models (LLMs) has transformed research and practice in a wide range of domains. Within the computing education research (CER) domain, LLMs have received considerable attention, especially in the context of learning programming. Much of the work on LLMs in CER has, however, focused on applying and evaluating proprietary models. In this article, we evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments and in judging the quality of programming feedback, contrasting the results against proprietary models. Our evaluations on a dataset of students’ submissions to introductory Python programming exercises suggest that state-of-the-art open-source LLMs (Meta’s Llama3) are almost on par with proprietary models (GPT-4o) in both the generation and assessment of programming feedback. We further demonstrate the efficiency of smaller LLMs on these tasks and highlight that there is a wide range of LLMs that are accessible, even free of charge, to educators and practitioners.
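To make the feedback-generation setting concrete, the sketch below shows one way an open-weight instruction-tuned model such as Llama3 could be prompted to generate feedback on a student's Python submission, using the Hugging Face transformers text-generation pipeline. This is an illustrative assumption rather than the evaluation pipeline used in the article: the checkpoint name, prompt wording, and example exercise are placeholders, and a recent transformers release with chat-template support (plus access to the gated Llama3 weights) is assumed.

```python
# Illustrative sketch (not the article's evaluation pipeline): prompting an
# open-weight instruction-tuned model to generate feedback on a student's
# Python submission. Checkpoint name, prompt, and exercise are assumptions.
from transformers import pipeline

# Access to the gated Llama3 checkpoint on Hugging Face is assumed here.
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

student_code = """
def average(numbers):
    return sum(numbers) / len(numbers)  # crashes on an empty list
"""

messages = [
    {
        "role": "system",
        "content": (
            "You are a programming tutor. Give concise, constructive feedback "
            "on the student's code without revealing a complete solution."
        ),
    },
    {
        "role": "user",
        "content": (
            "Exercise: write a function that returns the average of a list of numbers.\n"
            f"Student submission:\n{student_code}\n"
            "What feedback would you give this student?"
        ),
    },
]

# With chat-style input, the pipeline returns the conversation with the
# assistant's reply appended as the final message.
output = generator(messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])
```

The judging side of the evaluation could be sketched analogously: the generated feedback would be passed back to a (possibly different) model together with a rubric-style prompt asking it to rate the feedback's quality.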