Design and Evaluation of an AI-Assisted Grading Tool for Introductory Programming Assignments: An Experience Report (SIGCSE TS 2025 - Papers)

Who

Goda Nagakalyani, Saurav Chaudhary, Varsha Apte, Ganesh Ramakrishnan, Srikanth Tamilselvam

Track

SIGCSE TS 2025 Papers

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 28 Feb 2025 14:03 - 14:22 at Meeting Rooms 315-316 - Assessment #1 Chair(s): Priyanka Kumar

Abstract

In a typical introductory programming course, grading student-submitted programs involves an autograder which compiles and runs the programs and tests their functionality with predefined test cases, with no attention to the source code. However, in an educational setting, grading based on inspection of the source code is required for two main reasons: (1) awarding partial marks to ‘partially correct’ code that may be failing the test case check, and (2) awarding marks (or penalties) based on source code quality or specific criteria that the instructor may have laid out in the problem statement (e.g. ‘implement sorting using bubble-sort’). However, grading based on studying the source code can be highly time-consuming when the course has a large enrollment. In this paper we present the design and evaluation of an AI Assistant for source code grading, which we have named TA Buddy. TA Buddy is powered by Code Llama, a large language model especially trained for code-related tasks, which we fine-tuned using a graded programs dataset. Given a problem statement, student code submissions and a grading rubric, TA Buddy can be asked to generate suggested grades, i.e. ratings for the various rubric criteria, for each submission. The human teaching assistant (TA) can then accept or overrule these grades. We evaluated the TA Buddy-assisted manual grading against ‘pure’ manual grading and found that the time taken to grade reduced by 24% while maintaining grade agreement in the two cases at 90%.
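
The accept-or-overrule workflow and the two reported measures can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Submission` class, the function names, the rubric criteria, the ratings, and the timings are all hypothetical, and "agreement" is assumed here to mean the fraction of rubric criteria that receive the same rating in both gradings. The abstract does not define how agreement was actually computed.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical record for one student submission (not the paper's data model)."""
    student_id: str
    suggested: dict[str, str]  # criterion -> rating suggested by the assistant
    final: dict[str, str] = field(default_factory=dict)  # criterion -> rating kept by the TA


def review(sub: Submission, overrides: dict[str, str]) -> None:
    """Accept every suggested rating except the ones the TA overrules."""
    sub.final = {**sub.suggested, **overrides}


def agreement(a: dict[str, str], b: dict[str, str]) -> float:
    """Assumed metric: fraction of criteria in `a` that have the same rating in `b`."""
    if not a:
        return 0.0
    same = sum(1 for criterion in a if b.get(criterion) == a[criterion])
    return same / len(a)


def time_reduction(manual_minutes: float, assisted_minutes: float) -> float:
    """Time saved by assisted grading, as a fraction of the manual grading time."""
    return (manual_minutes - assisted_minutes) / manual_minutes


# Made-up example: the TA overrules one of three suggested ratings.
sub = Submission("s1", {"correctness": "B", "uses_bubble_sort": "A", "style": "C"})
review(sub, {"style": "B"})

# Made-up 'pure' manual ratings for the same submission.
manual = {"correctness": "B", "uses_bubble_sort": "A", "style": "A"}

print(agreement(sub.final, manual))
print(time_reduction(50.0, 38.0))
```

With these made-up numbers, two of the three criteria match between the assisted and manual ratings, and grading time drops from 50 to 38 minutes, a 24% reduction.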

Goda Nagakalyani

IIT BOMBAY

India

Saurav Chaudhary

Indian Institute of Technology - Bombay

India

Varsha Apte

Indian Institute of Technology - Bombay

India

Ganesh Ramakrishnan

Indian Institute of Technology - Bombay

India

Srikanth Tamilselvam

IBM India Research Labs

India

Session Program

Fri 28 Feb
Displayed time zone: Eastern Time (US & Canada)

13:45 - 15:00	Assessment #1 (Papers) at Meeting Rooms 315-316. Chair(s): Priyanka Kumar, University of Texas at San Antonio

13:45 18m Talk		A Multi-Institutional Assessment of Oral Exams in Software Courses Papers Peter Ohmann College of St. Benedict / St. John's University, Ed Novak Franklin and Marshall College
14:03 18m Talk		Design and Evaluation of an AI-Assisted Grading Tool for Introductory Programming Assignments: An Experience Report Global Papers Goda Nagakalyani IIT BOMBAY, Saurav Chaudhary Indian Institute of Technology - Bombay, Varsha Apte Indian Institute of Technology - Bombay, Ganesh Ramakrishnan Indian Institute of Technology - Bombay, Srikanth Tamilselvam IBM India Research Labs
14:22 18m Talk		Designing LLM-Resistant Programming Assignments: Insights and Strategies for CS Educators Papers Bradley McDanel Franklin and Marshall College, Ed Novak Franklin and Marshall College
14:41 18m Talk		Exploring Different Specifications Grading Policies Global Papers Igor dos Santos Montagner Insper, Rafael Corsi Ferrao Insper, Craig Zilles University of Illinois at Urbana-Champaign, Mariana Silva University of Illinois at Urbana-Champaign