"I'm not sure, but...": Expert Practices that Enable Effective Code Comprehension in Data Science (SIGCSE TS 2025 - Papers)

Who

Christopher Lum, Guoxuan Xu, Sam Lau

Track

SIGCSE TS 2025 Papers

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 27 Feb 2025 14:22 - 14:41 at Meeting Rooms 315-316 - Data Science #1. Chair(s): Seth Poulsen

Abstract

Data scientists often need to read and understand messy and undocumented code that relies on large software libraries. What makes data science experts more effective than novices at this task? To understand expert practices, we conducted a think-aloud study where 4 novice and 5 expert data scientists reasoned about an unfamiliar data analysis script with realistic complexity that used the Python pandas library. Surprisingly, familiarity with the pandas package had relatively minor importance for experts. Instead, experts consistently performed three practices that novices did not: experts examined the data in detail rather than fixating on surface-level code features; experts consistently verified their assumptions about how the data was transformed; and experts navigated lengthy program outputs in a goal-directed way. Using these findings, we provide a practical set of guidelines for data science pedagogy and for future tools to support data science learners.

Christopher Lum

UC San Diego

United States

Guoxuan Xu

UC San Diego

United States

Sam Lau

University of California at San Diego

United States

Session Program

Thu 27 Feb
Displayed time zone: Eastern Time (US & Canada)

13:45 - 15:00	Data Science #1 (Papers) at Meeting Rooms 315-316. Chair(s): Seth Poulsen (Utah State University)

13:45	18m	Talk	Approachable Machine Learning Education: A Spiral Pedagogy Approach with Experiential Learning (Papers). Meiying Qin (York University)
14:03	18m	Talk	A Window into DataWorks: Developing an Integrated Work-Training Curriculum for Novice Adults (Papers). Lara Karki (Georgia Institute of Technology), Dana Priest (DataWorks at Georgia Tech), Gabe Dubose (Emory University), Zajerria Godfrey (Maynard Jackson High School), Annabel Rothschild (Georgia Institute of Technology), Benjamin Shapiro (Georgia State University), Betsy Disalvo (Georgia Institute of Technology)
14:22	18m	Talk	"I'm not sure, but...": Expert Practices that Enable Effective Code Comprehension in Data Science (Papers). Christopher Lum (UC San Diego), Guoxuan Xu (UC San Diego), Sam Lau (University of California at San Diego)
14:41	18m	Talk	Larger than Life In-Class Demonstrations for Introductory Machine Learning (Papers). Henry Chai (Carnegie Mellon University), Matthew R. Gormley (Carnegie Mellon University)