Blogs (1) >>

As data science and artificial intelligence continue to impact society, more and more people are learning how to manipulate data with code. To support these learners, program visualization tools automatically generate diagrams to show how code transforms data, in contrast to tools based on large language models (LLMs) that primarily focus on textual explanations. Although program visualization tools are popular among instructors, do novices find these tools usable and useful for data science programs that often manipulate datasets with many rows? To address this, we evaluate a popular, publicly available tool that generates diagrams for Python pandas code through a randomized, in-lab usability study with 17 data science novices. Despite minimal instruction on how to use the tool, novices found that program visualizations increased their confidence in comprehending and debugging code. In addition, even though the tool sometimes produced diagrams with many visual elements, participant performance on the study tasks was not negatively impacted. These findings suggest design guidelines for program visualization tools to help manage cognitive load for data science novices. To our knowledge, this is the first empirical study that investigates how novices use program visualization tools to understand code that manipulates data tables, and suggests a future where novices can use automatically generated diagrams as a complement to LLM tools for effectively understanding unfamiliar programs in data science.