Investigating the Capabilities of Generative AI in Solving Data Structures, Algorithms, and Computability Problems
The recent advent of large language models (LLMs) has generated both great hope and great concern about the future of Computer Science practice and education.
We present the first study to extensively evaluate the ability of such models to solve problems in Computer Science Theory. We tested 183 exam-level problems across 16 topics in Computer Science Theory, ranging from preliminary data structures to algorithm design paradigms to the Theory of Computation. Our results are based on recent popular models (GPT-4 and GPT-4o), used without their image-recognition capabilities. This is a rapidly evolving field, and model performance continues to improve. We present our results primarily as an indication of what these models can already achieve, and how they can already be useful, today, fully expecting them to improve even further in the near future.
Our results show that what was very recently a state-of-the-art model (GPT-4) was able to solve a majority (78%) of free-response problems in data structures and algorithms with little to no guidance. The newer model, GPT-4o, was able to solve nearly half of the Theory of Computation problems we posed, and the problems it could not solve fell into predictable categories. When broken down by topic, the model solved 85% of problems in four of the nine data structures and algorithms topics and at least half in four others. The remaining problems, particularly more visual ones, either required more substantial coaching or appear to remain beyond the capabilities of the language models, at least for now.