Optimizing Prompt Engineering for Automated Text Summarization of Student Reflections: A Comparative Study Using GPT-4 LLM
In the educational domain, extracting insights from student-written text has been shown to be valuable for instructors. Efficiently summarizing students' reflections in a course gives instructors insights they can use to enhance the learning experience: quickly understanding students' impressions of the course enables timely and personalized one-on-one discussions. Studies pursuing this goal commonly employ text mining and natural language processing (NLP) techniques. The goal of this work is to understand the capabilities of large language models (LLMs) through a series of comparative prompt-engineering experiments. We compare the summarization outputs of GPT-4 at a temperature of 0.75 across experiments with different levels of prompts, starting from a base prompt and progressively adding context. We evaluate and compare the resulting summaries against a rubric scored by human annotators. Our findings suggest that providing more context in prompts enables the LLM to uncover rarer challenges in student reflections and to offer more detailed explanations. However, all prompts misrepresented the distribution of student challenges to some degree, sometimes overstating their frequency, and further refinement is needed to address this limitation. Despite this, the approach shows potential: it offers valuable insights to instructors and could help support students more effectively.
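The abstract does not include code, but a minimal sketch of how such tiered prompting might be run against GPT-4 is shown below. Only the model (GPT-4) and the temperature (0.75) come from the abstract; the tier names, the prompt wording, and the `summarize` helper are illustrative assumptions, not the authors' actual prompts.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt tiers: each level adds more course context to the
# base instruction, mirroring the abstract's "base level" to "more context"
# progression. The exact wording here is invented for illustration.
PROMPT_TIERS = {
    "base": "Summarize the following student reflections.",
    "context": (
        "You are assisting the instructor of a university course. "
        "Summarize the following student reflections, highlighting the "
        "challenges students report."
    ),
    "detailed": (
        "You are assisting the instructor of a university course. "
        "Summarize the following student reflections. Group the reported "
        "challenges, indicate roughly how many students mention each one, "
        "and suggest follow-up actions for the instructor."
    ),
}

def summarize(reflections: str, tier: str) -> str:
    """Request one summary from GPT-4 at the temperature used in the study."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.75,
        messages=[
            {"role": "system", "content": PROMPT_TIERS[tier]},
            {"role": "user", "content": reflections},
        ],
    )
    return response.choices[0].message.content

# Example: produce one summary per tier for later rubric-based comparison.
# reflections_text = "\n".join(student_reflections)
# summaries = {tier: summarize(reflections_text, tier) for tier in PROMPT_TIERS}
```

In a comparison like the one described, each tier's output would then be scored by human annotators against a rubric; the temperature of 0.75 means outputs are non-deterministic, so repeated runs per tier would be advisable.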