University examiners fail to spot ChatGPT answers in real-world test
ChatGPT-written exam submissions for a psychology degree mostly went undetected and tended to get better marks than real students’ work
By Chris Stokel-Walker
26 June 2024
Exams taken in person make it harder for students to cheat using AI (Image: Trish Gant / Alamy)
Ninety-four per cent of university exam submissions created using ChatGPT weren’t detected as being generated by artificial intelligence, and these submissions tended to get higher scores than real students’ work.
Peter Scarfe at the University of Reading, UK, and his colleagues used ChatGPT to produce answers to 63 assessment questions on five modules across the university’s psychology undergraduate degrees. Students sat these exams at home, so they were allowed to consult notes and references; they could also potentially have used AI, although this wasn’t permitted.
The AI-generated answers were submitted alongside real students’ work, and accounted for, on average, 5 per cent of the total scripts marked by academics. The markers weren’t informed that they were checking the work of 33 fake students – whose names were themselves generated by ChatGPT.
The assessments included two types of questions: short answers and longer essays. The prompts given to ChatGPT began with the words “Including references to academic literature but not a separate reference section”, then copied the exam question.
Across all modules, only 6 per cent of the AI submissions were flagged as potentially not being a student’s own work, and in some modules none was flagged at all. “On average, the AI responses gained higher grades than our real student submissions,” says Scarfe, though there was some variability across modules.