AI and Exams
Professor Roberto Serrano, who is the Harrison S. Kravis University Professor of Economics at Brown University, has detected a massive fraud in one of the classes he teaches, ECON 1170, an advanced undergraduate course in mathematical economics. He has conclusive evidence that at least 50 students cheated on the March midterm exam, making it the biggest known scandal at Brown and in the entire Ivy League
…
“Academic integrity is a value worth defending. The faculty cannot be left on its own in a battle that is decisive if we want to preserve the future of higher education,” explains the 61-year-old professor … (El Pais)
It was a closed-book take-home exam; the problems were designed to test the students’ ability but proved doable by an AI. After Serrano changed the final from take-home to in-person, about half the students who had gotten perfect scores on the midterm chose not to take the final.
The existence of AI, like the earlier problem of students buying papers online, reduces the ability of teachers to test their students but does not eliminate it; it is inconvenient but not catastrophic. It makes some kinds of testing more difficult but not impossible. Serrano could have asked students whose midterms were suspiciously good to explain some of their answers and failed any obviously unable to do so. That would have been additional work for him and, judging by the article, not a policy Brown would have endorsed. Unwilling or unable to do that, he can base his future grading on work done in person and adequately monitored.
Who Are Exams For?
The story raises the question of the function of exams and, more generally, of grading. There are at least three possible answers.
The first is that the purpose of grading is to generate information used by potential employers or by academic officials, current or future: information many students would like to distort in their own favor. That creates a conflict of interest between the school and professor, trying to generate accurate information, and the student, trying to distort that information with AI or older technologies.
Another function of testing is producing information for the student, telling him which parts of what he is supposed to have learned he has learned. This is, in my experience, especially important in economics courses. Economics deals with things students believe they understand and uses terms such as “competition” and “efficient” whose meaning they believe they know, with the result that students sometimes badly overestimate how much of what the course has covered they have learned. Graded quizzes reduce that problem; if they do not contribute to the final grade, there is no incentive to cheat.
A third reason to test students is to produce information for the professor; if most of my students turn out not to know something I have covered, I might want to cover it again, perhaps differently. Here again, as long as the results of the testing are not used for grading, there should be no incentive to cheat.
I conclude that forms of testing for which it is hard to prevent cheating, such as Professor Serrano’s take-home exam, are appropriate for generating information for students and professor, provided that they are not used, and students know they will not be used, for grading. Testing used for grading should either be in-person and monitored or with cheating deterred by serious efforts to detect and punish it.
What Should Count as Cheating and What Is Wrong With It?
It is not immediately obvious what is wrong with using AI on a test. If the purpose of the test is to generate information for potential future employers, why should they want the student tested without a tool that, if they hire him, he will have? A basketball coach does not evaluate potential team members by how well they can play with one hand tied behind their back.
Arguably the skills the employer wants tested are those that an AI cannot replace, and it was up to Professor Serrano to find ways of testing for them. His take-home midterm, taken without the assistance of AI, might have provided information for him and his students about how far they had come along a path that would eventually produce skills an AI could not substitute for, but not information for a future employer about the skills of the students taking the exam.
An older version of the same issue was the use of calculators on tests, permitted in some contexts, forbidden in others. If what is being tested is grade-school arithmetic, it makes no sense to allow calculators; but in a world where anyone doing arithmetic has a calculator to do it on, that looks like evidence that being able to do arithmetic is no longer a useful skill, hence no longer something worth testing.
The argument the other way, the argument for the conventional approach to grading, is that the information being generated is not what the student has learned but how well he can learn: some combined effect of intelligence, willingness to work, and whatever else makes a successful student and a valuable employee. Specific skills differ from job to job; the ability to acquire skills is valuable for all.

What is a closed-book, take-home test? Do students affirm they will not open books? How does one choose not to take the final? Do you mean these students dropped the course? Can one drop a course on final exam day at Brown?
Even at graduate level, the things you're taught in school are basics. You're being tested on solving problems that have already been solved. That isn't what employers want to pay for: they want to pay for applying that understanding toward solving problems that haven't been solved yet. You may be using an AI for those problems too, but it won't be one-shotting them, because if it could then there would be no reason to hire a human. Without an understanding of the academic basics, an employee will have no ability to prompt the AI appropriately or to evaluate its output.
There's a similar answer to your arithmetic analogy. Using a calculator for arithmetic is more efficient than doing it by hand, but if you can't do it by hand at all, then you don't understand arithmetic. And if you don't understand arithmetic, you won't be able to understand more advanced math.