AI Detectors Falsely Accuse Students of Cheating—With Big Consequences

About two-thirds of teachers report regularly using tools for detecting AI-generated content. At that scale, even tiny error rates can add up quickly.

By Jackie Davalos and Leon Yin (Bloomberg)

October 18, 2024 at 5:00 PM GMT+2

After taking some time off from college early in the pandemic to start a family, Moira Olmsted was eager to return to school. For months, she juggled a full-time job and a toddler to save up for a self-paced program that allowed her to learn remotely. Seven months pregnant with her second child, Olmsted enrolled in online courses at Central Methodist University in 2023, studying to become a teacher.

Just weeks into the fall semester, Olmsted submitted a written assignment in a required class—one of three reading summaries she had to do each week. Soon after, she received her grade: zero. When she approached her professor, Olmsted said she was told that an AI detection tool had determined her work was likely generated by artificial intelligence. In fact, the teacher said, her writing had been flagged at least once before.

For Olmsted, now 24, the accusation was a “punch in the gut.” It was also a threat to her standing at the university. “It’s just kind of like, oh my gosh, this is what works for us right now—and it could be taken away for something I didn’t do,” she says.

Olmsted disputed the accusation to her teacher and a student coordinator, stressing that she has autism spectrum disorder and writes in a formulaic manner that might be mistakenly seen as AI-generated, according to emails viewed by Bloomberg Businessweek. The grade was ultimately changed, but not before she received a strict warning: If her work was flagged again, the teacher would treat it the same way they would plagiarism.

Olmsted shows an assignment that was flagged as likely written by AI. Photographer: Nick Oxford/Bloomberg

Since OpenAI’s ChatGPT brought generative AI to the mainstream almost two years ago, schools have raced to adapt to a changed landscape. Educators now rely on a growing crop of detection tools to help spot sentences, paragraphs or entire assignments generated by artificial intelligence. About two-thirds of teachers report using an AI checker regularly, according to a survey of more than 450 instructors published in March by the Center for Democracy & Technology.

The best AI writing detectors are highly accurate, but they’re not foolproof. Businessweek tested two of the leading services—GPTZero and Copyleaks—on a random sample of 500 college application essays submitted to Texas A&M University in the summer of 2022, shortly before the release of ChatGPT, effectively guaranteeing they weren’t AI-generated. The essays were obtained through a public records request, meaning they weren’t part of the datasets on which AI tools are trained. Businessweek found the services falsely flagged 1% to 2% of the essays as likely written by AI, in some cases claiming to have near 100% certainty.

Even such a small error rate can quickly add up, given the vast number of student assignments each year, with potentially devastating consequences for students who are falsely flagged. As with more traditional cheating and plagiarism accusations, students accused of using AI to do their homework are having to redo assignments and are facing failing grades and probation.
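
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The per-assignment false positive rate and the number of checked assignments are hypothetical inputs, not figures from Bloomberg’s analysis; the point is only that a small per-assignment error rate compounds across repeated checks.

```python
# Hypothetical inputs, not figures from Bloomberg's analysis.
fp_rate = 0.01        # assumed per-assignment false positive rate (1%)
checks_per_term = 10  # assumed number of AI-checked assignments in a term

# Probability that at least one assignment is falsely flagged,
# treating each check as an independent event.
p_false_flag = 1 - (1 - fp_rate) ** checks_per_term
print(f"Chance of at least one false flag in a term: {p_false_flag:.1%}")
# -> roughly 9.6% under these assumed numbers
```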

[Chart] AI Detectors Can Falsely Flag Essays as Likely Written by AI: Bloomberg tests using two leading AI detectors on a sample of 500 essays written before the release of ChatGPT showed that the services falsely flagged 1% to 2% of the essays as likely written by AI. Sources: Bloomberg analysis of Texas A&M, GPTZero, Copyleaks

The students most susceptible to inaccurate accusations are likely those who write in a more generic manner, whether because they’re neurodivergent like Olmsted, speak English as a second language (ESL) or simply learned to use more straightforward vocabulary and a mechanical style, according to students, academics and AI developers. A 2023 study by Stanford University researchers found that AI detectors were “near-perfect” when checking essays written by US-born eighth grade students, yet they flagged more than half of the essays written by nonnative English speakers as AI-generated. OpenAI recently said it has refrained from releasing an AI writing detection tool in part over concerns it could negatively affect certain groups, including ESL students.

Businessweek also found that AI detection services can sometimes be tricked by automated tools designed to pass off AI writing as human. This could lead to an arms race that pits one technology against another, damaging trust between educators and students with little educational benefit.

Turnitin, a popular AI detection tool that Olmsted says was used to check her work, has said it has a 4% false positive rate when analyzing sentences. Turnitin declined to make its service available for testing. In a 2023 blog post, Vanderbilt University, one of several major schools to turn off Turnitin’s AI detection service over accuracy concerns, noted that hundreds of student papers would otherwise have been incorrectly flagged during the academic year as partly written by AI.

Ken Sahib, a multilingual student who spent most of his childhood in Italy, says it was “overwhelming” when he received a zero on an assignment summarizing a reading for his Introduction to Networking course at Berkeley College in New York. When Sahib asked about it, the teacher said: “Every tool I tried produced the same result: those responses were AI-generated,” according to emails viewed by Businessweek. “You know what you are doing.”

Sahib says he ultimately passed the class, but the incident fractured his relationship with his professor. “After that we barely spoke,” he says. The professor didn’t respond to requests for comment.

While some educators have backed away from AI detectors and tried to adjust their curricula to incorporate AI instead, many colleges and high schools still use these tools. AI detection startups have attracted about $28 million in funding since 2019, according to the investment data firm PitchBook, with most of those deals coming after ChatGPT’s release. Deepfake detection startups, which can check for AI-generated text, images, audio and video, raised more than $300 million in 2023, up from about $65 million the year before, PitchBook found.

The result is that classrooms remain plagued by anxiety and paranoia over the possibility of false accusations, according to interviews with a dozen students and 11 teachers across the US. Undergraduates now pursue a wide range of time-consuming efforts to defend the integrity of their work, a process they say diminishes the learning experience. Some also fear using commonplace AI writing assistance services and grammar checkers that are specifically marketed to students, citing concerns they will set off AI detectors.

Eric Wang, Turnitin’s vice president for AI, says the company intentionally “oversamples” underrepresented groups in its data set. He says internal tests have shown Turnitin’s model doesn’t falsely accuse ESL students, and that its overall false positive rate for entire documents is below 1% and improving with each new release. Turnitin doesn’t train specifically on neurodivergent student data or have access to medical histories to assess that classification.

Copyleaks co-founder and Chief Executive Officer Alon Yamin says its technology is 99% accurate. “We’re making it very clear to the academic institutions that nothing is 100% and that it should be used to identify trends in students’ work,” he says. “Kind of like a yellow flag for them to look into and use as an opportunity to speak to the students.”

“Every AI detector has blind spots,” says Edward Tian, the founder and CEO of GPTZero. He says his company has made strides in debiasing results for ESL students in particular, and has taken steps to more clearly indicate the level of uncertainty in its tool’s assessment of written work for teachers.

Tian built GPTZero at the start of 2023. His startup had 4 million users as of July, up from 1 million a year ago, and recently raised $10 million from investors, including Jack Altman, the brother of OpenAI’s CEO. “Last semester was the most active semester,” Tian says. “It shows this problem is not going away, but it has changed. A year ago, the most common question people were asking was: Is this AI?” Now, he says, teachers know AI is in their classroom. The question is: “How do you deal with it?”

It’s challenging to quantify AI use in schools. In one test, Businessweek analyzed a separate set of 305 essays submitted to Texas A&M in the summer of 2023, after ChatGPT launched, and found the same AI detectors flagged about 9% as being generated by artificial intelligence.

[Chart] AI Detection Startups. Source: PitchBook. Note: Turnitin is a subsidiary of Advance Publications

AI writing detectors typically look at perplexity, a measure of how predictable the word choices in any given submission are to a language model. “If the word choices tend to be more generic and formulaic, that work has a higher chance of being flagged by AI detectors,” says James Zou, a professor of biomedical data science at Stanford University and the senior author of the Stanford study on ESL students.

The AI detection service QuillBot, for example, notes that “AI-generated content is likely to contain repetitive words, awkward phrasing, and an unnatural, choppy flow.” GPTZero also factors in a metric it calls “burstiness,” which measures how much the perplexity varies throughout a written document. Unlike AI, “people tend to vary their sentence construction and diction a lot throughout a document,” according to the company.
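
As a rough, self-contained illustration of those two signals (and not the models GPTZero, QuillBot or other detectors actually use), the Python sketch below scores sentences against a toy unigram word-frequency model: perplexity drops when word choices are predictable, and “burstiness” is taken here as the spread of per-sentence perplexity. The reference text and sample sentences are made up for the example.

```python
import math
from collections import Counter

# Toy background word frequencies standing in for a language model.
# Real detectors score text with large neural models, not unigram counts.
REFERENCE = ("the student wrote a short summary of the assigned reading "
             "and described the main argument of the text in plain terms ") * 50
COUNTS = Counter(REFERENCE.split())
TOTAL = sum(COUNTS.values())
VOCAB = len(COUNTS) + 1  # +1 so unseen words still get some probability

def perplexity(sentence: str) -> float:
    """Lower perplexity = more predictable, more 'generic' word choices."""
    tokens = sentence.lower().split()
    log_prob = sum(math.log((COUNTS.get(t, 0) + 1) / (TOTAL + VOCAB))
                   for t in tokens)
    return math.exp(-log_prob / len(tokens))

def burstiness(sentences: list[str]) -> float:
    """Standard deviation of per-sentence perplexity; human prose tends to vary more."""
    scores = [perplexity(s) for s in sentences if s.strip()]
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

essay = [
    "The student wrote a short summary of the reading.",
    "Halfway through, an unexpected metaphor about lighthouses appears.",
    "The text described the main argument in plain terms.",
]
for s in essay:
    print(f"{perplexity(s):10.1f}  {s}")
print(f"burstiness (spread of the scores above): {burstiness(essay):.1f}")
```

A production detector would compute scores like these with a neural language model and compare them against thresholds calibrated on labeled human and AI writing, which is where the false positives described above come from.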

AI detection companies stress that their services shouldn’t be treated as judge, jury and executioner, but rather as a data point to help inform and guide teachers.

Olmsted. Photographer: Nick Oxford/Bloomberg

Most of the schools that work with Copyleaks now give students access to the service, Yamin says, “so they can authenticate themselves” and see their own AI scores. Turnitin, meanwhile, is working to expand its AI product portfolio with a service to help students show the process of how they put together their written assignments, in response to feedback from teachers and pupils.

“Students say, ‘I want to be able to show that this is my work, and I want to feel confident that there are no questions about that,’” says Annie Chechitelli, Turnitin’s chief product officer. “And the teachers say, ‘I need more data points to help me understand how the student came up with this.’”

After her work was flagged, Olmsted says she became obsessive about avoiding another accusation. She screen-recorded herself on her laptop doing writing assignments. She worked in Google Docs to track her changes and create a digital paper trail. She even tried to tweak her vocabulary and syntax. “I am very nervous that I would get this far and run into another AI accusation,” says Olmsted, who is on target to graduate in the spring. “I have so much to lose.”

Nathan Mendoza, a junior studying chemical engineering at the University of California at San Diego, uses GPTZero to prescreen his work. He says the majority of the time it takes him to complete an assignment is now spent tweaking wordings so he isn’t falsely flagged—in ways he thinks make the writing sound worse. Other students have expedited that process by turning to a batch of so-called AI humanizer services that can automatically rewrite submissions to get past AI detectors.

[Chart] “AI Humanizer” Edits a Human-Written Essay to Bypass AI Detection: in a Bloomberg test of a service called Hix Bypass, a human-written essay that GPTZero had incorrectly scored as 98.1% AI dropped to 5.3% AI after being altered by the service. Sources: application essay from Texas A&M; Hix Bypass

The fear of being flagged by AI detectors has also forced students to rethink using popular online writing assistance tools. Grammarly, a startup valued at $13 billion in 2021, helps students with everything from basic spell-checks to structure suggestions. But it has also expanded with options to rewrite an entire submission automatically to meet certain criteria, pushing the limits of what may be deemed acceptable by teachers.

Bloomberg found that using Grammarly to “improve” an essay or “make it sound academic” can turn work that had passed as 100% human-written into work scored as 100% AI-written. Grammarly’s spell checker and grammar suggestions, however, have only a marginal impact on making documents appear more AI-written.

Kaitlyn Abellar, a student at Florida SouthWestern State College, says she has uninstalled plug-ins for such programs as Grammarly from her computer. Marley Stevens, a student at the University of North Georgia, posted a viral TikTok video last year about her experience being penalized after Turnitin flagged her essay as AI-generated. Stevens said she was put on academic probation for a year after a disciplinary hearing determined she’d cheated. She insisted she wrote the assignment herself, using only Grammarly’s standard spell-checking and grammar features.

“This was a well-intentioned student who had been using Grammarly in the responsible way and was flagged by a third-party technology saying you did wrong. We can’t help how Turnitin operates, like they understand that they have false flags,” says Jenny Maxwell, head of Grammarly for education. The incident prompted Grammarly to develop a detection tool for students that identifies whether text was typed, pasted from a different source or written by an AI model. “It’s almost like your insurance policy,” Maxwell says.

To some educators and students alike, the current system feels unsustainable because of the strain it places on both sides of the teacher’s desk and because AI is here to stay.

“Artificial intelligence is going to be a part of the future whether we like it or not,” says Adam Lloyd, an English professor at the University of Maryland. “Viewing AI as something we need to keep out of the classroom or discourage students from using is misguided.”

Instead of using Turnitin, which is available to faculty at his school, Lloyd prefers to go with his intuition. “I know my students’ writing, and if I have a suspicion, I’ll have an open discussion,” he says, “not automatically accuse them.”
