
Using AI Detectors to Grade Student Work

Tools flagging AI-generated text can be useful, yet caution is essential.

Jiangang Hao

As students increasingly turn to AI tools for assistance with their homework, teachers and professors have been asked to take on a new role—detectives on a mission to ensure academic integrity. Was it a student or ChatGPT who wrote the essay? AI detectors have come to the rescue, helping educators identify AI-generated text in students’ submissions. Yet, these tools are far from perfect.

As a scientist who has extensively studied detectors of AI-generated essays since they were first developed, I must reiterate: no AI detector can achieve perfect accuracy. Different detectors can misclassify essays—either wrongly flagging authentic human writing as AI-generated or outright failing to catch content generated by AI—highlighting the need for careful use and scrutiny.

Informed by the research my colleagues and I conducted earlier this year, here are some tips on how teachers and professors can use AI detectors responsibly.

Read the label. No AI detector is infallible. When using a specific tool, be aware of the detection accuracy it reports. For example, OpenAI stated that its own detector correctly identified AI-generated text only 26% of the time and mistakenly flagged human-written text as AI-generated 9% of the time; the company shut the tool down about six months later.
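
To see what reported figures like these imply for an individual flagged essay, here is a rough back-of-the-envelope sketch in Python. The 30% share of AI-assisted submissions is purely an illustrative assumption, not a measured figure.

```python
# Back-of-the-envelope: what do a 26% detection rate and a 9% false-positive
# rate mean for an essay that gets flagged? The 30% AI-use share below is an
# illustrative assumption.
sensitivity = 0.26          # chance an AI-written essay is flagged (reported figure)
false_positive_rate = 0.09  # chance a human-written essay is wrongly flagged (reported figure)
ai_share = 0.30             # assumed fraction of submissions written with AI

flagged_ai = ai_share * sensitivity                    # true positives
flagged_human = (1 - ai_share) * false_positive_rate   # false positives

precision = flagged_ai / (flagged_ai + flagged_human)
print(f"Chance a flagged essay is actually AI-written: {precision:.0%}")
# With these numbers, roughly 55% -- nearly half of flagged essays would be human-written.
```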

Text length matters. Longer texts generally yield more reliable results with AI detectors. In other words, determining whether a single word or a short sentence was generated by AI is practically impossible. Our study suggested that a text length of 50 words is a minimum requirement for reliable detection.
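
A minimal sketch of how that guideline might be applied in practice is below; the detector call is a stand-in for whatever tool you actually use, and the 50-word threshold follows the finding mentioned above.

```python
# Minimal sketch: only ask a detector for a verdict when the text is long enough.
MIN_WORDS = 50  # minimum length for a reliable result, per the guideline above

def run_detector(text: str) -> float:
    """Stand-in for a real detector call; should return the probability the text is AI-generated."""
    raise NotImplementedError("replace with a call to the detector you actually use")

def detect_if_long_enough(text: str):
    """Return a detector score only when the text is long enough to classify reliably."""
    if len(text.split()) < MIN_WORDS:
        return None  # too short: any score here would not be trustworthy
    return run_detector(text)
```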

AI detectors will not reliably flag content co-created by humans and AI. Both the definition and the detection of AI-generated text become ambiguous when humans and AI collaborate. It is important to establish clear guidelines for students on how much AI-generated content is allowed in each assignment, and to explain why they need to declare how they used AI in their work.

Be aware: Detectors could be biased. AI detectors, most likely because of the way they are trained, can sometimes exhibit biases against some demographic groups without clearly predictable patterns. Therefore, I advise against solely relying on AI detector outputs when making high-stakes decisions.

Use more than one tool to improve the consistency of the results. Different AI detectors may yield varying results for the same text. Using multiple detectors and cross-referencing their outputs can provide a more comprehensive assessment.
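
One possible way to cross-reference several tools is sketched below. The detector functions and the 0.5 flagging threshold are illustrative assumptions; substitute the actual tools you use and the thresholds they document.

```python
# Sketch of cross-referencing several detectors rather than trusting any single score.
from statistics import mean

def cross_check(text: str, detectors: list) -> dict:
    """Run each detector (a plain function returning a 0-1 score) and summarize agreement."""
    scores = {d.__name__: d(text) for d in detectors}
    flagged = [name for name, s in scores.items() if s >= 0.5]  # illustrative threshold
    return {
        "scores": scores,
        "mean_score": mean(scores.values()),
        "agreement": f"{len(flagged)}/{len(scores)} detectors flagged this text",
    }
```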

Do not rely solely on text-based detectors. Consider additional sources of information, such as keystroke data or video recordings of the writing process, alongside detector outputs to make more informed decisions.

AI technology will continue to evolve, and students will inevitably incorporate AI tools into their studies. My colleagues and I enjoyed working on TeachAI, and we look forward to more opportunities to help educators as they support their students’ use of evolving technology to succeed in school and life.

Jiangang is a research director and specializes in the assessment of complex skills such as collaborative problem solving, creativity, curiosity, and digital literacy.