Writing and AI’s Uncanny Valley, Part Two

As I mentioned yesterday, the final papers I received this semester read oddly, with intermittent changes in the voice of the writer. In years past, this kind of shift would have been a sure sign of plagiarism. The occasional odd word suggested by a tutor or thesaurus usually isn't enough to create that sense of a shift. It's when whole sentences (or significant parts of sentences) shift that I feel compelled to do a spot search for the original source.

More often than not, this shift in voice is the result of a bad paraphrase that's been inappropriately cited (e.g., it's in the bibliography but not cited in the text). More rarely, it's a copy/paste job.

With this semester’s final papers, I have begun to hear when students who are using AI appropriately are having sections of their papers "improved" in ways that change their voice.

This matters (for those who are wondering) because our own voices are what we bring to a piece of writing. To lose one's voice through an AI's effort is to surrender that self-expression. There may be times when a more corporate voice is appropriate, but even the impersonal tone of a STEM paper still carries something of its author.

To get a sense of how much was being lost when an AI was asked to improve a piece of writing, I took five posts from this blog and asked ChatGPT 4o and Google Gemini to improve them. I uploaded the resulting fifteen files (five originals plus ten rewrites) into Lexos, a text-analysis tool developed at Wheaton College by Dr. Michael Drout, Professor of English, and Dr. Mark LeBlanc, Professor of Computer Science.

Lexos is powerful enough that I am certain I'm not yet using it to its full capacity, and learning to do so is quickly becoming a summer project. But the first results from two of its tools were enough to make me expand the experiment, first by adding four additional texts and then a fifth.

The four texts were William Shakespeare's Henry V, Christopher Marlowe's Tamburlaine and Doctor Faustus, and W.B. Yeats' The Celtic Twilight, all as found on Project Gutenberg. The first three were spur-of-the-moment choices: texts distant enough from me in time to be neutral. I added Yeats out of curiosity, to see whether my reading and rereading of his work had made a noticeable impact on my writing.

Spoiler: it hadn't, at least not at first blush. But the results made me seek out a control text. For this fifth choice, I added Gil Scott-Heron's "The Revolution Will Not Be Televised" because of its hyper-focus on the events and diction of the late sixties and early seventies. This radical difference served as a kind of control for my experiment.

The first Lexos tool that hinted at something was the Dendrogram visualization, which shows family-tree-style relationships between texts. Lexos can build it using different methods (with impressive-sounding names), and each produces a somewhat different arrangement based on its underlying statistical model.

The Dendrogram groupings generated by Lexos.

These showed predictable groupings. Scott-Heron was the obvious outlier, as expected of a control text. The human-composed texts by other authors clustered together, which I should have expected (although the close association between Henry V and Tamburlaine, perhaps driven by their battle scenes, was an interesting result). Likewise, the fact that the ChatGPT rewrites sat closer to the originals than the Gemini rewrites did came as no surprise, since Gemini had transformed the posts from paragraphs into bulleted lists.
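For readers curious about what a dendrogram is built from, here is a minimal Python sketch of one common approach, offered as an illustration rather than as Lexos's own code: each text becomes a table of word counts, every pair of texts gets a cosine distance, and the two closest groups are merged again and again (average linkage) until one tree remains. Lexos offers other distance metrics, linkage methods, and tokenizing options; the crude regular-expression tokenizer and the three `sample` strings below are assumptions invented for the example, not the actual posts.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count its word tokens (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 minus the cosine similarity of two word-count vectors (0.0 = same proportions)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(texts: dict[str, str]) -> list[tuple[tuple[str, ...], tuple[str, ...], float]]:
    """Repeatedly merge the two closest clusters; the merge history is the dendrogram."""
    counts = {name: word_counts(body) for name, body in texts.items()}
    clusters = [(name,) for name in texts]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over every cross-cluster pair of texts.
                pairs = [cosine_distance(counts[a], counts[b]) for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] + clusters[j]]
    return merges


# Hypothetical stand-in texts, not the blog's data.
sample = {
    "post": "the students wrote their own papers in their own voice",
    "rewrite": "the students composed their papers in their own distinctive voice",
    "control": "the revolution will not be televised brother",
}
for left, right, dist in average_linkage(sample):
    print(" + ".join(left), "|", " + ".join(right), f"{dist:.2f}")
```

Each entry in the merge history is one fork in the family tree: the closest pair joins first, and an outlier such as a control text tends to join last.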

What did come as a surprise were the results of the Similarity Query, a tool I stumbled across as much as sought out. Initially, I had come to Lexos to look for larger, aggregate patterns, not to see how a single text compared with the others.

It turned out that the Similarity Queries were what revealed the difference between human-written and machine-generated text.

Similarity Query for Blog Post Zero. The tops of the letters for “The Revolution Will Not Be Televised” can barely be seen at the bottom of the list.

Gil Scott-Heron remained the outlier, as a control text should.

The ChatGPT 4o rewrite of any given post was listed as the closest text to the original, as one would expect.

What I did not expect was what came next. In order, the list showed:

  • The non-control human texts.

  • The ChatGPT texts.

  • The Gemini texts.

The tool repeatedly marked the AI-generated texts as different from the human ones and more like one another than like any human writer.
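The ranking a Similarity Query produces can be pictured as a simple procedure: choose one text, score every other text against it, and sort from most to least similar. The Python sketch below assumes word-count vectors compared with cosine similarity, a common choice for this kind of ranking; it is not Lexos's own code, Lexos's actual settings may differ, and the `sample` strings are invented placeholders rather than the posts used here.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count its word tokens (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors: 1.0 means identical proportions."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_query(query: str, texts: dict[str, str]) -> list[tuple[str, float]]:
    """Rank every other text by cosine similarity to the text named `query`, most similar first."""
    counts = {name: word_counts(body) for name, body in texts.items()}
    scores = [(name, cosine_similarity(counts[query], c)) for name, c in counts.items() if name != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in texts, not the blog's data.
sample = {
    "post": "the students wrote their own papers in their own voice",
    "rewrite": "the students composed their papers in their own distinctive voice",
    "control": "the revolution will not be televised brother",
}
for name, score in similarity_query("post", sample):
    print(f"{name}: {score:.3f}")
```

Because cosine similarity compares the proportions of words rather than raw length, a short text and a long one can still rank as close if they lean on the same vocabulary in the same ratios.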

Here, Lexos dispassionately quantifies what I experienced while reading those essays. The edits made by generative AI alter the writer's voice, supplanting it with the AI's own.

This has serious implications for writers, editors, and those who teach them.

It also has implications for the companies that make these AI/LLM tools.

I will discuss some of that in tomorrow’s post.