ChatGPT and Plagiarism

ChatGPT, a recently released artificial intelligence chatbot, is astonishing. I’d encourage you to play around with it if you haven’t.

While I realize the automation fears this technology stokes are valid, I’m excited about the potential. These models can help enable more human creativity and ultimately help us achieve more. It would also be shortsighted for companies to replace humans with language models when pairing the two would most likely amplify what either could produce alone.

It will be interesting to see how this technology affects our society. I’m optimistic, but there’s a delicate balance between the potential benefits and the dangers it introduces or exacerbates. There’s also an imperative to support others in our society, since the harms disproportionately fall on the underrepresented.

These models do bring significant risks. Unfortunately, there’s no shortage of examples of ChatGPT spewing sexist, racist, and otherwise biased output to users.

One thing is for sure, though. This innovation is not stopping anytime soon, and it’s not something we can turn our backs on. We’ll miss out on incredible opportunities for collective advancement by banning or ignoring breakthroughs like ChatGPT.

While not as critical as the concerns above, I want to focus on one impact in this post: plagiarism.

Plagiarism, of course, is using someone else’s work or ideas without giving them proper credit. As with a lot of things, there’s a spectrum. For example, directly copying someone’s work and calling it your own is plagiarism. On the other hand, having a conversation with someone about a topic, refining some ideas from that conversation, and citing them in some of your writing is not plagiarism.

How will these models change our idea of plagiarism? I’m curious how institutions like universities will respond. Will they ban students from using them, and how would they even detect it? OpenAI is reportedly working on watermarking the output of its models, which may help.
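OpenAI hasn’t published details, so to give a flavor of how detection could even be possible, here’s a toy sketch of one statistical watermarking idea from the research literature: a secret key deterministically marks part of the vocabulary as “green” at each step, the sampler nudges generation toward green tokens, and a detector counts how often a text lands on them. Everything here (the key, the vocabulary, the 50% fraction, the function names) is an illustrative assumption, not OpenAI’s actual method.

```python
import hashlib

def green_set(prev_token: str, vocab: list[str], key: str,
              fraction: float = 0.5) -> set[str]:
    """Mark a keyed, deterministic subset of the vocabulary as 'green'.

    A watermarking sampler would bias generation toward green tokens;
    a detector re-derives the same sets and counts the hits.
    """
    def score(token: str) -> str:
        return hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).hexdigest()

    ranked = sorted(vocab, key=score)
    return set(ranked[: int(len(ranked) * fraction)])

def green_rate(tokens: list[str], vocab: list[str], key: str) -> float:
    """Fraction of tokens that fall in their predecessor's green set.

    Unwatermarked text should hover near `fraction`; watermarked text
    should score well above it, and that gap is the detectable signal.
    """
    hits = sum(tokens[i] in green_set(tokens[i - 1], vocab, key)
               for i in range(1, len(tokens)))
    return hits / max(len(tokens) - 1, 1)
```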

We already rely on computer assistance to write

Computer assistance is already ubiquitous in writing today. From rudimentary spell checkers to dedicated grammar tools like Grammarly, many rely on computer programs to improve their writing. While these examples are more syntactic, language models like ChatGPT open another avenue for semantic assistance.

Do we draw the line at this syntax-semantics split? Surely not. If you find inspiration to write about something while conversing with someone, that’s not necessarily plagiarism. Of course, it’s arguable depending on how closely the conversation relates to the content of your writing and whether you cited the other person. But it’s not automatically plagiarism.

Does this change if you interact with a language model instead of a human? Probably? Most definitions of plagiarism assume the original author is human. Maybe the model’s owner claims copyright on everything it outputs?

One important note is that these models may reproduce data they have previously seen, which could be copyrighted material created by a human. In that case, I don’t see how you’d be free to use someone’s copyrighted writing just because a model sat between their writing and your reading. So for this post, I’m assuming all the text generated by the model did not exist previously. But I’m not a lawyer, and this is not a safe assumption in the real world! There’s already a class action lawsuit against GitHub Copilot, a similar model for code.

Enablement vs. replacement

A critical distinction in this conversation is enablement vs. replacement.

There are many ways a language model can enable your writing. For example, you can ask it to rephrase an unclear sentence or to suggest ways to make a piece more or less formal. It can even give you an interface for bouncing around and refining rough ideas throughout the process.
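As a concrete illustration, here’s a minimal sketch of that kind of enablement using the OpenAI Python client; the model name and prompt wording are just placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY in your environment

def rephrase(sentence: str, tone: str = "more formal") -> str:
    """Ask the model to rework a single sentence; the piece stays yours."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[{
            "role": "user",
            "content": (f"Rephrase this sentence to be {tone}, "
                        f"keeping my meaning intact: {sentence}"),
        }],
    )
    return response.choices[0].message.content
```

The unit of help stays small: a sentence or an idea at a time, never the finished piece.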

This is in contrast to replacement, where a language model takes an initial prompt from you and produces the final copy without much guidance or input from you.

These language models have the potential for both. Still, the latter runs into the gray area of plagiarism in ways the former does not. But, of course, they’re not mutually exclusive, and it’s hardly ever going to be a case of 100% enablement or 100% replacement.

One scenario I’m not sure about is one where you spend ample time engineering your prompt, iterate based on the model’s outputs, and then take the model’s output verbatim as your writing. You’re technically taking output from a model and calling it your own, but your prompt engineering greatly shaped the final result. And if this is okay, how do we define a sufficient amount of prompt engineering?

Where is the line?

So, where do we draw the line on plagiarism and language models? I’m not sure, but I don’t think merely using a language model constitutes plagiarism, and I hope it’s never treated that way. I’m not even sure that copying the model’s output verbatim is plagiarism, provided you’ve done some meaningful amount of prompt engineering.

To be clear, having ChatGPT write an entire report you then publish, or having it do your homework, is bad. It’s impressive in some sense, but most will agree it’s not ideal. Still, I don’t want us to miss out on a new generation of tools for fear of a different kind of assistance.

Throughout this post, I’ve assumed a fairly formal medium, but these considerations matter less for informal writing. We don’t hold more casual mediums like internal documents or emails to the same standard. In the future, handing some of our writing over to language models may be accepted, even encouraged.

I first started thinking about this topic while trying out ChatGPT. I wanted to experience bouncing ideas off it and iterating based on its outputs. I was shocked at how useful it was in my writing process, and I was impressed by the quality of the generated text.

It took some trial and error, but I eventually figured out how to get it to output text I could see myself writing. In fact, the model generated around half the sentences in this post.

So, did I plagiarize? For what it’s worth, I asked ChatGPT, and this was its response:

I am a tool designed to assist users in generating text based on the information I have been trained on. I do not have personal experiences or opinions, and my responses are generated algorithmically based on the inputs I receive. As such, you do not need to cite me specifically if you use my output in your own work.

Not legal advice!