Researchers have found that Large Language Models (LLMs) like GPT can memorize and reproduce long passages from famous books. The chatbots accurately replicated more than 50 words from books like Harry Potter. Let’s dig deeper into what these AI models can do when it comes to reproducing copyrighted material from books.
How May LLMs Lead to Copyright Infringement?
The researchers prompted the GPT-3.5 model to continue with exact lines from books. These were notable publications like Harry Potter and the Sorcerer’s Stone, Gone with the Wind, and Lolita.
GPT can recite the first 50 lines of the Bible for you, which is not a problem. But giving exact quotations from other books, new and old, can be an issue. The AI wave that began last year is now at its peak. Riding this enthusiasm for AI and its applications, even Amazon is reshaping itself for the ChatGPT era.
However, the technology has limitations, too: memorizing and reproducing exact text from books can lead to trouble.
The new paper from researchers at the Department of Computer Science of the University of Copenhagen and the University of Electronic Science and Technology of China reads,
“Such memorization may facilitate redistribution and thereby infringe intellectual property rights. Is that fair?”
During the research, the team used simple prompts to find out how much text the models had memorized, a method they call direct probing. For example, they asked various LLMs direct questions such as “What is the first page of [TITLE]?” The list of titles included 19 best-sellers released after 1930.
After five runs, GPT-3.5 delivered a 161-word quotation from Harry Potter and the Sorcerer’s Stone. The more advanced GPT-4 model was not tested.
The paper also warns that “Larger language models may increasingly infringe upon existing copyrights in the future.”
For example, chatbots with fewer than 60 billion parameters, such as OPT, Pythia, Falcon, and LLaMA, delivered around 50 words on average. In comparison, Claude and GPT-3.5 Turbo reproduced more than 50 words in over half of the books tested.
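To make word counts like these concrete, here is a minimal Python sketch of one way to measure how many consecutive words of a model's reply match a reference text. This article does not describe how the paper computed its figures, so the metric (longest contiguous run of matching words), the function names longest_verbatim_run and exceeds_threshold, and the toy sentences are all assumptions made for illustration.

```python
def longest_verbatim_run(output: str, reference: str) -> int:
    """Return the length, in words, of the longest contiguous word
    sequence that appears in both texts (case-insensitive).

    Assumption: words are split on whitespace and punctuation is kept,
    so "Stone." and "Stone" count as different words.
    """
    out_words = output.lower().split()
    ref_words = reference.lower().split()
    best = 0
    # prev[j] holds the length of the matching run that ends at the
    # previous output word and at reference word j - 1.
    prev = [0] * (len(ref_words) + 1)
    for out_word in out_words:
        curr = [0] * (len(ref_words) + 1)
        for j, ref_word in enumerate(ref_words, start=1):
            if out_word == ref_word:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def exceeds_threshold(output: str, reference: str, threshold: int = 50) -> bool:
    """True if the reply repeats more than `threshold` consecutive words.

    The default of 50 mirrors the 50-word mark mentioned in the article;
    it is not a legal limit.
    """
    return longest_verbatim_run(output, reference) > threshold


# Toy example with made-up text (not taken from any book).
reference = "the quick brown fox jumps over the lazy dog near the river bank"
reply = "Sure! The quick brown fox jumps over a sleeping cat"
print(longest_verbatim_run(reply, reference))  # 6
print(exceeds_threshold(reply, reference))     # False
```

In practice, the reply would come from a chatbot prompted with a question like the one above, and the reference would be the opening of the book in question. Repeating the prompt several times and keeping the longest run would mirror the "five runs" mentioned earlier.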
How Many Words are Too Many for Copyright Issues?
The ability of AI models to reproduce copyrighted content from books is risky. It can lead to intentional or unintentional distribution without citing the sources. In response, authors have written an open letter to AI companies about copyright infringement.
However, US and European laws permit limited use of copyrighted material, such as quotation. For example, some guidelines consider a single quotation of up to 300 words acceptable for book-length works, while others put the limit anywhere from 25 to 1,000 words. For chapters, magazines, journals, and teaching material, a limit of 50 words is often cited.
Finding the balance between the two is still an open question. It depends on the source, the length of the quotation, whether it is cited, and where it is published. So, keep this in mind when using AI.
Navkiran Dhaliwal is a seasoned content writer with 10+ years of experience. When she's not writing, she can be found cooking up a storm or spending time with her dog, Rain.