Meta Platforms’ (META.O) lawyers warned the company about the legal consequences of using pirated books to train its AI models. But the company appears not to have heeded the warning, according to a copyright infringement lawsuit filed this summer.

A new filing consolidates two lawsuits brought against the Facebook and Instagram owner by comedian Sarah Silverman, author Michael Chabon and other prominent authors. They accuse Meta of using their works without consent to train its AI language model, Llama.

Last month, a California judge dismissed part of the Silverman lawsuit and indicated that he would give the authors permission to amend their claims. The new complaint against Meta includes chat logs of a Meta-affiliated researcher discussing procurement of the dataset in a Discord server. The authors present the logs as significant evidence that Meta was aware its use of the books may not be protected by U.S. copyright law.

In the chat logs, researcher Tim Dettmers describes his back-and-forth with Meta’s legal department over whether use of the book files as training data would be “legally ok.” “At Facebook, there are a lot of people interested in working with (T)he (P)ile, including myself, but in its current form, we are unable to use it for legal reasons,” Dettmers wrote in 2021, referring to a dataset Meta has acknowledged using to train its first version of Llama, according to the complaint.

Tech companies have been facing a slew of lawsuits accusing them of using copyright-protected works without permission to build generative AI models. Such cases could dampen the generative AI hype: if the plaintiffs succeed, AI firms could be forced to compensate content creators for using their work to build data-hungry models.

Meta released a first version of its Llama large language model in February and published a list of datasets used for training, including “the Books3 section of ThePile.” The person who assembled that dataset has said elsewhere that it contains 196,640 books, according to the complaint.

Llama 2 is free to use for organizations with fewer than 700 million monthly active users. It is seen as a potential game-changer in the generative AI software market, one that could upend the dominance of players like OpenAI and Google, which charge for use of their models.

Navkiran Dhaliwal

Author at Good e-Reader | navkiran@goodereader.com

Navkiran Dhaliwal is a seasoned content writer with 10+ years of experience. When she's not writing, she can be found cooking up a storm or spending time with her dog, Rain.

Authors Sue Meta For Using Copyrighted Books for AI Training Despite Lawyers’ Warnings
