Think of shadow libraries and the names likely to come to mind are Z-Library, Anna’s Archive, Bibliotik, Sci-Hub, and Libgen. Almost all have been sued for distributing copyrighted content without compensating the authors who created it. Now guess who has come to the defense of these shadow libraries? None other than Nvidia.
The chipmaker has its reasons. As Ars Technica points out, the Books3 dataset that Nvidia used to train its AI platform NeMo was created by scraping the huge repositories of data that shadow libraries offer. No wonder, then, that Nvidia is defending such libraries as legitimate sources of online information in response to a lawsuit filed by authors.
“Nvidia denies the characterization of the listed data repositories as ‘shadow libraries’ and denies that hosting data in or distributing data from the data repositories necessarily violates the US Copyright Act,” Nvidia said in its court filing.
“Nvidia denies that it has improperly used or copied the alleged works,” the court filing said, arguing that “training is a highly transformative process that may include adjusting numerical parameters including ‘weights,’ and that outputs of an LLM may be based, at least in part, on such ‘weights.’”
Nvidia, however, did not elaborate on how it would define shadow libraries, or on what it makes of the main complaint against such sites: that they host copyrighted material without paying the authors of those works. Its legal strategy seems to hinge on convincing the court that the process by which AI models assimilate published material to generate algorithms for AI outputs constitutes fair use. The authors suing Nvidia, however, contend that these algorithms are derived exclusively from copyrighted expression in the training data, without the consent of or compensation to the original creators.
In response to such copyright concerns, some companies, OpenAI among them, have taken the proactive step of licensing publishers’ content. The move is seen as a way to avoid potential legal disputes. Notably, The New York Times, currently in litigation against OpenAI, has cited OpenAI’s recent licensing agreement with News Corp. as evidence that publishers deserve compensation for the use of their content in AI training, as reported by MediaPost.
It will be interesting to see how lawmakers and the courts respond.