…when the future of creativity stands on the shoulders of stolen works.
In an age where technology is rapidly advancing, there is a quiet battle taking place behind the scenes: a battle between creatives and the giants of the tech world. This battle revolves around the unauthorized use of authors’ works to train generative AI systems. Africanfuturist artist Nnedi Okorafor recently joined the chorus of discontented voices. She, along with countless other authors, was shocked to discover that her literary creations had been used without her consent to fuel the engines of innovation at Meta, Bloomberg, and other tech companies.
Alex Reisner, in a recent exposé, revealed the contents of a massive dataset known as “Books3.” This dataset, comprising over 191,000 books, was compiled without permission and is based primarily on pirated ebooks published in the last two decades. The implications of this copyright infringement are far-reaching, and the consequences are being felt across the creative landscape.
Among the staggering number of books in the “Books3” dataset, 183,000 have associated author information. This means the authors of those 183,000 books can be identified as having had their works used without their consent to train generative AI systems. These authors poured their hearts and souls into their creations, spending years researching, imagining, and writing, only to find out that their literary offspring were co-opted to feed the ever-hungry AI machines.
Nnedi Okorafor. Image courtesy of Getty Images.
The Hidden World of AI Training
The world of AI training practices remains largely shrouded in secrecy. Very few people understand the intricate details of how these programs are developed, even as they threaten to reshape our world as we know it. Books in “Books3” are stored as large, unlabeled blocks of text, making it nearly impossible for authors to identify the extent to which their works have been used. This lack of transparency adds to the frustration and sense of violation experienced by these creatives.
The copyright infringement in the case of “Books3” is now at the centre of several lawsuits brought against Meta by prominent writers like Sarah Silverman, Michael Chabon, and Paul Tremblay. These authors rightfully claim that the use of their works in training generative AI constitutes a violation of their intellectual property rights. Meta’s response, however, has not been straightforward: the company argues that its AI outputs are not “substantially similar” to the authors’ books.
Implications of Copyright Infringement
The implications of this copyright infringement are profound and extend beyond monetary concerns. Here are a few key points to consider:
- Loss of Creative Control: Authors lose control over how their works are used, potentially resulting in the misrepresentation or distortion of their original ideas.
- Financial Impact: Authors lose potential revenue and royalties that could have been earned through legitimate licensing or sales of their works.
- Stifling Innovation: The fear of having one’s work misused by AI systems may discourage authors from pushing boundaries and exploring new creative realms.
- Erosion of Intellectual Property Rights: Copyright infringement on such a massive scale threatens the very foundations of intellectual property rights, which are essential for fostering creativity and innovation.
- Unequal Power Dynamics: The tech giants profiting from these AI systems maintain an unequal power dynamic that disadvantages individual authors and creatives.
Nnedi Okorafor and the multitude of authors who found their works within the “Books3” dataset have unwittingly become the face of a much larger battle for the protection of creative rights in the age of AI. This situation highlights the urgent need for transparent and ethical practices in AI development, as well as a reevaluation of copyright laws to address the unique challenges posed by generative AI. As the world grapples with the profound changes brought about by AI, it is crucial to ensure that the voices and rights of creatives are not silenced in the process.