Will Copyright Law Wipe Out Generative AIs Such As ChatGPT?

Wednesday, May 17th, 2023

Do you fear generative AIs such as ChatGPT will destroy your earning power? Do you worry about AIs turning on mankind?

If you’re hoping for a savior to smite generative AI, could copyright law be it?

By analogy, think of H. G. Wells’ The War of the Worlds. In it, Martians, who have superior military technology, invade the Earth. The Martians begin laying waste to Earth’s cities. It appears human civilization is doomed.

But the Martians ultimately were laid low, not by Earthling military weapons, but by something in the background: Earth germs. The Martians died off because the environment was inhospitable.

Is copyright law inhospitable to generative AI? First, some background is necessary.

In training, generative AIs are fed millions (perhaps billions) of documents and other media. These data sources are mainly on the Internet, and AI providers admit they haven’t purchased licenses to use these sources.

When AIs are trained, they copy every item of training data, such as every online article studied. But the AI does not store the training data in its original form and then consult it to generate answers to user requests.

From training data, it learns about connections between words and the structure of human speech (in addition to connections in non-verbal media). It uses that learning to set “weights” in its neural network. Once training is completed, the final setting of the weights is the summation of the AI’s study of a massive volume of information. For that reason, if you ask an AI to generate something, it’s highly unlikely its output will match or be nearly identical to any one piece of training data, such as an article.

How does this generative AI process match up with copyright law?

Copyright is a set of exclusive rights of an author to his or her creative work, such as an article, painting, or video. Among the exclusive rights are the right to copy the work and the right to build upon it by creating what is called a “derivative work.” Almost everything on the Internet is someone’s copyright property.

Copyright protects only specific expressions, not ideas. Rewriting someone else’s ideas in your own words is not copyright infringement.

If someone copies someone else’s copyright property without permission, sometimes this is excused as fair use. Fair use is a squishy concept. It’s a case-by-case analysis based on four factors: (1) the purpose and character of the use, including whether it is commercial or for nonprofit educational purposes, (2) the nature of the copyrighted work, (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and (4) the effect of the use on the potential market for or value of the copyrighted work.

Generative AI gets copyright scrutiny at two stages: training and output. As noted above, in AI training, the copyright property of third parties is copied without their permission. Is this fair use?

It probably is. The copying is done just as a transitory step for the AI to understand the structure and connections in the document studied. Once that study is complete, the copy is no longer used. To my knowledge, no court has held that such temporary copying for analysis is not protected by fair use.

Now, let’s look at the AI output. Here, with possible rare exceptions, the output does not reproduce any one item of someone else’s copyright property, such as an article. Producing something “in the style of” a particular copyright owner might happen frequently, but such style mimicry isn’t copyright infringement.

Overall, generative AI can hurt or destroy the market for some human-generated works. For example, why buy a license for a photograph from Getty Images if you can generate a stock photo for free using an art-generating AI, such as Midjourney?

But this destruction of market value happens at the output stage, and, with rare exceptions, the output isn’t copyright infringement. The copying occurs at the learning stage, and no court has held that this copying for study purposes is copyright infringement.

Ultimately, copyright law isn’t suited to address this harm. A federal court could greatly extend the current boundaries of copyright law to hold that the generative AI process adds up to copyright infringement. It could hold that the marketplace effect of the outputs means the copying occurring in training is not fair use – that you have to consider training and output collectively. That would be a big stretch and is unlikely.

Congress could amend the copyright laws to require generative AIs to buy licenses to use the information of others as training data, but Congress usually lets the courts deal with challenging IP issues rather than getting involved.

So, in this movie in which we live, it appears the copyright germs cannot kill off the Martian generative AIs. They will thrive and shape (at the least) humanity’s future. To paraphrase Kent Brockman from The Simpsons, “I, for one, welcome our new alien overlords.”

Written on May 17, 2023

by John B. Farmer

© 2023 Leading-Edge Law Group, PLC. All rights reserved.