Let's not flip sides on IP maximalism because of AI
Stop trying to bring the "on a computer" problem to copyright
Copyright policy is a sticky, tricky thing, and battles over it have been fought for decades between public and corporate interests. Typically, it's the corporate interests, especially the content industry, that win. Because of this, we've seen power, and copyrights, collect among a small group of content companies. But there is one significant win that the public interest has been able to defend all these years: fair use.
Fair use's importance has only grown over the years. Put simply, fair use allows people limited use of copyrighted material without permission. Fair use's foundations are in commentary, criticism, and parody. However, fair use has arguably filled in important gaps that allow us to basically exist on social media. That's because there are open questions about what is and isn't copyright infringement, and things as simple as retweeting or linking could theoretically get us in trouble. Fair use also allows a lot of art to exist, because a lot of art critiques or comments on older art. On the flip side, when fair use was ruled not to cover music sampling, it basically killed a lot of creative sampling in hip hop music. Now popular sample-based music is relatively tame and tends to draw on the same library of samples.
Fair use (probably) also protects the creator industry. Many people make a living streaming video games or making content around playing video games. All of that could violate copyright law. We don't know the extent of the risk here, because it hasn't been fully tested, but we do know that video game makers have claimed video game streaming content as copyrighted material. We also know that in Japan, which doesn't have a fair use doctrine, a streamer got two years in jail for posting Let's Play videos. A lot of creators also make "react" content, which likewise relies on fair use protection.
Blowing up Fair Use
Considering the importance of fair use, and the historically bad behavior of the content industry toward ordinary people, it's striking that a lot of public interest advocates want to blow it up to hurt AI companies. This is unfortunate, but not particularly surprising. Content industry lobbying has inflated copyright protections into a pretty big sledgehammer, and when you really want to smash something you often look for a sledgehammer. For example, copyright and right of publicity (a somewhat related state-level IP regime) were the first tools people reached for to protect victims when revenge porn became a big problem.
Similarly, some public interest advocates are turning to copyright to stop AI from being trained on content without permission. However, that use is almost certainly a fair use (if it's copyright infringement at all), and that's a good thing. The ability of people to use computers to analyze content without permission is extremely useful, and it would be bad to weaken or destroy fair use just to stop companies from doing that in a socially problematic way. The best way to stop bad things is with policy purposefully made to address the whole problem. And these uses of copyright law often play into the hands of powerful interests: the copyright industry would love the chance to turn the public interest advocacy community against itself in order to kill fair use.
I'm not saying that there aren't issues with AI that need to be addressed, especially worker exploitation. AI art generators can be especially infuriating for artists: they take a great deal from artists' existing work while giving back little. In fact, these generators are arguably being built to replace artists rather than to provide artists with new tools. It can be attractive to throw anything in the way to slow them down. But copyright, especially copyright maximalism, has done a terrible job of preventing artist exploitation.
Porting “on a computer” to copyright
One of the biggest public interest fights in patent law has been against "on a computer" software patents that clogged up the system and led to a number of patent infringement suits against small businesses over silly claimed inventions. The basic problem is this: you could claim as an invention doing something that was already known, as long as you did it on a computer. These "on a computer" patents have been greatly restricted through Supreme Court rulings (which special interests would like to overturn). However, the bad effects of software patents still exist today, as do patent trolls seeking to exploit them.
This current fight over copyright in training data reminds me of that same problem. For example, if a writer wanted to study romance novels to find out what is popular, it would be perfectly acceptable under copyright law for them to read and analyze a lot of popular romance novels and to use that analysis to take the most successful parts of those novels and create a new novel. It is also perfectly acceptable under copyright law for an artist to study a particular artist and replicate that artist's style in their own works. But using an AI to do that analysis, doing it "on a computer," is now suspect.
This is short-sighted for a number of reasons, but one I'd like to highlight is how difficult this shrinking of fair use would be to contain. We are talking about an area of law where whether loading files into RAM counts as "copying" (and therefore requires permission or is a violation) is an actual policy debate that public interest advocates have to fight. If using content as training data becomes a copyright violation, what's the limiting principle? What kinds of computer analysis would no longer be protected under fair use?
I should also point out that IP maximalism is the easiest way to build oligopolies. Big companies will be able to figure out how to navigate the maze of rights necessary to build a model, and existing models will likely be grandfathered in (with a few lawsuits to get through). However, it will be impossible for any new company or new open source model to be created. Dealing with rights at scale is so difficult that even the rightsholder industry has trouble tracking who owns what. And information about rights has been withheld as leverage for better deals, because the risk (and high cost) of accidentally infringing someone's rights is so great.