
So, You Want to Train Artificial Intelligence (A.I.) on Your Supercomputer?

Posted on Saturday, November 16, 2024 in Blog Posts.

By Emma Stauber; Photo Credit: VCG via Getty Images

AI, AI, AI. If you’re like most of us these days, it’s the top new technology on your mind. How can I learn to use it more effectively right now? How can I incorporate it into my future workflow? Will it take over the world?

But if you also happen to own—or at least manage—one of the world’s supercomputers, you may be asking a different question: can I allow AI training on my machine without exposing myself to legal liability? With the faster training time you can provide comes the ability to experiment with a variety of AI projects and create even larger data training sets. Some AI projects are building their own machines,[1] but your machine already exists, and you’re (possibly) already receiving project requests of your own.

Unfortunately, because you are not the AI developer, you are not the one deciding whether to violate millions of copyrights in the endless pursuit of feeding a hungry little AI-in-training every digestible piece of information humanity has ever produced. Of course, you can ask nicely that developers avoid this, but we all know that they’re feeding these AI models everything they can, and they’re being sued for it.[2]

There are a few potential avenues of liability here for you: contributory copyright infringement, vicarious copyright infringement, and violation of the Digital Millennium Copyright Act (DMCA) among them. Even worse, the fair use doctrine might not save you. We’ll get to that part last.

Unlike Grokster, you’re hopefully not intentionally courting unlawful use of your supercomputer.[3] However, all it takes for contributory infringement is that you know, or have reason to know, that your AI developers are directly infringing copyrights and that you materially contribute to that infringement.[4] As discussed, we all know these AI developers are feeding AI copyrighted material, so what remains to be seen is whether that use is infringing, something you have little control over. If it is, then providing the means for developers to infringe copyrights very quickly is probably a material contribution.

Additionally, vicarious liability doesn’t even require that you know about the infringement, only that you have the right and ability to supervise the developer’s conduct and a direct financial interest in the infringing activities.[5] As the owner of the supercomputer, presumably negotiating a use contract with AI developers, you may meet both of these criteria.

But what about the DMCA? Isn’t it supposed to protect service providers from liability for infringement by users? Well, first, it’s unclear whether your supercomputer will be considered a network or system.[6] So far, DMCA cases have only involved facilitating websites and online networks rather than physical computers. Even if a supercomputer could qualify for protection, the DMCA has a take-down requirement: if you receive notice of specific copyrighted material in the training set, you will have to remove it.[7] If you receive that notice after training is complete, is removal of specific copyrighted material even possible? Likely not.[8]

As mentioned, copyright law allows for fair use of copyrighted materials.[9] This is usually where would-be infringers, like Google Books, are saved.[10] When determining if a particular use of copyrighted material is infringing, fair use considers:

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.[11]

The thing about AI training, though, is that the purpose is probably commercial (factor one), training sets likely include the entire copyrighted work (factor three), and if people can load up an AI-generated image, song, or book summary, then they don’t need to pay for the original image, song, or book (factor four). While many uses of generative AI might not affect the market value of its training set, some uses certainly do. Additionally, copyrighted material included in an AI training set through data mining is copyrighted material that could have been included in a training set through licensing.[12] It’s not the strongest defense.

If you really want to help AI development by lending out your supercomputer, for now it’s probably best to tightly control what kinds of data developers include in training sets.

 

Emma Stauber is a 2L at Vanderbilt Law School. She plans on focusing on intellectual property after law school.

 

[1] Michael Kan, Musk’s xAI Supercomputer Goes Online With 100,000 Nvidia GPUs, PCMAG (Sept. 3, 2024), https://www.pcmag.com/news/musks-xai-supercomputer-goes-online-with-100000-nvidia-gpus; Michael Kan, Zuckerberg’s Meta Is Spending Billions to Buy 350,000 Nvidia H100 GPUs, PCMAG (Jan. 18, 2024), https://www.pcmag.com/news/zuckerbergs-meta-is-spending-billions-to-buy-350000-nvidia-h100-gpus.

[2] See, e.g., Andersen v. Stability AI Ltd., No. 23-CV-00201-WHO, 2024 WL 3823234 (N.D. Cal. Aug. 12, 2024); see also Joe Coscarelli, An A.I. Hit of Fake ‘Drake’ and ‘The Weeknd’ Rattles the Music World, The New York Times (Apr. 19, 2023), https://www.nytimes.com/2023/04/19/arts/music/ai-drake-the-weeknd-fake.html; Cheyenne DeVon, Billie Eilish, Nicki Minaj, Jon Bon Jovi and over 200 Artists Call for Protections against “Predatory Use of AI,” CNBC (Apr. 5, 2024), https://www.cnbc.com/2024/04/05/billie-eilish-nicki-minaj-200-artists-sign-letter-against-ai-music.html.

[3] Metro-Goldwyn-Mayer Studios Inc. v. Grokster, Ltd., 545 U.S. 913, 919 (2005) (“One who distributes a device with the object of promoting its use to infringe copyright… is liable for the resulting acts of infringement by third parties using the device, regardless of the device’s lawful use.”). In this case, by awarding developers time on your machine, you’ve distributed it.

[4] See NCR Corp. v. Korala Assocs., Ltd., 512 F.3d 807, 816 (6th Cir. 2008).

[5] Broad. Music, Inc. v. Meadowlake, Ltd., 754 F.3d 353, 355 (6th Cir. 2014); Gershwin Pub. Corp. v. Columbia Artists Mgmt., Inc., 443 F.2d 1159, 1163 (2d Cir. 1971).

[6] 17 U.S.C.A. § 512(c)(1)(A) (West).

[7] Id.

[8] See also David Rosenthal, What Is inside an AI Model and How It Works, Vischer (May 7, 2024), https://www.vischer.com/en/knowledge/blog/part-17-what-is-inside-an-ai-model-and-how-it-works/.

[9] 17 U.S.C.A. § 107 (West).

[10] Authors Guild v. Google, Inc., 804 F.3d 202, 229 (2d Cir. 2015).

[11] 17 U.S.C.A. § 107 (West).

[12] Jenny Quang, Does Training AI Violate Copyright Law?, 36 Berkeley Tech. L. J. 1407, 1429 (2023) (Quang’s article also addresses fair use in AI training from the angle of transformative value and court friendliness to new technology, ultimately also concluding that fair use is a weak defense in the AI training space).