Meta Faces Copyright Claims Over AI Training Data: DMCA Claim Survives Motion to Dismiss
By Dale Weiford; Photo Credit: REUTERS/Dado Ruvic
Artificial intelligence’s (AI) rapid advancement, and the lucrative possibilities that stem from it, have generated many legal battles, particularly over intellectual property. One case unfolding in the Northern District of California, Richard Kadrey, et al. v. Meta Platforms, Inc., highlights the contentious issue of using copyrighted materials to train large language models (LLMs) and has potential ramifications for how AI developers approach data acquisition and the disclosure of the data they acquire. The core assertion in Kadrey is that Meta Platforms, Inc. (Meta), the creator of the LLaMA family of large language models, infringed upon the copyrights of authors and journalists, including Sarah Silverman, Ta-Nehisi Coates, and a proposed class of similarly situated authors.[1] The legal arguments fall into two buckets: claims under the California Comprehensive Computer Data Access and Fraud Act (CDAFA) and claims arising under the Digital Millennium Copyright Act (DMCA).[2]
Regarding the CDAFA claim, the Plaintiffs alleged that Meta’s downloading of their books violated the act.[3] The court, however, granted Meta’s motion to dismiss this claim.[4] It reasoned that the CDAFA claim could not be understood as resting on any right “qualitatively different” from the Plaintiffs’ rights under the Copyright Act, rendering the claim inapplicable in this context.[5]
The Plaintiffs’ claim under the Digital Millennium Copyright Act (DMCA), specifically 17 U.S.C. § 1202(b)(1), which concerns the removal of copyright management information (CMI), was allowed to proceed.[6] CMI includes information such as a work’s title and its author’s name.[7] The Plaintiffs argued that Meta intentionally removed CMI from their copyrighted books during the training process for LLaMA.[8] Judge Chhabria found that the Plaintiffs adequately alleged that Meta intentionally removed CMI to conceal copyright infringement.[9] The complaint stated that Meta was aware LLaMA was prone to outputting CMI unless it was removed from the training data, and that Meta took other steps to reduce the likelihood of LLaMA generating outputs revealing the inclusion of copyrighted material.[10] This led the court to draw a “reasonable, if not particularly strong, inference” that the removal was an effort to hide the fact that LLaMA was trained on copyrighted material.[11] The ruling could have major implications for the fair use doctrine: intentional concealment of the use of copyrighted material could undermine a fair use defense premised on purposes like research or teaching.[12]
Fair use permits certain uses of copyrighted works, for purposes like research and teaching, without those uses constituting infringement.[13] The ruling in Kadrey regarding the alleged intentional removal of CMI to conceal infringement, however, introduces a potential rejection of the fair use argument that, in training AI models, companies are using copyrighted information to “teach” as the statute defines (and courts interpret) the term.[14] It follows that the argument that AI training is similar (or even synonymous with) traditional “teaching, scholarship, or research” becomes more difficult to sustain, at least at the motion to dismiss stage, when there is evidence of intentional efforts to conceal the use of copyrighted material by removing identifying information.[15] The court’s decision to allow the DMCA claim to proceed past the motion to dismiss stage based on the alleged intent to conceal infringement could signal a narrower interpretation of fair use in cases where such concealment is deliberate, potentially reinforcing a judicial policy that prioritizes transparency and complete disclosure about how models are built and trained.[16] An interpretation furthering those goals could help ensure that proper credit is given to the authors of the works used in training models, fostering greater trust in the systems and methods of AI companies and ultimately benefiting both the authors of the works used and the users of AI products.
Dale Weiford is a 2L at Vanderbilt Law School. He plans to focus his legal career on entertainment and transactional work in Las Vegas after law school.
[1] Isaiah Poritz, Meta Fails to Beat Copyright Notice Removals Claim in AI Case, Bloomberg Law (Mar. 7, 2025), https://www.bloomberglaw.com/product/ip/bloomberglawnews/ip-law/BNA%20000001957310d7d1a195771ce33a0000?bna_news_filter=ip-law.
[2] See Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417, at *1 (N.D. Cal. July 7, 2023) (order granting in part and denying in part motion to dismiss).
[3] See Amended Complaint at 1, Kadrey, No. 3:23-cv-03417 (N.D. Cal. July 7, 2023) (No. 407).
[4] Kadrey, No. 3:23-cv-03417, at *3 (order granting in part and denying in part motion to dismiss).
[5] Id.
[6] Id. at 2.
[7] Id. at 2–3.
[8] Id.
[9] See id.
[10] Id.
[11] Id.
[12] See 17 U.S.C. § 107.
[13] See id.
[14] See Bobby Allyn, ‘The New York Times’ takes OpenAI to court. ChatGPT’s future could be on the line, NPR (Jan. 14, 2025), https://www.npr.org/2025/01/14/nx-s1-5258952/new-york-times-openai-microsoft.
[15] See id.; Kadrey, No. 3:23-cv-03417, at 2–3.
[16] See id. at 3.