New Court Filings Claim Meta Knew About Copyright Infringement in AI Training Dataset
Meta Platforms allegedly used pirated versions of copyrighted books to train its artificial intelligence systems, with approval from CEO Mark Zuckerberg, according to authors involved in a lawsuit against the company.
Ta-Nehisi Coates, comedian Sarah Silverman, and other authors suing Meta for copyright infringement made these accusations in newly disclosed court filings in California federal court. They stated that internal documents produced by Meta during the discovery process showed that the company was aware that the books were pirated.
The authors sued Meta in 2023, claiming that the tech giant misused their books to train its large language model, Llama.
They presented new evidence suggesting that Meta used the AI training dataset LibGen, which allegedly contains millions of pirated works, distributed via peer-to-peer torrents. The authors further claimed that internal Meta communications showed Zuckerberg “approved Meta’s use of the LibGen dataset despite concerns within Meta’s AI executive team (and others at Meta) that LibGen is ‘a dataset we know to be pirated.’”
In 2023, U.S. District Judge Vince Chhabria dismissed claims that Meta’s chatbots infringed the authors’ copyrights and that the company unlawfully stripped their books’ copyright management information (CMI).
On Wednesday, the authors requested permission to revive their infringement claims and introduce a new computer fraud claim. Chhabria stated that he would allow the authors to file an amended complaint but expressed skepticism about the merits of the fraud and CMI claims.