A sign for The New York Times hangs above the entrance to its building, Thursday, May 6, 2021, in New York. The New York Times filed a federal lawsuit against OpenAI and Microsoft on Wednesday, Dec. 27, 2023, seeking to end the practice of using published material to train chatbots.

A group of news organizations, led by The New York Times, took ChatGPT maker OpenAI to federal court on Tuesday in a hearing that could determine whether the tech company has to face the publishers in a high-profile copyright infringement trial.

Three publishers' lawsuits against OpenAI and its financial backer Microsoft have been merged into one case. Leading each of the three combined cases are the Times, The New York Daily News and the Center for Investigative Reporting.

Other publishers, like the Associated Press, News Corp. and Vox Media, have reached content-sharing deals with OpenAI, but the three litigants in this case are taking the opposite path: going on the offensive.

The hearing on Tuesday centered on OpenAI's motion to dismiss, a critical stage in the case in which a judge will either clear the litigation to proceed to trial, or toss it.

The publishers' core argument is that the data that powers ChatGPT has included millions of copyrighted works from the news organizations, articles that the publications argue were used without consent or payment — something the publishers say amounts to copyright infringement on a massive scale.

"We have to follow the data," said Times lawyer Jennifer Maisel in court Tuesday. "Similar to how in criminal cases you follow the money."

And if you follow the data, the publishers' legal team argued, OpenAI and Microsoft are profiting from journalistic work that was scanned, processed and recreated without payment or consent. Microsoft has incorporated OpenAI technology into its Bing search engine.

"It's substitutional," said Times attorney Ian Crosby, meaning that ChatGPT and Bing have become, for some people, a substitute for the publishers' original work. That point, if proven, is key to winning a copyright infringement case.

In court papers, Crosby expanded by writing that OpenAI's "unlawful use of The Times's work to create artificial intelligence products that compete with it threatens The Times's ability to provide that service."

"Using the valuable intellectual property of others in these ways without paying for it has been extremely lucrative" for OpenAI, he continued.

OpenAI has argued that its use of vast amounts of data to train its artificial intelligence bot is protected by "fair use" rules, a doctrine in American law that allows copyrighted material to be used for purposes such as education, research or commentary.

To clear the fair use test, the new work has to transform the copyrighted material into something new, and it cannot compete with the original in the same marketplace, among other factors.

To make the case that their use of the text is transformative, OpenAI and Microsoft's legal team explained to Judge Sidney Stein, an appointee of President Bill Clinton, how large language models like ChatGPT work.

Attorneys for the companies said that data fed to OpenAI's artificial intelligence models is broken into a series of "tokens," units that make analyzing the data more manageable. Eventually, the model can recognize patterns in those tokens.
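
For readers who want a concrete picture of what the attorneys described, here is a minimal sketch. It is not OpenAI's method: production language models generally use subword tokenization and neural networks, while this toy splits text into words and counts which token follows which. The function names `tokenize` and `bigram_counts` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word and punctuation tokens (toy scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def bigram_counts(tokens):
    """Count how often each token is immediately followed by another."""
    return Counter(zip(tokens, tokens[1:]))

sample = "The model reads the text. The model finds patterns."
tokens = tokenize(sample)
counts = bigram_counts(tokens)

print(tokens)
print(counts[("the", "model")])  # how often "model" follows "the"
```

Counting adjacent pairs is the simplest possible form of "recognizing patterns" and stands in here only as an illustration of the idea.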

Joseph Gratz, an OpenAI lawyer, said that regurgitating entire articles "is not what it is designed to do and not what it does," referring to ChatGPT.

"This isn't a document retrieval system. It is a large language model," Gratz said.

Gratz claimed that the examples of infringement cited by the Times in the lawsuit could have occurred only after "thousands of tens of thousands" of queries. In essence, Gratz argued that the publishers primed the chatbot to spit out text that was lifted from the publishers' websites.

Microsoft says the Times is using its 'might and its megaphone' to challenge threatening technology

In their motion to dismiss, lawyers for Microsoft, OpenAI's largest investor, wrote that it was not illegal for OpenAI to ingest that journalistic text.

"In this case, The New York Times uses its might and its megaphone to challenge the latest profound technological advance: the Large Language Model, or LLM," they wrote in the court filing, describing the technology that underpins ChatGPT. "Despite The Times's contentions, copyright law is no more an obstacle to the LLM than it was to the VCR (or the player piano, copy machine, personal computer, internet, or search engine)."

But the news organizations argue not only that ChatGPT's global success has hinged in part on vacuuming up troves of copyrighted articles, but also that ChatGPT is now effectively a competitor as a source of reliable information.

This was part of the argument at court Tuesday, when another aspect of how ChatGPT works became the subject of debate. It's known as "retrieval augmented generation." In plain English: It integrates up-to-date and more specific information from the web into the chatbot's answers.

While some of this information, like large sections from news stories, may not have been part of the chatbot's training data, it still can appear in ChatGPT's outputs.
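
The retrieval step described above can be sketched in a few lines. This is an illustration under stated assumptions, not how ChatGPT or Bing is built: the scoring by shared words, the sample documents and the names `score`, `retrieve` and `build_prompt` are all hypothetical, and real systems typically search the web or an index rather than a short list.

```python
def score(query, document):
    """Count the words a query and a document share (toy relevance score)."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def retrieve(query, documents, k=1):
    """Return the k documents that share the most words with the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Place the retrieved text ahead of the question sent to the model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The city council approved the new budget on Monday.",
    "Local team wins the championship after overtime.",
]
prompt = build_prompt("Who wins the championship", docs)
print(prompt)
```

The point of the sketch is that the retrieved passage travels into the prompt at question time, which is why text that was never in the training data can still surface in an answer.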

Steven Lieberman, attorney for The New York Daily News, said: "This allows for free riding," a reference to readers who turn to OpenAI recreations of newspaper articles, rather than going to a publisher's website.

What could happen next?

According to the complaint filed by the Times, OpenAI should be on the hook for billions of dollars in damages over illegally copying and using the newspaper's archive. The lawsuit also calls for the destruction of ChatGPT's dataset.

That would be a drastic outcome. If the publishers win the case, and a federal judge orders the dataset destroyed, it could completely upend the company, since it would force OpenAI to recreate its dataset relying only on works it has been authorized to use.

Federal copyright law also carries stiff financial penalties, with violators facing fines of up to $150,000 for each infringement "committed willfully."
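
To see how that ceiling scales, here is a back-of-the-envelope calculation. The infringement counts are hypothetical round numbers chosen for illustration, not figures from the lawsuit, and the result is an upper bound on willful-infringement penalties rather than a prediction of any award.

```python
# Ceiling cited in the article for each willful infringement, in dollars.
MAX_WILLFUL_PER_INFRINGEMENT = 150_000

def max_exposure(infringement_count, per_infringement=MAX_WILLFUL_PER_INFRINGEMENT):
    """Upper bound on penalties if every infringement drew the maximum."""
    return infringement_count * per_infringement

# Hypothetical counts, for illustration only.
for count in (1_000, 1_000_000):
    print(f"{count:,} infringements -> up to ${max_exposure(count):,}")
```

At one million works, the ceiling works out to $150 billion.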

"If you're copying millions of works, you can see how that becomes a number that becomes potentially fatal for a company," Daniel Gervais, the co-director of the intellectual property program at Vanderbilt University who studies generative AI, told NPR in August 2023, when the Times was considering legal action against OpenAI before filing suit that December. "Copyright law is a sword that's going to hang over the heads of AI companies for several years unless they figure out how to negotiate a solution."

While he did not issue a decision on Tuesday, Judge Stein said he would soon rule on whether the case against OpenAI can go forward or will be dismissed.
