New York Times sues OpenAI, Microsoft for using articles to train AI

The New York Times sued OpenAI and Microsoft on Wednesday over the tech companies’ use of its copyrighted articles to train their artificial intelligence technology, joining a growing wave of opposition to the tech industry’s use of creative work without paying for it or getting permission.

OpenAI and Microsoft used “millions” of Times articles to help build their tech, which is now extremely lucrative and directly competes with the Times’s own services, the newspaper’s lawyers wrote in a complaint filed in federal court in Manhattan.

“For months, The Times has attempted to reach a negotiated agreement,” the Times’s lawyers said in the lawsuit. “These negotiations have not led to a resolution.”

Spokespeople for OpenAI and Microsoft did not immediately respond to requests for comment.

The “large language models” (LLMs) behind AI tools such as ChatGPT work by ingesting huge amounts of text scraped from the internet, learning the connections between words and concepts, and then developing the ability to predict what word to say next in a sentence, allowing them to mimic human speech and writing. OpenAI, Microsoft and Google have refused to reveal what goes into their newest models, but previous LLMs have been shown to include large amounts of content from news organizations and catalogues of books.
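To make the idea of "predicting the next word" concrete, here is a deliberately tiny sketch. It is not how ChatGPT or any production LLM works; those systems use neural networks trained on vastly more text. It only counts which word most often follows another in a short, made-up sentence. The function names and the sample text are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up training text (an assumption for illustration, not real article data).
corpus = "the paper sued the company and the paper said the talks failed"
model = train_bigram_counts(corpus)

print(predict_next(model, "the"))      # "paper" follows "the" most often here
print(predict_next(model, "lawsuit"))  # None: the word never appears in the text
```

Even this toy version shows why training data matters: the model can only echo patterns present in the text it was given.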


The tech companies have steadfastly said that the use of information scraped from the internet to train their AI algorithms falls under “fair use” — a concept in copyright law that allows people to use the work of others if it is substantially changed. The Times’s lawsuit, however, includes multiple examples of OpenAI’s GPT-4 AI model outputting New York Times articles word for word.

Legal experts have said that plaintiffs will have stronger cases of copyright infringement if they can show that AI tools are directly reproducing copyrighted works, rather than paraphrasing the information from them.

The news industry has been grappling with its relationship to this rapidly evolving technology. Several media companies have started internal conversations on how to use emerging automated tools to assist with newsgathering and production. And some, such as Sports Illustrated, have faced backlash for using AI to generate news articles that were passed off as written by humans.

Other online publishing companies have already begun using AI to churn out huge amounts of new content with a goal of winning Google search traffic to gin up ad revenue. These include fake news sites that publish false information. Since May, the number of websites showing fake AI-written articles has jumped by more than 1,000 percent, according to NewsGuard, an organization that tracks misinformation.

But the use of this technology also presents a possible existential crisis for the news industry, which has struggled to find ways to replace the revenue it once generated from its profitable print products. The number of journalists working in newsrooms declined by more than 25 percent between 2008 and 2020, according to the Pew Research Center.

By suing OpenAI and Microsoft, the Times is joining a growing group of artists, authors, musicians, filmmakers and other creative professionals who want credit and compensation from tech companies that took their work to build tools that they say are already undermining their work.

Some of them, including blockbuster writers such as George R.R. Martin, Jodi Picoult, Jonathan Franzen and George Saunders, have also sued OpenAI. And since August, at least 583 news organizations, including the Times, The Washington Post and Reuters, have installed blockers on their websites to prevent tech companies from scraping their articles. But it’s likely that their online catalogues, going back decades, have already been used to create AI tools.

Meanwhile, OpenAI has been negotiating deals with news organizations over the past year to pay them for content. In July, it signed a deal with the Associated Press for access to its archive of news articles. But in October, a spokesperson for OpenAI said that the company’s practices do not violate copyright laws and that the deals it was negotiating would be intended only for accessing content that it couldn’t get online or for showing links or full sections of articles in ChatGPT.

German publishing company Axel Springer, which owns Politico and Business Insider, also signed a deal with OpenAI earlier this month, under which the tech company will pay to show parts of articles in ChatGPT answers. And earlier this year, Google pitched media outlets on building and selling AI tools that could assist journalists.
