I wonder if NYT is being 'overly non-technically and non-legally balanced' here (in the full article)... 🤔 Quote: "Mr. Balaji, 25, who has not taken a new job and is working on what he calls “personal projects,” is among the first employees to leave a major A.I. company and speak out publicly against the way these companies have used copyrighted data to create their technologies. A former vice president at the London start-up Stability AI, which specializes in image- and audio-generating technologies, has made similar arguments.
Over the past two years, a number of individuals and businesses have sued various A.I. companies, including OpenAI, arguing that they illegally used copyrighted material to train their technologies. (...)
In December, The New York Times sued OpenAI and its primary partner, Microsoft, claiming they used millions of articles published by The Times to build chatbots that now compete with the news outlet as a source of reliable information. Both companies have denied the claims.
Many researchers who have worked inside OpenAI and other tech companies have cautioned that A.I. technologies could cause serious harm. But most of those warnings have been about future risks, like A.I. systems that could one day help create new bioweapons or even destroy humanity.
Mr. Balaji believes the threats are more immediate. ChatGPT and other chatbots, he said, are destroying the commercial viability of the individuals, businesses and internet services that created the digital data used to train these A.I. systems.
“This is not a sustainable model for the internet ecosystem as a whole,” he told The Times. (...) Mr. Balaji does not believe these [fair use] criteria have been met. When a system like GPT-4 learns from data, he said, it makes a complete copy of that data. From there, a company like OpenAI can then teach the system to generate an exact copy of the data. Or it can teach the system to generate text that is in no way a copy. The reality, he said, is that companies teach the systems to do something in between.
“The outputs aren’t exact copies of the inputs, but they are also not fundamentally novel,” he said. This week, he posted an essay on his personal website that included what he describes as a mathematical analysis that aims to show that this claim is true. (...) The technology violates the law, Mr. Balaji argued, because in many cases it directly competes with the copyrighted works it learned from. Generative models are designed to imitate online data, he said, so they can substitute for “basically anything” on the internet, from news stories to online forums.
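That "in between" claim can be made concrete with a toy sketch (my own illustration, not the mathematical analysis from Balaji's essay): measuring how much a generated passage overlaps its source using word-trigram Jaccard similarity, where 1.0 is a verbatim copy, 0.0 shares nothing, and partial scores capture output that is neither exact copy nor fundamentally novel. All texts below are invented examples.

```python
# Toy illustration (an assumption, not Balaji's actual method): quantify the
# spectrum between "exact copy" and "fundamentally novel" with word-trigram
# Jaccard similarity between a source text and a generated text.

def ngrams(text, n=3):
    """Return the set of word n-grams in a text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(source, output, n=3):
    """Jaccard similarity of n-gram sets: 1.0 = verbatim, 0.0 = disjoint."""
    a, b = ngrams(source, n), ngrams(output, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical texts standing in for training data and model outputs.
source  = "the quick brown fox jumps over the lazy dog"
exact   = "the quick brown fox jumps over the lazy dog"
partial = "the quick brown fox leaps over a sleeping dog"
novel   = "entirely unrelated sentence with different words here"

print(overlap(source, exact))    # 1.0 -- exact copy
print(overlap(source, partial))  # between 0 and 1 -- the "in between" case
print(overlap(source, novel))    # 0.0 -- no shared trigrams
```

Real memorization studies use far more robust measures, but the spectrum the three scores trace out is the point of Balaji's argument.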
The larger problem, he said, is that as A.I. technologies replace existing internet services, they are generating false and sometimes completely made-up information — what researchers call “hallucinations.” The internet, he said, is changing for the worse." Source: https://lnkd.in/djJAzGac