This is the best summary I could come up with:
Mira Murati, OpenAI’s longtime chief technology officer, sat down with The Wall Street Journal’s Joanna Stern this week to discuss Sora, the company’s forthcoming video-generating AI.
It’s a bad look all around for OpenAI, which has drawn wide controversy — not to mention multiple copyright lawsuits, including one from The New York Times — for its data-scraping practices.
After the interview, Murati reportedly confirmed to the WSJ that Shutterstock videos were indeed included in Sora’s training set.
But when you consider the vastness of video content across the web, any clips available to OpenAI through Shutterstock are likely only a small drop in the Sora training data pond.
Others, meanwhile, jumped to Murati’s defense, arguing that if you’ve ever published anything to the internet, you should be perfectly fine with AI companies gobbling it up.
Whether Murati was keeping things close to the vest to avoid more copyright litigation or simply didn’t know the answer, people have good reason to wonder where AI data, be it “publicly available and licensed” or not, is coming from.
The original article contains 667 words, the summary contains 178 words. Saved 73%. I’m a bot and I’m open source!
Funny how we have all this pissing and moaning about stealing, yet nobody ever complains about this bot actually lifting entire articles and spitting them back out without ads or fluff. I guess it’s different when you find it useful, huh?
I like the bot, but I mean y’all wanna talk about copyright violations? The argument against this bot is a hell of a lot more solid than just using data for training.