Apparently, stealing other people’s work to build a product for money is now “fair use,” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
My description might’ve sounded like a Markov chain, but the actual framework uses matrices, because you need to store and compute a huge amount of information at once, and that is what matrices are good for. They’re also used in animation, if you didn’t know.
What it actually uses is irrelevant; how it uses those things is the same as a regression model, and the difference is scale. A regression model looks at how strongly related variables contribute to an outcome and computes weights that give the best prediction of that outcome. This was the machine learning boom a couple of years ago, when TensorFlow became really popular.
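To make the "computing weights" part concrete, here is a minimal sketch of a one-variable regression fitted by gradient descent. It is only an illustration of the general idea: the function name `fit_linear_regression`, the toy data, and the learning rate are all made up for this example, and it is not how TensorFlow or any LLM is actually implemented.

```python
def fit_linear_regression(xs, ys, lr=0.02, epochs=5000):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and epochs are arbitrary choices.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every data point with the current weights.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Nudge the weights in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w near 2, b near 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

The loop is the whole trick: guess weights, measure how wrong the predictions are, adjust, repeat. Larger models repeat the same kind of update over vastly more weights, stored as matrices.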
LLMs are an evolution of the same idea. I’m not saying it’s not impressive; what they were able to do is very cool. What I take issue with is the branding, the marketing and the plagiarism. I happen to sit at the intersection here: I work in the same field, I’m an avid fan of classic sci-fi, and I’m a writer.
It’s easy to look at what people have created throughout history and think “this looks like that,” and on a point-by-point basis you’d be correct, but the creation of that thing is shaped by the lens of the person creating it. We might hear a George Carlin joke and later read something much like it in a newspaper from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that material? I don’t know. But Carlin regularly drew on his own experiences, so it’s likely that he was referencing an event from his own past that resembled the one from 200 years ago. He might’ve subconsciously absorbed the information.
The point is that the way these models have been trained is unethical. They used material they had no license to use, and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work. I don’t think they’re taking the position that it’s intelligent, because from the beginning that was a marketing ploy. They’re taking the position that they should be allowed to use the data they stole because there was no other way.