I really fear for the internet and what it will become in even just another year, with AI writing and AI art increasingly used in place of real people. And now OpenAI openly state they need to use copyrighted works as training material.
As reported by The Guardian, the New York Times sued OpenAI and Microsoft over copyright infringement. Just recently, OpenAI sent a submission to the UK House of Lords Communications and Digital Select Committee in which it said pretty clearly:
Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today's leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today's citizens.
Worth noting: OpenAI put up their own news post, "OpenAI and journalism", on January 8th.
Why am I writing about this here? Well, the reasoning is pretty simple. AI writing is (on top of other things) accelerating the race to the bottom of content made for clicks. Search engines have quickly become a mess for finding what you actually want, and it's only going to get worse thanks to all these SEO (Search Engine Optimisation) bait content farms, with more popping up all the time, and we've already seen some bigger websites trial AI writing. The internet is a mess.
As time goes on, and as more people use AI to pinch content and write entire articles, we're going to hand profitable writing to a select few big names who can weather the storm. A lot of smaller-scale websites are just going to die off. Any time you search for something, it will be those big names sprinkled in between the vast AI website farms, all with very similar, robotic, plain writing styles.
Many (most?) websites make content for search engines, not for people. The Verge recently did a rather fascinating piece on this showing how websites are designed around Google, and it really is something worth scrolling through and reading.
One thing you can count on: my perfectly imperfect writing, full of terrible grammar, continuing without the use of AI. At least it's natural, right? I write as I speak, for better or worse. By humans, for humans — a tagline I plan to stick with until AI truly takes over and I have to go find a job flipping burgers or something. But then again, there will be robots for that too. I think I need to learn how to fish…
It would be bad enough if it only meant more trash on the Internet, but you just know this widespread corruption of language is also going to influence people, especially language learners, who will be much more frequently exposed to mistranslations and unnatural expressions and adopt them naively. The effect on English speakers will likely be lesser or slower to manifest (English being the "native" tongue of most AI and being a simple language to begin with), but I weep for just about every other language.
Quoting: NathanaelKStottlemyer
...it's not hard to write anybody can do it, and it's the pennical of laziness to...

*Pinnacle ;)
Quoting: Salvatos
Working in translation, another worrisome trend I’ve noticed is content farms not just using AI to write articles, but to translate them, so we’re seeing tons of poorly translated content filling up search engine results in content farms with country-specific domains (e.g. "english-sounding-name.fr"). Logically, the AI is going to continue to get trained on junk content that it (or another AI) wrote or translated itself and perpetuate, if not amplify, the errors in it by seeing them as commonly used. A race to the bottom indeed.

Duolingo recently did this. Translations are now done by AI, with only a few people checking them over. Expect more of this sort of thing over time.
And for what? LLMs are just complex guessers. Sure, they guess with context, but they're still just guessing based on all the billions of documents they consumed during their (extremely intensive) training. You can't use them for research because they make shit up... because they're just guessing. It's a mess.
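To illustrate the "complex guesser" point, here's a deliberately crude sketch of next-word prediction. The tiny corpus and the `guess_next` helper are made up for illustration; real LLMs use neural networks trained on billions of documents and far longer context, not bigram counts, but the core move is the same: pick a likely continuation from what the training data suggests, with no notion of whether it's true.

```python
import random
from collections import Counter, defaultdict

# Made-up toy corpus standing in for "billions of documents".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which: the crudest possible "context".
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def guess_next(word):
    """Pick a next word in proportion to how often it followed `word`."""
    options = follows[word]
    if not options:
        return None  # nothing in the training data to guess from
    words, counts = zip(*options.items())
    return random.choices(words, weights=counts)[0]

# "the" was followed by cat (twice), mat, and fish, so the guess varies
# between plausible-looking continuations. It's statistics, not knowledge.
print(guess_next("the"))
```

The output differs from run to run because it is literally sampling a guess; scale that up enormously and you get fluent text with no built-in guarantee of accuracy, which is exactly why these models confidently make things up.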
I'm hoping that 2024 might see some of this novelty wear off as consumers realise how bland and uninspiring AI generated content generally is, but I suspect that real, lasting damage will have been done by then.
It has a use in enterprise settings, properly controlled, with targeted outcomes. As it stands? Total shit show.
Quoting: damarrin
Well, AI may be bad right now, but I'll eat my hat if it doesn't become much less bad very quickly.

I've noticed that in this current degenerate age, most people don't even have hats. How do I know you will really eat it? You have no credibility, sir!
More seriously, I'm not sure it will improve that much that fast. This seems like a new technology because of the way it burst onto the scene, but the research into this basic schtick has been going on for decades, staying quiet until the whole thing looked promising enough that someone was willing to sink in the cash to scale it up to really big data sets. And with these things, the size of the data set is key. So while it looks new, it may actually already be a fairly mature technology, not subject to the kind of rapid improvement you might expect from something genuinely new.
Last edited by Purple Library Guy on 9 January 2024 at 5:38 pm UTC
Quoting: NathanaelKStottlemyer
P.S. According to LanguageTool, three commas were needed in the article.

Ehhh, IMO commas are kind of a "soft" punctuation mark--there are stylistic differences in how people use them. There are many situations where it's not really technically "wrong" either to use one or not to use one, and others where it is wrong by some technical standards to do it a particular way, but doing it that "wrong" way still works given the flow of the sentence and the way people talk. Periods, for instance, are a lot clearer--if you're at the end of a sentence you should be using one, period. Well, unless you have a reason to use a question mark or exclamation point instead. But commas are comparatively mushy, and I don't trust computerized guidance about how to use them.
Quote
Working in translation, another worrisome trend I’ve noticed is content farms not just using AI to write articles, but to translate them
Given the choice between a corrupt human translation and an AI translation, which one will you choose?
Canonical recently had to take down the Ubuntu 23.10 release because a corrupt translator vandalized the Ukrainian translation. Although it's perfectly understandable why they would do so, it is no less inappropriate and disrespectful to the authors of the original text.
The anime industry has recently come under fire for that sort of localization vandalism, too. Apparently it's gotten so bad that people will celebrate when a human translator is fired from the anime industry and replaced with an AI.
If you are going to let AI systems steal copyrighted content, then it should also be OK for the ReactOS and Wine teams to use leaked Windows source code to build ReactOS and Wine. If they did that, Microsoft would DMCA-strike the projects faster than you can say "Microsoft".
The problems with GitHub Copilot are:
1. AI models getting trained on source code that is source-available but not open source, for example the Windows Research Kernel.
2. Having a project on GitHub without the option to stop AI models from training on that project. And who gets the final decision: the project lead, or is it like trying to change the license of a project, where you need most of the contributors to agree to the change?
Quoting: Salvatos
Quoting: NathanaelKStottlemyer
...it's not hard to write anybody can do it, and it's the pennical of laziness to...
*Pinnacle ;)
Clearly I'm not a robot.