I really fear for the internet and what it will become in even just another year, with the rise of AI writing and AI art being used in place of real people. And now OpenAI openly state that they need to use copyrighted works as training material.
As reported by The Guardian, the New York Times sued OpenAI and Microsoft over copyright infringement, and just recently OpenAI sent a submission to the UK House of Lords Communications and Digital Select Committee in which OpenAI said pretty clearly:
Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.
It's worth noting that OpenAI put up their own news post, "OpenAI and journalism", on January 8th.
Why am I writing about this here? Well, the reasoning is pretty simple. AI writing is (on top of other things) accelerating the race to the bottom for click-driven content. Search engines have quickly become a mess when you're trying to find what you actually want, and it's only going to get far worse thanks to all these SEO (Search Engine Optimisation) bait content farms. More are popping up all the time, and we've already seen some bigger websites trial AI writing. The internet is a mess.
As time goes on, and as more people use AI to pinch content and write entire articles, profitable writing is going to end up in the hands of a select few big names who can weather the storm. A lot of smaller websites are just going to die off. Any time you search for something, it will be those big names sprinkled in between the vast AI website farms, all with very similar, robotic, plain writing styles.
Many (most?) websites make content for search engines, not for people. The Verge recently did a rather fascinating piece on this showing how websites are designed around Google, and it really is something worth scrolling through and reading.
One thing you can count on: my perfectly imperfect writing, full of terrible grammar, will continue without the use of AI. At least it's natural, right? I write as I speak, for better or worse. By humans, for humans: a tagline I plan to stick with until AI truly takes over and I have to go find a job flipping burgers or something. But then again, there will be robots for that too. I think I need to learn how to fish…
Quote: Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.
Yet their own platform, LLM, code, etc. are copyrighted and not released under an open-source licence. I could potentially believe their shit if AI were to benefit everyone, not just them and/or a few companies.
Quote: Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.
That's like saying that I cannot profit massively without destroying the environment and exploiting employees. Oh, wait...
Quote: Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.
Today's citizens' needs are not a fascinating AI built on top of other people's works. Citizens today need job stability, affordable housing, free, quality public education and healthcare, and the list could go on virtually endlessly.
Quoting: scaine
Quoting: NathanaelKStottlemyer
Quoting: Purple Library Guy
Quoting: NathanaelKStottlemyer
P.S. According to LanguageTool, three commas were needed in the article.
Ehhh, IMO commas are kind of a "soft" punctuation mark--there are stylistic differences in how people use them. There are many situations where it's not really technically "wrong" either to use one or not to use one, and others where it is wrong by some technical standards to do it a particular way, but doing it that "wrong" way still works given the flow of the sentence and the way people talk. Periods, for instance, are a lot clearer--if you're at the end of a sentence you should be using one, period. Well, unless you have a reason to use a question mark or exclamation point instead. But commas are comparatively mushy, and I don't trust computerized guidance about how to use them.
All the places where LanguageTool said a comma was needed, I wouldn't care either way. However, I personally err on the side of using the commas, because they save lives after all.
This joke?
A comma is the difference between:
- Let's eat, Grandma!
and
- Let's eat Grandma!
It's not a joke, it's serious. Every day, countless lives are lost!
Sorry, that job has already been taken by AI. What we do have is a job where rich folk need someone to live in their toilet to wipe their asses! Requires a degree!
Last edited by TheRiddick on 11 January 2024 at 4:36 am UTC
Well, not really everything. GamingOnLinux is a corner of interesting stuff made by humans for humans, and these little corners must be preserved for the sake of humanity.
I think there is an argument that an AI reading copyrighted material is the same as a human doing so and then writing their own creative work. However, that argument doesn't stand up against companies' acceptable use policies, which often deny or limit scraping by bots. This is exactly the same as bot scraping, which differs from typical usage in that a machine is doing it, and in the volume (resource expense).
Quoting: 14
I think there is an argument that reading copyrighted material is same as a human doing so and then writing their own creative work
When people do this, they pay for the privilege, or they use libraries, where they can only check out books and copyrighted materials for private use. OpenAI and others aren't doing that. They're just consuming all the content on the internet, even pirated material and content behind paywalls, and using it to train their models.
Of course, proving that will be the court battle.
Quoting: 14
I think there is an argument that reading copyrighted material is same as a human doing so and then writing their own creative work
Look up the legal standing of fan fiction. Then repeat that statement.
Using copyrighted "aspects" is enough to be considered a copyright violation.
Quoting: LoudTechie
I suggest looking up Marion Zimmer Bradley.
Quoting: 14
I think there is an argument that reading copyrighted material is same as a human doing so and then writing their own creative work
Look up the legal standing of fan fiction. Then repeat that statement.
Using copyrighted "aspects" is enough to be considered a copyright violation.
Quote: For many years, Bradley actively encouraged Darkover fan fiction. She encouraged submissions from unpublished authors and reprinted some of it in commercial Darkover anthologies. This ended after a dispute with a fan over an unpublished Darkover novel of Bradley's that had similarities to one of the fan's stories. As a result, the novel remained unpublished and Bradley demanded the cessation of all Darkover fan fiction.
The fan threatened to take Marion Zimmer Bradley to court for infringing on the fan's copyright. The fan holds the copyright to their own prose. The fan clearly does not hold the copyright to the characters. But should the author of the original work use prose from a fan work...well, things get dicey.
You'd also expect to face some legal trouble if you ripped some fan subs and tried to pass them off as your own translation (which has been done before).
Of note is the Organization for Transformative Works, which works to protect fan works and has this to say:
Quote: Copyright is intended to protect the creator’s right to profit from her work for a period of time to encourage creative endeavor and the widespread sharing of knowledge. But this does not preclude the right of others to respond to the original work, either with critical commentary, parody, or, we believe, transformative works.
And:
In the United States, copyright is limited by the fair use doctrine. The legal case of Campbell v. Acuff-Rose held that transformative uses receive special consideration in fair use analysis. For those interested in reading in-depth legal analysis, more information can be found on the Fanlore Legal Analysis page.
Quote: While case law in this area is limited, we believe that current copyright law already supports our understanding of fanfiction as fair use.
We seek to broaden knowledge of fan creators’ rights and reduce the confusion and uncertainty on both fan and pro creators’ sides about fair use as it applies to fanworks. One of our models is the documentary filmmakers’ statement of best practices in fair use, which has helped clarify the role of fair use in documentary filmmaking.
It's certainly not as cut and dried as you might think.