OpenAI say it would be 'impossible' to train AI without pinching copyrighted works

By Liam Dawe - 9 January 2024 at 12:47 pm UTC

I really fear for the internet and what it will become in even just another year, with the rise of AI writing and AI art being used in place of real people. And now OpenAI openly state they need to use copyrighted works for training material.

As reported by The Guardian, the New York Times sued OpenAI and Microsoft over copyright infringement, and just recently OpenAI sent a submission to the UK House of Lords Communications and Digital Select Committee in which it said pretty clearly:

Because copyright today covers virtually every sort of human expression– including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.

It's worth noting that OpenAI put up their own news post, "OpenAI and journalism", on January 8th.

Why am I writing about this here? Well, the reasoning is pretty simple. AI writing is (on top of other things) accelerating the race to the bottom of content made for clicks. Search engines have quickly become a mess when it comes to finding what you actually want, and it's only going to get far worse thanks to all these SEO (Search Engine Optimisation) bait content farms, with more popping up all the time, and we've already seen some bigger websites trial AI writing. The internet is a mess.

As time goes on, and as more people use AI to pinch content and write entire articles, we're going to hand off profitable writing to only a select few big names who can weather the storm. A lot of smaller-scale websites are just going to die off. Any time you search for something, it will be those big names sprinkled in between the vast AI website farms, all with very similar, plain, robotic writing styles.

Many (most?) websites make content for search engines, not for people. The Verge recently did a rather fascinating piece on this showing how websites are designed around Google, and it really is something worth scrolling through and reading.

One thing you can count on: my perfectly imperfect writing, full of terrible grammar, will continue without the use of AI. At least it's natural, right? I write as I speak, for better or worse. By humans, for humans: a tagline I plan to stick with until AI truly takes over and I have to go find a job flipping burgers or something. But then again, there will be robots for that too. I think I need to learn how to fish…

Article taken from GamingOnLinux.com.

Tags: Editorial, Misc

About the author - Liam Dawe

I am the owner of GamingOnLinux. After discovering Linux back in the days of Mandrake in 2003, I constantly came back to check on the progress of Linux until Ubuntu appeared on the scene and it helped me to really love it. You can reach me easily by emailing GamingOnLinux directly.

68 comments

BlackBloodRum Jan 15

One thing occurred to me. According to the article, they are claiming this:

Quote: it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.

Okay. So then, in their terms of service, under "Using Our Services", source:
https://openai.com/policies/terms-of-use

It states, and I quote:

Quote: What You Cannot Do. You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not:

Use our Services in a way that infringes, misappropriates or violates anyone’s rights.

Modify, copy, lease, sell or distribute any of our Services.

Attempt to or assist anyone to reverse engineer, decompile or discover the source code or underlying components of our Services, including our models, algorithms, or systems (except to the extent this restriction is prohibited by applicable law).

Automatically or programmatically extract data or Output (defined below).

Represent that Output was human-generated when it was not.

Interfere with or disrupt our Services, including circumvent any rate limits or restrictions or bypass any protective measures or safety mitigations we put on our Services.

Use Output to develop models that compete with OpenAI.

Don't you think it's a little amusing that they claim you need to "copy" others to create "AI", while at the same time trying to strictly forbid anyone else from doing the same?

Purple Library Guy Jan 15

Quoting: BlackBloodRum
Don't you think it's a little amusing that they claim you need to "copy" others to create "AI", while at the same time trying to strictly forbid anyone else from doing the same?

Hee!
As usual, the real corporate rationale is "Whatever makes us money is OK, whatever might reduce our money is not!"

Pengling Jan 15

Quoting: Purple Library Guy
Hee!
As usual, the real corporate rationale is "Whatever makes us money is OK, whatever might reduce our money is not!"

And copyright is forever-and-a-day for their stuff, but if they want to use your stuff they will regardless of what the law says.

LoudTechie Jan 17

Quoting: pleasereadthemanual
Quoting: LoudTechie
Quoting: 14
I think there is an argument that reading copyrighted material is the same as a human doing so and then writing their own creative work

Look up the legal standing of fan fiction. Then repeat that statement.
Using copyrighted "aspects" is enough to be considered a copyright violation.
I suggest looking up Marion Zimmer Bradley.

Quote: For many years, Bradley actively encouraged Darkover fan fiction. She encouraged submissions from unpublished authors and reprinted some of it in commercial Darkover anthologies. This ended after a dispute with a fan over an unpublished Darkover novel of Bradley's that had similarities to one of the fan's stories. As a result, the novel remained unpublished and Bradley demanded the cessation of all Darkover fan fiction.
The fan threatened to take Marion Zimmer Bradley to court for infringing on the fan's copyright. The fan holds the copyright to their own prose. The fan clearly does not hold the copyright to the characters. But should the author of the original work use prose from a fan work...well, things get dicey.

You'd also expect to face some legal trouble if you ripped some fan subs and tried to pass them off as your own translation (which has been done before).

Of note is the Organization for Transformative Works, which works to protect fan works and has this to say:

Quote: Copyright is intended to protect the creator’s right to profit from her work for a period of time to encourage creative endeavor and the widespread sharing of knowledge. But this does not preclude the right of others to respond to the original work, either with critical commentary, parody, or, we believe, transformative works.

In the United States, copyright is limited by the fair use doctrine. The legal case of Campbell v. Acuff-Rose held that transformative uses receive special consideration in fair use analysis. For those interested in reading in-depth legal analysis, more information can be found on the Fanlore Legal Analysis page.
And:

Quote: While case law in this area is limited, we believe that current copyright law already supports our understanding of fanfiction as fair use.

We seek to broaden knowledge of fan creators’ rights and reduce the confusion and uncertainty on both fan and pro creators’ sides about fair use as it applies to fanworks. One of our models is the documentary filmmakers’ statement of best practices in fair use, which has helped clarify the role of fair use in documentary filmmaking.

It's certainly not as cut and dry as you might think.

Thanks.
I'm happy to be proven wrong about the legal standing of fan fiction.
I like fan fiction, and the possibility of it being legal is a breath of fresh air.
I don't think it helps OpenAI, because I'd argue they "adversely affect the sale of the original", but I admit that is a matter of interpretation.

egocanis Jan 17

Cool, cool. Also, I need to use copyrighted material for my personal use, for free too. Let's talk about copyright again. No? OK then, same rules for everyone then... 😏

14 Jan 20

I think I was misunderstood. We all watch movies, play video games, and read books, right? That influences our imagination. So when you make your own creative work, it is influenced by all those things. This is undeniable. Sigh.

LoudTechie Jan 20

Quoting: 14
I think I was misunderstood. We all watch movies, play video games, and read books, right? That influences our imagination. So when you make your own creative work, it is influenced by all those things. This is undeniable. Sigh.

Many people know you meant that (it's a common argument in these kinds of discussions).
Pengling argued in response that although we do as you described, we pay for the privilege of reading, watching and playing things before getting inspired, which OpenAI didn't do (they just downloaded the content from piracy sites).

I argued that making things that contain unlicensed copyrighted elements is always illegal, including for entities, and "proved" that with the legal standing of fan fiction.
Someone correctly pointed out to me that in the case of fan fiction it's sometimes not illegal (if you can show you didn't negatively affect the sale of the original).
I argued that OpenAI still wouldn't be able to claim that, because they did negatively affect the sale of the original.

Someone else (too lazy to check who) argued that this is only a persuasive argument in law if you treat the AI as a separate entity, and the law only treats citizens and some kinds of companies as entities. The AI is neither of these things.
This is an effect of the context dependence of the law, which I find hard to understand, so I'm assuming you might too.
This is my attempt at explaining it.
You know how wNK3c5Z5 is a randomly generated string and thus useful for security? A second wNK3c5Z5 isn't secure, because it's a copy of the first one and thus not randomly generated.
The strings are identical, and still one is secure and the other isn't. It's because of context: one was randomly generated, while the other wasn't.
The law deals with these kinds of differences all the time.
If I make an AI that behaves and looks exactly like some human with a driving license, the AI still can't be allowed to drive alone, because it isn't an adult citizen with a driving license and thus can't be held responsible for its actions.
If I press soft wax against a statue to make a mould and use that mould to make more copies of the statue, I'm also violating copyright, even though I didn't have to make any of the movements the artist made.

Last edited by LoudTechie on 20 January 2024 at 11:34 am UTC

tuubi Jan 20

Quoting: LoudTechie
Someone else (too lazy to check who) argued that this is only a persuasive argument in law if you treat the AI as a separate entity, and the law only treats citizens and some kinds of companies as entities. The AI is neither of these things.

Yeah, this should be obvious. The AI isn't a person or a corporate entity. It's a tool, and whoever operates this tool is the one who would have to make sure they're not using copyrighted materials without license. The AI itself has no agency, which underlines the absurdity of the term as applied to these algorithms.

LoudTechie Jan 20

Quoting: tuubi
Quoting: LoudTechie
Someone else (too lazy to check who) argued that this is only a persuasive argument in law if you treat the AI as a separate entity, and the law only treats citizens and some kinds of companies as entities. The AI is neither of these things.
Yeah, this should be obvious. The AI isn't a person or a corporate entity. It's a tool, and whoever operates this tool is the one who would have to make sure they're not using copyrighted materials without license. The AI itself has no agency, which underlines the absurdity of the term as applied to these algorithms.

Obvious is relative.
All my life I've been trained in the way of "if it walks like a duck, quacks like a duck and looks like a duck, it's a duck".
To me this is the most challenging aspect of the legal system I seriously interact with.
The only reason I somewhat understand it is because I looked into the effects of copyright in my field of study before AI got cool.

14 Jan 21

Quoting: LoudTechie
Many know you meant that(it's a common argument made in this kind of discussions).
Penglin argued in reaction that although we do as you described, we pay for the privilege of reading, watching and playing things before getting inspired, which OpenAI didn't do(they just downloaded the content from piracy sites).

Sorry you went to the trouble of writing such a long post, but I appreciate it.

I don't have a strong opinion in favor of AI here. But I like trying to understand why each perspective makes sense to the person who holds it. My choice of words, "I think there is an argument", was me saying there is a compelling-enough case to have a debate, but I don't think I'd be the one playing representative because I haven't chosen a side.

Out of the handful of debatable elements you pointed out, in my own words, I think the most compelling argument content creators of any kind have against current AI usage of copyrighted material is that AI chatbots can effectively become a proxy for the same information and harm creators' profits by eliminating sales of that content, as well as ad traffic for "free" content. Acting as a proxy is like hijacking in a way... mm, let's say mimicking or miming. In another way, you could say it is redistribution, which is a clear topic in copyright law as far as I know. Yeah, I think if lawyers can convince judges that AI falls under redistribution of copyrighted material, that case is winnable.

Last edited by 14 on 21 January 2024 at 3:05 pm UTC
