I, ChatGPT

Benjamin Hall

@topspin it's fairly settled law now at least in the US that AI cannot get copyright, only the user of the tool. And manufacturer liability for products that aren't inherently flawed is generally disfavored, at least when used as directed. By analogy, all responsibility for use of a tool lies on the user. Because only humans can have responsibility.

sockpuppet7

@Benjamin-Hall IMO lawmakers should create rules to ensure copyright for artists that use AI tools. there are anecdotes of people starting with a generated image, putting significant work over it and not getting copyright protection. the amount of human effort required is too vague, and compare that with photography where you need zero effort to get copyright

I don't have a clue what the laws on my own country say about it. I don't see any mention of this drama on our local news

topspin

@Benjamin-Hall said in I, ChatGPT:

@topspin it's fairly settled law now at least in the US that AI cannot get copyright, only the user of the tool. And manufacturer liability for products that aren't inherently flawed is generally disfavored, at least when used as directed. By analogy, all responsibility for use of a tool lies on the user. Because only humans can have responsibility.

Of course the tool isn’t actually to blame, thus the parenthetical. To blame are Microsoft or whoever who tell you “you can use this, we are guaranteeing it’s not producing infringing material”.

The original claims were that these tools don’t contain (obviously false) or produce (also false) copyrighted material. The remaining question is then if they only produce infringing content when you explicitly ask for it (you can only blame yourself), or if they can produce infringing content you don’t explicitly ask for and have no chance to check.

Benjamin Hall

@topspin that's a claim of manufacturer liability. Which unless you can show that the tool when used as directed, always causes foreseeable harm, is not generally allowed.

To me, this is blaming Photoshop for allowing deep fakes or copyright infringement. The tool is a neutral tool with non infringing uses.

And they don't contain copyrighted material. They don't contain any material in a meaningful way. You can't go through the model files and identify any such thing, which is what would be required. Just because something can recreate a similar version does not mean that it contains that original in a meaningful sense. Otherwise all of our brains are constant copyright infringement devices.

boomzilla

@topspin said in I, ChatGPT:

The original claims were that these tools don’t contain (obviously false) or produce (also false) copyrighted material.

What does it even mean to "contain" copyrighted material? That just seems like the wrong word to use here.

The remaining question is then if they only produce infringing content when you explicitly ask for it (you can only blame yourself), or if they can produce infringing content you don’t explicitly ask for and have no chance to check.

The case generated was explicitly asking for it. When you start talking about something similar...well, opinions will vary and there's no good answer, period. For instance:

Althea Legaspi / Dec 13, 2018

‘Blurred Lines’ Copyright Suit Against Robin Thicke, Pharrell Ends in $5M Judgment

Marvin Gaye family's copyright infringement lawsuit against Robin Thicke and Pharrell Williams ends in $5 million judgment.

boomzilla

It's been a while, so for kicks I fired up my local stable diffusion, apparently now calling itself Easy Diffusion v3.0.7 and asked for "mona lisa":

I'd say it looks like an amateur copy.

topspin

@Benjamin-Hall said in I, ChatGPT:

And they don't contain copyrighted material.

Of course they do.

They don't contain any material in a meaningful way.

That’s nonsense of the highest order. If they don’t contain anything, how can they reproduce something. And what’s “meaningful” is subject to interpretation. It’s meaningful enough for the model to be able to copy the original.

You can't go through the model files and identify any such thing, which is what would be required.

No. You just don’t understand the encoding. If I ship a recording of a copyrighted track of music, where do you draw the line? Does a vinyl not “contain” the song? It’s just plastic, not sound. What about a digital PCM or an MP3? That’s a complicated enough encoding. If I compress it with gzip and encrypt it with AES, you’ll say you can’t find it because it’s not meaningful to you, but it’s still there. All of these contain the song.

Just because something can recreate a similar version does not mean that it contains that original in a meaningful sense. Otherwise all of our brains are constant copyright infringement devices.

Of course our brains contain information about that material. We just don’t copy and ship them, and our laws don’t care about that because it’d be impossible to handle.

It’s literally in the word “copyright”: copying the original.

@boomzilla said in I, ChatGPT:

@topspin said in I, ChatGPT:

The original claims were that these tools don’t contain (obviously false) or produce (also false) copyrighted material.

What does it even mean to "contain" copyrighted material?

The information necessary to reproduce it.

dkf

@Benjamin-Hall said in I, ChatGPT:

@topspin that's a claim of manufacturer liability. Which unless you can show that the tool when used as directed, always causes foreseeable harm, is not generally allowed.

To me, this is blaming Photoshop for allowing deep fakes or copyright infringement. The tool is a neutral tool with non infringing uses.

And they don't contain copyrighted material. They don't contain any material in a meaningful way. You can't go through the model files and identify any such thing, which is what would be required. Just because something can recreate a similar version does not mean that it contains that original in a meaningful sense. Otherwise all of our brains are constant copyright infringement devices.

Does the information reside within the model or not? It clearly must do, if it is able to provide a reproduction of it. (Do I understand the encoding? No, but then I'm not exactly au fait with the precise definition of the JPEG format either.)

Should the user bear total responsibility for getting an infringing output? That's ridiculous, especially if they didn't explicitly ask for one. Expecting end users to act as guardians of copyright is just crazy; not gonna ever happen, no matter what the law says.

Should there be infringing material in the model in the first place? I'd argue that it shouldn't — that that is the problem infringement — but that very much puts me on a collision course with MS, Google and OpenAI, who have a lot more money and lawyers than I do...

Benjamin Hall

@dkf said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

@topspin that's a claim of manufacturer liability. Which unless you can show that the tool when used as directed, always causes foreseeable harm, is not generally allowed.

To me, this is blaming Photoshop for allowing deep fakes or copyright infringement. The tool is a neutral tool with non infringing uses.

And they don't contain copyrighted material. They don't contain any material in a meaningful way. You can't go through the model files and identify any such thing, which is what would be required. Just because something can recreate a similar version does not mean that it contains that original in a meaningful sense. Otherwise all of our brains are constant copyright infringement devices.

Does the information reside within the model or not? It clearly must do, if it is able to provide a reproduction of it. (Do I understand the encoding? No, but then I'm not exactly au fait with the precise definition of the JPEG format either.)

It doesn't, not under any sane sense of encoding. That's just a complete misunderstanding of how LLMs and other transformer-based AI systems operate. One obvious tell is that the models are only a few GB in size but are trained on multiple petabytes of data. And images are already heavily encoded--you're not getting 3-5 orders of magnitude compression on those. Also, the fact that the images you get back on a direct query are not the actual image, but modifications of it, representing a melange of multiple source images plus some hallucinations says that it can't just be stored, even encoded.

Should the user bear total responsibility for getting an infringing output? That's ridiculous, especially if they didn't explicitly ask for one. Expecting end users to act as guardians of copyright is just crazy; not gonna ever happen, no matter what the law says.

The user is the only one who can bear any responsibility. Every single person, everywhere, has the responsibility to not infringe copyright. Does that suck? Yes. Should copyright be made less totalizing? Yes. Is
it still the law and the only possible path? Yes.

The manufacturer has the responsibility not to infringe on existing copyright while creating it. They bear no responsibility (absent a specific law saying otherwise) in how it is then used. No more than Adobe does when people infringe on copyright in Photoshop.

Should there be infringing material in the model in the first place? I'd argue that it shouldn't — that that is the problem infringement — but that very much puts me on a collision course with MS, Google and OpenAI, who have a lot more money and lawyers than I do...

I have other feelings about how they're trained, but that's a completely separate conversation. Google (et al) cannot provide a blanket license to the end user--that's not how copyright works. Each user must have a valid license of their own. And ALL works are copyrighted unless they come from a source not able to own copyrights. Definitionally. While the original Mona Lisa may be public domain, reproductions of it...aren't. You can't take someone else's photo of the original and make prints. So the only way for anything to get into an AI is for someone to make a reproduction to feed it in. And that reproduction has copyright inherent in it.

Zerosquare

@Benjamin-Hall said in I, ChatGPT:

One obvious tell is that the models are only a few GB in size but are trained on multiple petabytes of data.

And a MIDI file is much smaller than an uncompressed audio recording of a song. Still, if it's a transcription of a copyrighted song, good luck claiming copyright doesn't apply.

Benjamin Hall

@Zerosquare said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

One obvious tell is that the models are only a few GB in size but are trained on multiple petabytes of data.

And a MIDI file is much smaller than a WAV recording. Still, if it's a transcription of a copyrighted song, good luck claiming copyright doesn't apply.

Yeah, that just doesn't hold any water. We're talking about multiple orders of magnitude above the best compression algorithms ever invented. The information is transformed, in a non-deterministic fashion, not stored. And that very much matters from a legal and moral perspective. Because otherwise you'd have to say that human brains infringe every copyright ever. There's no carve out for that either.

Zerosquare

@Benjamin-Hall said in I, ChatGPT:

The information is transformed, in a non-deterministic fashion, not stored.

And then the lawyer from the other side just shows the picture to the judge and goes "if it's not stored, then how is this possible?":

At that point, if you think you'll be able to successfully argue about compression ratios, you're... optimistic.

@Benjamin-Hall said in I, ChatGPT:

Because otherwise you'd have to say that human brains infringe every copyright ever.

Shhh! Don't give ideas to lawyers. Copyright law is bad enough as it is.

Benjamin Hall

@Zerosquare said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

The information is transformed, in a non-deterministic fashion, not stored.

And then the lawyer from the other side just shows the picture to the judge and goes "if it's not stored, then how is this possible?":

At that point, if you think you'll be able to successfully argue about compression ratios, you're... optimistic.

@Benjamin-Hall said in I, ChatGPT:

Because otherwise you'd have to say that human brains infringe every copyright ever.

Shhh! Don't give ideas to lawyers. Copyright law is already too restrictive.

Sure. Lawyers can BS all sorts of things. That doesn't make it meaningful or accurate!

Zerosquare

@Benjamin-Hall said in I, ChatGPT:

That doesn't make it meaningful or accurate!

Yes, but unfortunately we have to live with the consequences anyways.

Benjamin Hall

@Zerosquare said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

That doesn't make it meaningful or accurate!

Yes, but unfortunately we have to live with the consequences anyways.

Sure. Once a judge thus rules. So far, they haven't. And I can see no theory of law in which the tool or the manufacturer will be blamed--the manufacturer because they have attorneys of their own, the tool because it's well-established that tools themselves cannot infringe, only people, because only people (including corporate persons) can bear legal liability.

I have no doubt that a judge could rule that using AI to create such a reproduction would be infringement. But that's on the user, not the tool.

boomzilla

@dkf said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

@topspin that's a claim of manufacturer liability. Which unless you can show that the tool when used as directed, always causes foreseeable harm, is not generally allowed.

To me, this is blaming Photoshop for allowing deep fakes or copyright infringement. The tool is a neutral tool with non infringing uses.

And they don't contain copyrighted material. They don't contain any material in a meaningful way. You can't go through the model files and identify any such thing, which is what would be required. Just because something can recreate a similar version does not mean that it contains that original in a meaningful sense. Otherwise all of our brains are constant copyright infringement devices.

Does the information reside within the model or not? It clearly must do, if it is able to provide a reproduction of it. (Do I understand the encoding? No, but then I'm not exactly au fait with the precise definition of the JPEG format either.)

Should the user bear total responsibility for getting an infringing output? That's ridiculous, especially if they didn't explicitly ask for one. Expecting end users to act as guardians of copyright is just crazy; not gonna ever happen, no matter what the law says.

But...they're the ones generating the images and (presumably) distributing them. Who else would you want to hold accountable for that?

Should there be infringing material in the model in the first place? I'd argue that it shouldn't — that that is the problem infringement — but that very much puts me on a collision course with MS, Google and OpenAI, who have a lot more money and lawyers than I do...

I'd think the only way to get there is to not have the model to begin with. Remember, we have a argument that reproducing something like the original means it "contains" the original. How can you prevent combinations of non-infringing stuff that infringes? What's the payoff, anyways? Basically nothing, for a massive downside for what to me seem like nonsensical reasons.

Arantor

@Benjamin-Hall said in I, ChatGPT:

Sure. Lawyers can BS all sorts of things. That doesn't make it meaningful or accurate!

Thing is, this isn't even a lawyer argument. Take this to a layperson who isn't that deep into the understanding and ask them, chances are they'll react much the same way.

Here's the thing: instead of smugly dismissing the 'idiots' for being luddites, the tech bros would do infinitely better to actually try and bring people along for the journey. They might even convince some people that this whole endeavour isn't the heat death of human creativity as we know it.

The problem is that this is already encroaching on human creative endeavour; people whose role is creative are already being displaced to make way for AI - Wizards of the Coast for example is hiring for an artist whose job appears to be doing touch-ups on AI images from the job description.

The bottom end of Amazon Kindle's self publishing was already a dumpster fire but AI has made that infinitely worse.

And we can expect these technologies to improve in speed and quality. Which means we'll see people find new uses for it.

Which means we're going to see more people displaced. It's already kinda crappy working as a creative as it is - long hours, shit pay, and this is about to get worse.

I'm relatively certain that AI isn't coming for my job just yet, I'm hoping to have paid off my mortgage and semi-retire to something less about to be cannibalised from within.

The real problem isn't right now though. It's in 5-10 years when all of today's juniors were pushed out, the seniors are leaving and there's no-one to replace them.

Now, you can brand me a doom-monger if you like (and I know some of you already do), or a luddite - but I've tried AI, and while I'm not afraid of it yet it's going to reach a tipping point, and sooner rather than later, where the results are consistently good enough, cheap enough, that the market will absolutely start replacing people with AI.

And to the people who think we shouldn't 'slow progress for the sake of idiots', your turn will be coming. There will be a time - probably in our lifetimes - where a number of us here will be displaced by AI.

Where people today complain about people using frameworks to help them along, the next one will be 'you should be using AI to do your job for you'. Because doing anything else won't be financially viable at any scale. And then it'll come for you by stealth.

I think this is genuinely the first time I've started to worry about humanity's technical stupidity.

Watson

Something that I reported on just over a year ago (hence the dodgy fidelity):

https://what.thedailywtf.com/post/2034640

https://what.thedailywtf.com/post/2034919

Didn't ask for that, even asked for something that wasn't that (negative weighting). Those were the only two times that particular (recognisable) image came up, but they were within a day of each other.

sockpuppet7

@Zerosquare said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

The information is transformed, in a non-deterministic fashion, not stored.

And then the lawyer from the other side just shows the picture to the judge and goes "if it's not stored, then how is this possible?":

At that point, if you think you'll be able to successfully argue about compression ratios, you're... optimistic.

@Benjamin-Hall said in I, ChatGPT:

Because otherwise you'd have to say that human brains infringe every copyright ever.

Shhh! Don't give ideas to lawyers. Copyright law is bad enough as it is.

the same copyright arguments could be made for any image search. does it matter of it fetches in real time? it's cached on the search engine servers too, are google and bing outlawed?

should we really go out of our way to protect images that are already spread enough for the AI to memorize it?

if you were talking about an anonymous artist having his niche image perfectly duplicated I would have..., no I still wouldn't give a shit on removing it

topspin

@sockpuppet7 said in I, ChatGPT:

if you were talking about an anonymous artist having his niche image perfectly duplicated I would have..., no I still wouldn't give a shit on removing it

That's a wholly different argument.
You're saying that you don't care that copyright is being violated. I agree, copyright is way too excessive. But that doesn't change the fact it's violating copyright. It's just that ordinary people get the book thrown at them, but big corporations doing it on a massive scale now claim AI is magic and must thus be exempt.

topspin

@Benjamin-Hall said in I, ChatGPT:

@dkf said in I, ChatGPT:

@Benjamin-Hall said in I, ChatGPT:

@topspin that's a claim of manufacturer liability. Which unless you can show that the tool when used as directed, always causes foreseeable harm, is not generally allowed.

To me, this is blaming Photoshop for allowing deep fakes or copyright infringement. The tool is a neutral tool with non infringing uses.

And they don't contain copyrighted material. They don't contain any material in a meaningful way. You can't go through the model files and identify any such thing, which is what would be required. Just because something can recreate a similar version does not mean that it contains that original in a meaningful sense. Otherwise all of our brains are constant copyright infringement devices.

Does the information reside within the model or not? It clearly must do, if it is able to provide a reproduction of it. (Do I understand the encoding? No, but then I'm not exactly au fait with the precise definition of the JPEG format either.)

It doesn't, not under any sane sense of encoding. That's just a complete misunderstanding of how LLMs and other transformer-based AI systems operate.

On the contrary. Your argument is basically that you don't find the information in the model because you don't understand the encoding. While the information is obviously there, in a very complex way, otherwise it would have to appear out of thin air.

One obvious tell is that the models are only a few GB in size but are trained on multiple petabytes of data. And images are already heavily encoded--you're not getting 3-5 orders of magnitude compression on those. Also, the fact that the images you get back on a direct query are not the actual image, but modifications of it, representing a melange of multiple source images plus some hallucinations says that it can't just be stored, even encoded.

None of that is relevant. First of, while some reproductions are heavily altered, some others are extremely faithful, to the point where you'd similarly have to argue that JPEG compression artifacts also make your copy exempt from copyright. Second, the modified images aren't exempt, because derivative works also fall under copyright (fair use or not), and these are obvious derivatives. And last, the model size argument doesn't hold water. Even ignoring the argument at ignorance of how the encoding works, the best you can say is that it doesn't remember all the copyrighted images it gets trained with. But it still remembers some of them, as has been well demonstrated. And that's enough.

The fact is that it can reproduce these images, so it logically contains the information to do so. There's no getting around that. It's the smoking gun that proves the model data contains the information.

Should the user bear total responsibility for getting an infringing output? That's ridiculous, especially if they didn't explicitly ask for one. Expecting end users to act as guardians of copyright is just crazy; not gonna ever happen, no matter what the law says.

The user is the only one who can bear any responsibility. Every single person, everywhere, has the responsibility to not infringe copyright. Does that suck? Yes. Should copyright be made less totalizing? Yes. Is
it still the law and the only possible path? Yes.

The manufacturer has the responsibility not to infringe on existing copyright while creating it. They bear no responsibility (absent a specific law saying otherwise) in how it is then used.

But the manufacturers are the ones who infringe copyright, both when they train it with copyrighted images and when they ship it. They then claim that this infringement was just ephemeral and the resulting model doesn't contain the copyrighted works, so what they did was fine. But the infringement happened while training and is also happening when they distribute the model.

No more than Adobe does when people infringe on copyright in Photoshop.

The "tool" angle is not comparable. The "tool" would be just the neural network architecture, without the data. The data is the part with the copyrighted information.
The comparable thing would be if Photoshop contained a library of images that you can paste. Images that they 1) did not license, but 2) tell you that you can use them without problems, it's not copyright infringement. In that case, Adobe would commit copyright infringement first when distributing the tool to you, and second would be an accomplice in you unknowingly comitting copyright infringement, since you assumed Adobe was correct in telling you the images aren't copyrighted.

There's nothing magical about neural networks. If I program a stupid old-school expert system with a bit of an ELIZA talking interface, then have it respond with (slightly procedurally altered) versions of 1000 of my favorite images and soundtracks (all copyrighted which I didn't license), then nobody would say "oh, but the program doesn't contain the data, it's AI, it's just reproducing it."

Should there be infringing material in the model in the first place? I'd argue that it shouldn't — that that is the problem infringement — but that very much puts me on a collision course with MS, Google and OpenAI, who have a lot more money and lawyers than I do...

I have other feelings about how they're trained, but that's a completely separate conversation. Google (et al) cannot provide a blanket license to the end user--that's not how copyright works. Each user must have a valid license of their own.

But it's exactly what I was talking about. The infringement happens already while training, and then again when distributing it to the user (if they distribute the model).
When the user then goes ahead and knowingly produces something that infringes copyright, that's the third infringement. But there's nothing saying that they might not use the tool in a way that is completely innocuous, from their point of view, and still produce infringing content. And that's going to be a real problem for the user if they have no possibility to check. The argument now is that can't happen, but I don't see any reason why not. Google, MS, et al. previously argued everything they're doing is fine and the things don't produce anything infringing, which they do, so there's no reason to believe this one either.

@Benjamin-Hall said in I, ChatGPT:

The information is transformed, in a non-deterministic fashion, not stored. And that very much matters from a legal and moral perspective. Because otherwise you'd have to say that human brains infringe every copyright ever. There's no carve out for that either.

The non-determinism is applied after the fact, you can have it produce deterministic output, if you want. Software is by definition deterministic. "Transformed" is just another word for derivative work.
And of course human brains contain the necessary information to "infringe on every copyright ever", at least of the things they remember. Otherwise they couldn't remember. But that's not how the legal system works (or could work), and thus infringement only legally happens when you use your brain to reproduce the work, not when you memorize it. If you're put into a locked cell without outside contact, but pen and paper, then reproduce from memory copyrighted material, you have legally infringed copyright. The information didn't appear out of thin air, so it was stored in your brain. It's probably fair use and not morally wrong, but that's not the point.

Benjamin Hall

@topspin said in I, ChatGPT:

The fact is that it can reproduce these images, so it logically contains the information to do so. There's no getting around that. It's the smoking gun that proves the model data contains the information.

Having the info to reproduce something is not, in the laws eyes, the same as having a reproduction of the item. A book on painting contains all the info needed as well, even without any depictions. As does a detailed verbal description. Do those violate copyright? No.

On the other hand, actually using that information to reproduce the image does, absolutely, violate copyright. But that's in the use, not in the tool.

topspin

@Benjamin-Hall said in I, ChatGPT:

@topspin said in I, ChatGPT:

The fact is that it can reproduce these images, so it logically contains the information to do so. There's no getting around that. It's the smoking gun that proves the model data contains the information.

Having the info to reproduce something is not, in the laws eyes, the same as having a reproduction of the item. A book on painting contains all the info needed as well, even without any depictions. As does a detailed verbal description. Do those violate copyright? No.

On the other hand, actually using that information to reproduce the image does, absolutely, violate copyright. But that's in the use, not in the tool.

In that sense a jpeg of the mona lisa is not a copy.

Watson

It certainly remembers which features are present and how often they appear in conjunction with other features, and how often those clusters of features are associated with character strings like Mona Lisa. Certain character strings end up correlated with very large feature clusters because in the training data they pretty much always appear together. Try getting Mona Lisa to appear anywhere other than in the painting:

[Great. I didn't ask for that and now I'm going to have Coca-Cola on my back.]

It's like there's a spectrum of image compression formats; you can throw more and more information away during compression depending on how much you're prepared to reconstruct based on what information remains. You might have a format with a hardcoded Huffman tree to serve as a default encoding for pixel data, you might tolerate a certain amount of roundoff during quantising, or you might have a 6GB diffusion-trained model to lookup while guessing which features should be present where.

I haven't looked, but I'll be surprised if there aren't commercial packages for generating realistic texture assets. Trained from photo libraries of the real thing. Of course, any granite cliff-face is going to look like any other granite cliff-face, which is why texture generation works, but the computer doesn't know the difference between a granite cliff-face and the Mona Lisa; it's doing the same thing in both cases.

Tsaukpaetra

@boomzilla said in I, ChatGPT:

I'd say it looks like an amateur copy.

I tossed it through a few models.

Is it failing to be transformative? Who nose...

Tsaukpaetra

@Watson said in I, ChatGPT:

Try getting Mona Lisa to appear anywhere other than in the painting:

I got her out of it!

But I agree, eliminating the background painting is a chore. There will almost always be mountains and a lake somewheree.

DogsB

Craig Hale / Dec 19, 2023 / Pro

Microsoft's new Windows AI Studio developer tool... makes you install Linux to use it

Windows AI Studio runs on Ubuntu 18.04 or later

Sheesh, not even ms dev want to get this shit on themselves.

Applied Mediocrity

@DogsB The absolute state of the industry come 2024 - cobbling together gigabytes of assorted open-sores shit to run a small bunch of some other shit.

BernieTheBernie

@Tsaukpaetra How did you get that funny cat in the bottom right image? A truly amazing pussy!

sockpuppet7

@topspin your brain has a copy of it, proof is that you can recognize it. this is proof that this interpretation of copyright is flawed, and we need new laws if this is where we're going

topspin

@sockpuppet7 said in I, ChatGPT:

@topspin your brain has a copy of it, proof is that you can recognize it. this is proof that this interpretation of copyright is flawed, and we need new laws if this is where we're going

I'm fine with changing copyright laws, but your brain is quite obviously treated differently by law than a computer is.

There's no reason to apply copyright law to old-school computer programs and/or data in one way and then hand-wave that AI is somehow "different" and not apply it the same way. They're the same thing. There's nothing magic about AI, the only reason to treat it differently is that big corporations want to be able to do something that you're not allowed to: consume everything there is out there while pretending copyright doesn't apply.
If we don't create new laws, then current copyright laws mean training AI with copyrighted material is infringement, and using it to create derivative works is too. Same as without the "AI" label.

If you don't want to infringe, then don't train on stuff that isn't in the public domain, simple as that.

sockpuppet7

@topspin treating differently no, any software should be able to be trained with publicly available images on some kind of fair use

considering I think copyright should be limited to something like 5 years, I don't think I can ever agree with these limits

It's not like we have a say anyway. Maybe the next thing after stable diffusion will need the prompts on Chinese.

sockpuppet7

This post is deleted!

sockpuppet7

@topspin normal training take a small amount of information from each picture that would fit on fair use

mona Lisa has so many copies spread that this little from each image that it gets the entire picture, that is an exception, and I didn't even see an example of this with something that isn't on public domain

sockpuppet7

sockpuppet7

sockpuppet7

dkf

@sockpuppet7 said in I, ChatGPT:

mona Lisa has so many copies spread that this little from each image that it gets the entire picture, that is an exception, and I didn't even see an example of this with something that isn't on public domain

Mind you, Mona Lisa is also long long out of even the most Disney-inspired copyright lifespan. It really is now an image that belongs to all humanity (as well as a painting in the Louvre).

Tsaukpaetra

@BernieTheBernie said in I, ChatGPT:

@Tsaukpaetra How did you get that funny cat in the bottom right image? A truly amazing pussy!

It was the words "kitten" I think.

Prompt: mona lisa , playing with kitten, ww2, hotel room
Negative Prompt:
Seed: 110491744
Stable Diffusion model: haydensYiffmix_v14Tuned
Clip Skip: False
ControlNet model: None
VAE model:
Sampler: euler_a
Width: 512
Height: 512
Steps: 25
Guidance Scale: 7.5
LoRA model: None
Embedding models: None
Seamless Tiling: None
Use Face Correction: None
Use Upscaling: None

This one was a bit more murder-y
mona_lisa___playing_with_kitten__ww2_war_room__dri_DBRDXC90.jpeg

This one played with a different kind of pussy.

Tsaukpaetra

@sockpuppet7

How about some horror?

boomzilla

@topspin said in I, ChatGPT:

@sockpuppet7 said in I, ChatGPT:

@topspin your brain has a copy of it, proof is that you can recognize it. this is proof that this interpretation of copyright is flawed, and we need new laws if this is where we're going

I'm fine with changing copyright laws, but your brain is quite obviously treated differently by law than a computer is.
There's no reason to apply copyright law to old-school computer programs and/or data in one way and then hand-wave that AI is somehow "different" and not apply it the same way. They're the same thing. There's nothing magic about AI, the only reason to treat it differently is that big corporations want to be able to do something that you're not allowed to: consume everything there is out there while pretending copyright doesn't apply.
If we don't create new laws, then current copyright laws mean training AI with copyrighted material is infringement, and using it to create derivative works is too. Same as without the "AI" label.

Seems like at worst it should fall under fair use to me.

Zecc

@Tsaukpaetra said in I, ChatGPT:

This one played with a different kind of pussy.

I wonder what Lisa del Giocondo would have thought about this.

Tsaukpaetra

@Zecc said in I, ChatGPT:

@Tsaukpaetra said in I, ChatGPT:

This one played with a different kind of pussy.

I wonder what Lisa del Giocondo would have thought about this.

"that....! How did they know! Nobody ever saw us!"

izzion

Data slurping application has potential to expose slurped data. News at 11.

Dec 22, 2023 / Technology

How Strangers Got My Email Address From ChatGPT’s Model

Researchers at Indiana University used ChatGPT’s model to extract contact information for more than 30 New York Times employees.

kazitor

That’s just overfitting, because all the individual components of the email address were non-deterministically transformed so often throughout the training set

boomzilla

locusmag / Dec 18, 2023

Cory Doctorow: What Kind of Bubble is AI?

Of course AI is a bubble. It has all the hallmarks of a classic tech bubble. Pick up a rental car at SFO and drive in either direction on the 101 – north to San Francisco, south to Palo Alto – and …

Arantor

BernieTheBernie

What about training a chat bot to jailbrake a different chat bot?
It worx.

Researchers use AI chatbots against themselves to 'jailbreak' each other

Computer scientists from Nanyang Technological University, Singapore (NTU Singapore) have managed to compromise multiple artificial intelligence (AI) chatbots, including ChatGPT, Google Bard and Microsoft ...

topspin

boomzilla

Daniel Stenberg / Jan 2

I, ChatGPT

‘Blurred Lines’ Copyright Suit Against Robin Thicke, Pharrell Ends in $5M Judgment

Microsoft's new Windows AI Studio developer tool... makes you install Linux to use it

How Strangers Got My Email Address From ChatGPT’s Model

Cory Doctorow: What Kind of Bubble is AI?

Researchers use AI chatbots against themselves to 'jailbreak' each other

The I in LLM stands for intelligence