Fair Use: Training Generative AI

peter.suber's bookmarks 2023-02-18

Summary:

"Given all this background on fair use, how do we apply these principles to the use of copyrighted works as AI training data, such as in the Stable Diffusion/Midjourney case? To answer this question, we must first look at the facts of the case. Dr Andrés Guadamuz has a couple excellent blog posts that explain the technology involved in this case and that begin to explain why this should constitute fair use. Stability AI used a dataset called LAION to train Stable Diffusion, but this dataset does not actually contain images. Instead, it contains over 5 billion weblinks to image-text pairs. Diffusion models like Stable Diffusion and Midjourney take these inputs, add “noise” to them, corrupting them, and then train neural networks to remove the corruption. The models then use another tool, called CLIP, to understand the relationship between the text and the associated images. Finally, they use what are called “latent spaces” to cluster together similar data. With these latent spaces, the models contain representations of what images are supposed to look like, based on the training data, and not copies of the images in their training data. Then, user-focused applications collect text prompts from users to generate new images based on the training data, the language model, and the latent space."
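
Illustration (not from the linked article): the "add noise, then learn to remove it" step described in the summary can be sketched in a few lines of Python. This is a toy under stated assumptions, not Stable Diffusion's or Midjourney's actual code. The function name `add_noise`, the `keep` parameter, and the square-root blending rule are all assumptions chosen for illustration. The blend follows a common variance-preserving formulation, and the five-number list stands in for an image.

```python
import math
import random


def add_noise(pixels, keep, rng):
    """Blend a signal with Gaussian noise (hypothetical helper, for illustration).

    keep is the share of the original signal's variance that survives:
    1.0 leaves the input untouched, 0.0 replaces it with pure noise.
    Returns (noisy, noise); a denoising network would be trained to
    recover `noise` (or the clean input) from `noisy`.
    """
    if not 0.0 <= keep <= 1.0:
        raise ValueError("keep must be between 0 and 1")
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(keep)
    noise_scale = math.sqrt(1.0 - keep)
    noisy = [signal_scale * p + noise_scale * e for p, e in zip(pixels, noise)]
    return noisy, noise


rng = random.Random(0)                 # fixed seed so runs are repeatable
image = [0.0, 0.25, 0.5, 0.75, 1.0]    # toy stand-in for pixel values

# Progressively corrupt the toy image, from untouched to pure noise.
for keep in (1.0, 0.5, 0.0):
    noisy, _ = add_noise(image, keep, rng)
    print(keep, [round(v, 2) for v in noisy])
```

In this sketch the training target is the noise itself rather than any stored copy of the input, which is consistent with the summary's point that the model learns to remove corruption instead of retaining the training images.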

Link:

https://creativecommons.org/2023/02/17/fair-use-training-generative-ai/

From feeds:

[IOI] Open Infrastructure Tracking Project » Items tagged with oa.ai in Open Access Tracking Project (OATP)
Open Access Tracking Project (OATP) » peter.suber's bookmarks
Gudgeon and gist » Creative Commons » Commons News

Tags:

oa.training oa.new oa.fair_use oa.copyright oa.ai use sharing intelligence generative fair better artificial ai oa.creative_commons

Authors:

Stephen Wolfson

Date tagged:

02/18/2023, 01:32

Date published:

02/17/2023, 11:02