Fair Use: Training Generative AI

peter.suber's bookmarks 2023-02-18

Summary:

"Given all this background on fair use, how do we apply these principles to the use of copyrighted works as AI training data, such as in the Stable Diffusion/Midjourney case? To answer this question, we must first look at the facts of the case. Dr Andrés Guadamuz has a couple excellent blog posts that explain the technology involved in this case and that begin to explain why this should constitute fair use. Stability AI used a dataset called LAION to train Stable Diffusion, but this dataset does not actually contain images. Instead, it contains over 5 billion weblinks to image-text pairs. Diffusion models like Stable Diffusion and Midjourney take these inputs, add “noise” to them, corrupting them, and then train neural networks to remove the corruption. The models then use another tool, called CLIP, to understand the relationship between the text and the associated images. Finally, they use what are called “latent spaces” to cluster together similar data. With these latent spaces, the models contain representations of what images are supposed to look like, based on the training data, and not copies of the images in their training data. Then, user focused applications collect text prompts from users to generate new images based on the training data, the language model, and the latent space."

Authors:

Stephen Wolfson

Date tagged:

02/18/2023, 01:32

Date published:

02/17/2023, 11:02
