This article is compiled from the following material.
Generative AI learning path
Image Generation Model Families
Variational Autoencoders (VAEs)
Encode images to a compressed size, then decode back to the original size, while learning the distribution of the data itself.
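The encode-sample-decode idea above can be sketched in a few lines. This is a toy illustration, not a trained VAE: the encoder and decoder here are untrained linear maps, and the names (`encode`, `decode`, dimensions `D` and `Z`) are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the VAE idea: encode to a small latent (mean and
# log-variance), sample with the reparameterization trick, decode back.
# The encoder/decoder are toy linear maps, not trained networks.
rng = np.random.default_rng(0)

D, Z = 16, 2                                # "image" dim, compressed latent dim
We = rng.standard_normal((Z, D)) * 0.1      # toy encoder weights
Wd = rng.standard_normal((D, Z)) * 0.1      # toy decoder weights

def encode(x):
    """Map an input to latent (mean, log-variance) parameters."""
    return We @ x, np.zeros(Z)

def decode(z):
    """Map a latent sample back to the original size."""
    return Wd @ z

x = rng.standard_normal(D)
mu, logvar = encode(x)
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(Z)  # reparameterize
x_hat = decode(z)                           # reconstruction at original size
```

Training a real VAE would optimize a reconstruction loss plus a KL term that pushes the latent distribution toward a prior, which is what lets it learn the data distribution.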
Generative Adversarial Networks (GANs)
GANs pit two neural networks against each other. One network, the generator, creates images, and the other, the discriminator, predicts whether an image is real or fake. Over time the discriminator gets better and better at distinguishing real from fake, and the generator gets better and better at creating realistic-looking fakes.
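The adversarial objective can be sketched on toy 1-D data. This is a hedged illustration of the two losses, not a training loop: the "discriminator" is a fixed linear map with made-up parameters, and the "fake" samples stand in for an untrained generator's output.

```python
import numpy as np

# Sketch of the GAN objective: the discriminator pushes D(real) toward 1
# and D(fake) toward 0; the generator pushes D(fake) toward 1.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

real = rng.normal(3.0, 0.5, size=64)        # "real" data samples
fake = rng.normal(0.0, 1.0, size=64)        # untrained generator output

w, b = 1.0, -1.5                            # toy linear discriminator
d_real = sigmoid(w * real + b)              # D(real), ideally near 1
d_fake = sigmoid(w * fake + b)              # D(fake), ideally near 0

# Discriminator loss: -[log D(real) + log(1 - D(fake))]
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
# Generator loss (non-saturating form): -log D(fake)
g_loss = -np.mean(np.log(d_fake))
```

In real training, gradient steps on `d_loss` and `g_loss` alternate, which is what drives the "better and better" dynamic described above.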
Autoregressive Models
These models generate images by treating an image as a sequence of pixels. The modern autoregressive approach draws much of its inspiration from how LLMs handle text.
Diffusion Models
Diffusion models draw their inspiration from physics, specifically thermodynamics.
What is it?
The essential idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, which adds noise to an image step by step. We then learn a reverse diffusion process that restores structure in the data, yielding a highly flexible and tractable generative model of the data.
In other words, we add noise to an image iteratively, then train a model that learns how to de-noise the image, and use that model to generate novel images.
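The forward (noising) process can be sketched with a standard closed-form shortcut: instead of adding noise one step at a time, x_t can be sampled directly from x_0. This is a minimal sketch under common assumptions (a linear beta schedule, a flat vector standing in for an image); the variable names are illustrative.

```python
import numpy as np

# Forward diffusion: iteratively destroying structure with Gaussian noise.
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # noise schedule (assumed linear)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)              # ᾱ_t, cumulative signal fraction

def add_noise(x0, t, noise):
    """Sample x_t directly from x_0 via the closed form
    x_t = sqrt(ᾱ_t) * x_0 + sqrt(1 - ᾱ_t) * ε."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = rng.standard_normal(16)                # stand-in for an image
noise = rng.standard_normal(16)
xt = add_noise(x0, t=T - 1, noise=noise)
# By the final step ᾱ_t is nearly 0, so x_t is almost pure noise.
```

This is the "slowly destroy structure" half; the learned model handles the reverse half.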
Denoising Diffusion Probabilistic Models (DDPM)
The goal is for the model to learn to de-noise, that is, to remove noise from an image.
DDPM Training

We start with the initial image x on the left and sample a time step to create a noisy version of it. We then send the noisy image through our denoising model, whose goal is to predict the noise; the output of the model is the predicted noise.
We can then measure the difference between the model's predicted noise and the actual noise we added.
The model is trained much like most machine learning models, and after seeing enough examples it becomes very good at removing noise from images.
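One training step of the procedure described above can be sketched as follows. This is a hedged sketch: the "model" is a toy linear predictor (a real DDPM uses a time-conditioned U-Net), and the schedule and names are assumptions carried over from the forward-process description.

```python
import numpy as np

# One DDPM training step: noise an image, predict the noise, compare.
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

W = np.zeros((16, 16))                      # toy model parameters

def predict_noise(xt, t):
    """Stand-in for eps_theta(x_t, t), the learned noise predictor."""
    return W @ xt

def training_step(x0):
    t = rng.integers(T)                     # 1. sample a time step
    eps = rng.standard_normal(x0.shape)     # 2. sample the noise we add
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    eps_hat = predict_noise(xt, t)          # 3. model predicts the noise
    loss = np.mean((eps_hat - eps) ** 2)    # 4. difference from actual noise
    return loss

loss = training_step(rng.standard_normal(16))
```

In real training this loss is minimized by gradient descent over many images and time steps, which is what makes the model "very good at removing noise".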
DDPM Generation

We can start with pure noise and send it through the trained model. We take the predicted noise the model outputs and subtract it from the current noisy image, and if we repeat that over and over again, we end up with a generated image.
Another way to think about it is that the model learns the real data distribution of the images it has seen, and then samples from that learned distribution to create novel images.
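The "subtract the predicted noise, over and over" loop can be sketched with the standard DDPM mean update. This is a hedged sketch: `predict_noise` is a placeholder where a trained network would go, so the output here is not a meaningful image, only the shape of the procedure.

```python
import numpy as np

# DDPM sampling: start from pure noise, iteratively remove predicted noise.
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def predict_noise(xt, t):
    """Placeholder for the trained noise predictor eps_theta(x_t, t)."""
    return np.zeros_like(xt)

x = rng.standard_normal(16)                 # start from pure noise
for t in range(T - 1, -1, -1):
    eps_hat = predict_noise(x, t)
    # Remove the predicted noise component (DDPM posterior mean update).
    x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:                               # add fresh noise except at the last step
        x = x + np.sqrt(betas[t]) * rng.standard_normal(16)
```

With a trained `predict_noise`, the loop walks the noise back to a sample from the learned data distribution.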
Text-to-image Models
Text-to-image models combine the power of diffusion models with LLMs.