Diffusion Model
A diffusion model is an AI image generator that works by progressively removing noise from a random starting image until a coherent picture emerges.
Diffusion models are the dominant architecture behind modern text-to-image AI systems like Stable Diffusion, Midjourney, DALL-E, and our own in-house models. During training, noise is added to real images in many small steps, and the model learns to undo each step. At generation time the process runs in reverse: the model starts with pure noise and iteratively removes it, guided by the text prompt, until a recognizable image emerges.
Because the process is iterative (typically 20–50 denoising steps), the model can refine global structure and fine detail at the same time, something earlier approaches (like GANs) struggled with.
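The generation loop described above can be sketched in a few lines. This is a minimal illustration, not a real model: `toy_denoiser` is a hypothetical stand-in for the trained neural network (it just scales the image down), and the step count of 30 is one choice within the typical 20–50 range.

```python
import numpy as np

def toy_denoiser(x, t):
    # Stand-in for the trained network, which would predict the noise
    # present in x at step t. Here we fake that prediction by scaling,
    # purely to show the shape of the loop.
    return x * 0.1

def sample(steps=30, shape=(8, 8), seed=0):
    """Reverse (generation) loop: start from pure noise and repeatedly
    subtract the model's noise estimate until an image remains."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start from pure Gaussian noise
    for t in reversed(range(steps)):    # e.g. 30 denoising steps
        predicted_noise = toy_denoiser(x, t)
        x = x - predicted_noise         # remove a little noise each step
    return x

img = sample()  # an 8x8 array standing in for a generated image
```

In a real system the denoiser is a large neural network conditioned on the text prompt, and the update rule follows a specific noise schedule rather than a fixed subtraction.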
Different diffusion models have different strengths depending on what they were trained on. Models fine-tuned on illustration produce illustrative output; models trained on photography produce photorealistic output. The "taste" of a model is largely a function of its training corpus and fine-tuning choices.
Ready to make something?
Describe what you want, or pick a style. We’ll preview it in your room before you order.
Start creating