Diffusion

Diffusion: The Core Technique Behind AI Image Generation

A practical hub for mastering diffusion: how models turn random noise into images through iterative denoising, which samplers to pick, and the settings that actually matter for anime, comics, and visual styles.

Updated

Nov 18, 2025

Cluster path

/anime/guides/diffusion

Graph links

12 cross-links

Tags
diffusion
stable diffusion
sdxl
sd3
samplers
cfg scale
latent diffusion
controlnet
lora
img2img
inpainting
anime
comics
visual style
family:style

What is diffusion?

Diffusion models learn to remove noise from images step by step. During training, clean images are progressively noised. The model learns the reverse process: starting from noise and denoising toward a coherent image.

At inference, a sampler applies the learned denoising in discrete steps, guided by your prompt and any conditioning (e.g., pose, sketches, references). Fewer steps are faster but may miss detail; more steps add fidelity but increase time.

  • Forward process: add noise to images (training).
  • Reverse process: remove noise to generate images (inference).
  • Conditioning: text, images, and controls steer the result.
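
The forward process is simple enough to write in a few lines. A minimal sketch of the training-time noising step, assuming PyTorch; alpha_bar_t stands for the cumulative noise-schedule product at step t, and the tensor shapes are illustrative:

    import torch

    def forward_noise(x0: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
        """Training-time forward process: x_t = sqrt(a)*x0 + sqrt(1-a)*eps, eps ~ N(0, I)."""
        eps = torch.randn_like(x0)
        return (alpha_bar_t ** 0.5) * x0 + ((1.0 - alpha_bar_t) ** 0.5) * eps

    # alpha_bar_t near 1 barely perturbs the image; near 0 it is almost pure noise.
    # The model learns to predict the added noise from x_t, which is what inference reverses.
    x0 = torch.rand(1, 3, 64, 64)                 # toy "image" tensor in [0, 1]
    x_mid = forward_noise(x0, alpha_bar_t=0.5)    # halfway along the schedule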

How diffusion generates images

Core mechanics:

  • Noise schedule: defines how much noise is removed per step; schedules pair with samplers for stability.
  • Samplers: algorithms that traverse the noise-to-image path (e.g., DPM++ 2M Karras, DDIM, Euler a). Some are deterministic; others are stochastic.
  • Classifier-free guidance (CFG): blends the prompt-conditioned and unconditioned predictions to follow the prompt. Higher CFG means stronger adherence but can cause artifacts (a one-line sketch follows this list).
  • Seed: initializes the noise tensor for reproducibility.
  • Latent diffusion: many modern models denoise in latent space for speed and memory efficiency.
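
Classifier-free guidance itself is a one-line blend of two model outputs. A minimal sketch, assuming the conditional and unconditional noise predictions have already been computed as tensors:

    import torch

    def apply_cfg(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, cfg_scale: float) -> torch.Tensor:
        """Classifier-free guidance: extrapolate from the unconditional toward the conditioned prediction."""
        # cfg_scale = 1.0 reproduces the conditional prediction; higher values follow
        # the prompt more strongly but risk overbaked, oversaturated results.
        return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

    eps_u = torch.randn(1, 4, 128, 128)   # stand-ins for model outputs in latent space
    eps_c = torch.randn(1, 4, 128, 128)
    guided = apply_cfg(eps_u, eps_c, cfg_scale=6.5)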

Model families and when to use them

  • Stable Diffusion 1.x: fast, lightweight, huge community ecosystem; good for stylized/anime with specialized checkpoints.
  • Stable Diffusion 2.x: improved results in some cases (depth-conditioned and photographic work); uses a different text encoder and stricter dataset filtering than 1.x; fewer community styles.
  • SDXL (1.0+): higher fidelity, better composition and detail; heavier VRAM requirements; strong general-purpose quality.
  • SD3 (2024): improved prompt adherence and in-image text rendering; check license and runtime specifics.

Pick the smallest model that meets your visual bar; use SDXL/SD3 for higher realism or complex scenes, and community anime-focused checkpoints for stylized work.
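
A minimal loading sketch, assuming the Hugging Face diffusers library; the repo ID is the public SDXL base checkpoint, and you would swap in an SD 1.x or community anime checkpoint (with its matching pipeline class) the same way:

    import torch
    from diffusers import StableDiffusionXLPipeline

    # Public SDXL base checkpoint; swap the repo ID (and pipeline class, for SD 1.x
    # or community anime checkpoints) to change model families.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,     # half precision to fit consumer VRAM
    ).to("cuda")

    image = pipe("a quiet seaside town at dusk, watercolor style").images[0]
    image.save("preview.png")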

Key parameters that shape results

  • Steps: 20–35 is a good starting band. Increase for intricate scenes; decrease for speed.
  • Sampler: DPM++ 2M Karras is a robust default. Try Euler a for stylized/anime or fast previews; DDIM for smooth transitions in img2img.
  • CFG scale: 4.5–8.0 typical. Too low = off-prompt; too high = overbaked, oversaturated, or crunchy details.
  • Resolution: generate near your final aspect ratio. SDXL works best around one megapixel total (e.g., 1024×1024 or 832×1216); upscale afterward.
  • Seed: Fix for reproducibility; vary for exploration.
  • Denoising strength (img2img/inpaint): 0.35–0.65 for edits; higher values drift more from the source.

Suggested starting points (the code sketch below shows them in context):

  • General (SDXL): DPM++ 2M Karras, 28 steps, CFG 6.5, 1024×1024.
  • Anime: Euler a, 24–32 steps, CFG 5.5–7.
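
These parameters map directly onto a generation call. A minimal sketch, assuming the diffusers library and the public SDXL base checkpoint; the prompt text is only an example:

    import torch
    from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Sampler: DPM++ 2M with Karras sigmas as the robust default.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )

    image = pipe(
        prompt="clean lineart, cel shading, detailed hair, soft rim light",
        negative_prompt="blurry, lowres, artifacts, text, watermark",
        num_inference_steps=28,                              # steps
        guidance_scale=6.5,                                  # CFG scale
        width=1024, height=1024,                             # near final aspect ratio
        generator=torch.Generator("cuda").manual_seed(42),   # fixed seed for reproducibility
    ).images[0]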

Conditioning and control

  • ControlNet: structural control (pose, depth, lineart, soft edge). Great for consistency, pose-matching, and panels.
  • LoRA: lightweight adapters to add styles, characters, or skills without swapping the base model.
  • IP-Adapter / T2I-Adapter: style or subject guidance from reference images.
  • Textual Inversion/Embeddings: custom tokens for niche styles or subjects.

Combine lightly: one or two controls often beat stacking many. Balance CFG and steps when adding strong controls.
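
A sketch of wiring one ControlNet and one LoRA into a single pipeline, assuming diffusers; the base-model and ControlNet repo IDs are examples, and the LoRA path is hypothetical:

    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # OpenPose ControlNet paired with an SD 1.5-class base model.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # Optional: a style or character LoRA on top of the base checkpoint (path is hypothetical).
    pipe.load_lora_weights("path/to/style_lora.safetensors")

    pose = load_image("pose_reference.png")   # precomputed OpenPose conditioning map
    image = pipe(
        "anime character, dynamic pose, cel shading",
        image=pose,
        num_inference_steps=28,
        guidance_scale=6.0,
    ).images[0]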

Workflows: text-to-image, img2img, and inpainting

  • Text-to-Image: craft prompt + negative prompt, pick sampler/steps/CFG, generate multiple seeds, curate.
  • Image-to-Image: start with a rough sketch, 3D render, or photo; set denoise 0.4–0.6 to preserve composition while changing style (an img2img sketch follows this list).
  • Inpainting: mask regions to fix faces or hands or to add objects; a slightly higher step count often helps masked edits.
  • Outpainting: extend canvases for full scenes or comic panels; guide with ControlNet lineart or depth.
  • Loop: t2i draft → img2img refine → inpaint fixes → upscale.
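
A minimal img2img sketch, assuming diffusers; the input file name and prompt are placeholders:

    import torch
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image

    pipe = AutoPipelineForImage2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    init = load_image("rough_sketch.png")   # sketch, 3D render, or photo
    image = pipe(
        prompt="finished anime illustration, clean lineart, cel shading",
        image=init,
        strength=0.5,               # denoising strength: 0.4-0.6 keeps composition, changes style
        guidance_scale=6.0,
        num_inference_steps=30,
    ).images[0]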

Diffusion for anime

  • Use anime-trained checkpoints or LoRAs for clean linework and flat shading.
  • Sampler: Euler a or DPM++ SDE Karras often yields crisp lines (see the scheduler swap sketched after this list).
  • Control: Lineart/SoftEdge ControlNet to align outlines; IP-Adapter for style reference.
  • Prompts: emphasize line quality, cel shading, color palette, character design tags.
  • Post: light denoise, color-grade, optional halftone for print.
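
Switching to Euler a is a one-line scheduler swap in diffusers. A sketch, assuming you substitute an anime-trained checkpoint for the SDXL base repo ID used here as a stand-in:

    import torch
    from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

    # Load an anime-trained checkpoint here; the SDXL base repo is only a stand-in.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Euler a (ancestral): a stochastic sampler that tends to give crisp, stylized lines.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        prompt="(anime, cel shading), clean lineart, expressive eyes, detailed hair",
        negative_prompt="blurry, extra fingers, lowres, artifacts, text, watermark",
        num_inference_steps=28,
        guidance_scale=6.0,
    ).images[0]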

Diffusion for comics

  • Structure first: rough panel layout → generate panels → refine per-panel with consistent prompts and seeds.
  • Consistency: character LoRA + reference images; pose via ControlNet OpenPose or Depth.
  • Inking: generate clean lineart, then flat colors; try lineart-specific ControlNet.
  • Text/lettering: add in vector tools for clarity; diffusion struggles with text fidelity.
  • Keep a character bible: fixed tokens, LoRAs, palettes, and seeds (a per-panel sketch follows this list).
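
One way to keep that character bible executable is to fix the shared tokens, negative prompt, and seed, and vary only the per-panel action. A sketch assuming diffusers; the character tokens, seed, and panel descriptions are placeholders:

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Character bible: shared tokens, negative prompt, and seed reused across panels.
    CHARACTER = "kara, silver bob haircut, red scarf, comic ink style, halftone shading"
    NEGATIVE = "muddy shading, low detail, incorrect anatomy, text"
    SEED = 1234

    panels = [
        "running across a rooftop at night, motion lines",
        "crouched behind a chimney, dramatic low-angle lighting",
        "leaping toward the camera, high contrast",
    ]

    for i, action in enumerate(panels):
        image = pipe(
            prompt=f"{CHARACTER}, {action}",
            negative_prompt=NEGATIVE,
            num_inference_steps=28,
            guidance_scale=6.5,
            generator=torch.Generator("cuda").manual_seed(SEED),  # same seed keeps panels comparable
        ).images[0]
        image.save(f"panel_{i:02d}.png")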

Quality, speed, and cost trade-offs

  • Steps vs. resolution: prefer moderate steps and correct aspect ratio, then upscale.
  • Batch strategy: generate small previews (lower steps), pick winners, upscale/refine selectively.
  • Precision/VRAM: use half-precision where supported; enable memory optimizations (the sketch after this list shows the common levers).
  • Sampler tuning: switch to faster samplers for ideation; slower for finals.
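
A sketch of the common speed and memory levers in diffusers; all of them are optional and depend on your hardware and installed extras (CPU offload, for instance, needs the accelerate package):

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,    # half precision roughly halves VRAM use
    )

    pipe.enable_model_cpu_offload()   # move submodules to the GPU only while they run
    pipe.enable_attention_slicing()   # trade a little speed for lower peak memory
    pipe.enable_vae_tiling()          # decode large images in tiles

    # Cheap previews first: fewer steps, smaller size; refine only the keepers.
    preview = pipe(
        "city street in the rain", num_inference_steps=16, width=768, height=768
    ).images[0]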

Common problems and fast fixes

  • Hands/anatomy: add pose/depth ControlNet; reduce CFG; use a targeted LoRA; inpaint hands separately (an inpainting sketch follows this list).
  • Mushy details: increase steps slightly; try DPM++ 2M Karras; lower denoise in img2img; sharpen post.
  • Overbaked contrast/saturation: lower CFG; try a non-ancestral sampler; soften negative prompts.
  • Off-prompt drift: raise CFG slightly; increase steps; simplify prompt; ensure correct base model.
  • Text on image: generate blank sign/area and add text in a design tool.
  • When in doubt: lower CFG, change sampler, modestly raise steps.
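
For targeted repairs such as hands, regenerating only a masked region is often enough. A minimal inpainting sketch, assuming diffusers; the image, mask, and repo ID are placeholders:

    import torch
    from diffusers import AutoPipelineForInpainting
    from diffusers.utils import load_image

    pipe = AutoPipelineForInpainting.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    image = load_image("render_with_bad_hand.png")
    mask = load_image("hand_mask.png")    # white = region to regenerate

    fixed = pipe(
        prompt="detailed hand, five fingers, natural pose",
        image=image,
        mask_image=mask,
        strength=0.6,              # enough change to rebuild the hand, not the whole image
        num_inference_steps=32,
        guidance_scale=5.5,        # modest CFG to avoid overbaked detail in the patch
    ).images[0]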

Prompt templates you can adapt

Anime character close-up:

  • Prompt: "(anime, cel shading), clean lineart, expressive eyes, medium shot, soft rim light, detailed hair, [color palette]"
  • Negative: "blurry, extra fingers, lowres, artifacts, text, watermark"

Comic panel action:

  • Prompt: "comic ink style, dynamic pose, motion lines, dramatic lighting, high contrast, halftone shading, [setting], [character tokens]"
  • Negative: "muddy shading, low detail, incorrect anatomy, text"

Stylized portrait (SDXL):

  • Prompt: "high-detail portrait, cinematic lighting, shallow depth of field, intricate textures, [style reference]"
  • Negative: "overexposed, harsh contrast, extra limbs, artifacts"

Topic summary

Condensed context generated from the knowledge graph.

Diffusion is a denoising-based generative technique used by models like Stable Diffusion to iteratively transform random noise into images, guided by text, images, and control signals.