
How to Use Stable Diffusion: From Setup to Stunning Images (2026)

Updated April 2026 · 10 min read

What Is Stable Diffusion?

Stable Diffusion is a family of open-source AI image generation models created by Stability AI. Unlike Midjourney (cloud-only, subscription-based) or DALL-E (OpenAI's proprietary model), Stable Diffusion lets you download the model weights and run them on your own hardware, or use them through cloud platforms.

This means:

  • You own your images (no corporate usage rights shenanigans)
  • Complete privacy (images never leave your machine if you run locally)
  • Total creative control (adjust every parameter: sampler, steps, CFG scale, seeds)
  • Zero subscription fees if you run locally
  • Endless customization via LoRA, ControlNet, and model fine-tuning

Stable Diffusion 3 (released in 2024, with SD 3.5 following later that year) and the general-purpose SDXL 1.0 represent the current standard. In 2026, the ecosystem has matured significantly, with robust tools like Automatic1111 and ComfyUI making setup accessible even to non-technical users.

Installation: Local Setup (Recommended for Control)

Option 1: Automatic1111 Web UI (Easiest)

Automatic1111 is the most user-friendly interface for running Stable Diffusion locally.

System Requirements

  • GPU: NVIDIA (8GB+ VRAM), AMD, or Apple Silicon (M1/M2/M3)
  • CPU: Any modern processor (but much slower than GPU)
  • Storage: 20-30GB (for model + extras)
  • RAM: 8GB minimum, 16GB+ recommended

NVIDIA RTX 3060 or 4060: ~$250-400, sufficient for high-quality image generation.

Installation (Windows/Mac/Linux)

  1. Install Python 3.10+

    • Download from python.org
    • Add Python to PATH during installation
  2. Clone Automatic1111 repository

    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
    cd stable-diffusion-webui
    
  3. Download a model

    • Visit Hugging Face → search "Stable Diffusion"
    • Download sd-v1-5-pruned.safetensors (~4 GB) or sd_xl_base_1.0.safetensors (~7 GB)
    • Place in models/Stable-diffusion/ folder
  4. Run the web UI

    • Windows: Double-click webui-user.bat
    • Mac/Linux: bash webui.sh
    • Open localhost:7860 in your browser

You're now running Stable Diffusion locally. No cloud, no subscription, no limits.

Option 2: ComfyUI (Advanced Control)

ComfyUI is a node-based interface for Stable Diffusion. More complex setup, but unmatched power.

Best for:

  • Advanced workflows (img2img, inpainting, upscaling pipelines)
  • Fine-tuning sampling parameters per step
  • Professional production workflows
  • Combining multiple models in one generation

Installation:

  1. Clone: git clone https://github.com/comfyanonymous/ComfyUI.git
  2. Install Python dependencies: pip install -r requirements.txt
  3. Download models (same as Automatic1111)
  4. Run: python main.py
  5. Open localhost:8188

ComfyUI's learning curve is steeper, but it's the industry standard for professional AI image work.

Cloud-Based Alternatives (No Setup Required)

If local installation seems intimidating:

Stability AI API (Official)

  • Website: stability.ai
  • Models: SDXL 1.0, SD 3.5
  • Pricing: $0.02-0.04 per image (pay-as-you-go)
  • Limits: free tier is limited; paid tiers start around $10/month
  • Best for: Integration into apps, no local hardware needed

Free Cloud Platforms

  • DreamStudio (free credits, SDXL)
  • Replicate (pay-per-use, $0.01-0.08 per image)
  • RunwayML (free tier, but slow)

Trade-off: Cloud platforms are convenient but cost money and upload images to their servers.
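One way to weigh that trade-off is a rough break-even calculation. A sketch using the illustrative figures from this guide (a ~$300 GPU from the hardware section vs. ~$0.03 per cloud image; real prices vary):

```python
# Break-even point for buying a local GPU vs. pay-per-image cloud generation.
# Prices are illustrative figures from this guide, not current quotes.

def breakeven_images(gpu_cost: float, price_per_image: float) -> int:
    """Number of images after which a one-time GPU purchase beats cloud fees."""
    return int(gpu_cost / price_per_image) + 1

# A ~$300 RTX 3060 vs. $0.03/image cloud pricing:
print(breakeven_images(300, 0.03))  # 10001
```

In other words, local hardware only pays for itself at scale (here, around ten thousand images); for occasional use, cloud platforms are cheaper despite the per-image fee.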

Choosing Your Model

Stable Diffusion has multiple models. Pick based on your use case:

| Model | Release | Best For | Download Size |
|---|---|---|---|
| SDXL 1.0 | July 2023 | General quality, portraits, scenes | ~7 GB (base; ~13 GB with refiner) |
| Stable Diffusion 3 | 2024 | Text-in-images, complex prompts | 7 GB |
| Stable Diffusion 2.1 | 2022 | Budget option, basic generation | 5 GB |
| SD Turbo | 2023 | Speed (1-4 steps), real-time generation | 2 GB |
| Fine-tuned models | Various | Specific styles (anime, 3D, photorealism) | 4-10 GB |

Recommendation for beginners: Start with SDXL 1.0. It's the most versatile and produces stunning results.

Pro tip: You can use multiple models. Switch between them in Automatic1111's dropdown menu.

Image Generation Basics: The Prompting Formula

The best Stable Diffusion prompt has this structure:

[Subject] [Descriptors] [Style] [Quality Terms] [Negative Prompt]

Example: A Photography-Quality Landscape

Good prompt: "A serene mountain lake at sunset, crystal clear water, golden hour lighting, ultra-detailed, professional photography, 4K, Ansel Adams style"

Prompt breakdown:

  • Subject: "mountain lake at sunset"
  • Descriptors: "crystal clear water, golden hour lighting"
  • Style: "Ansel Adams style, professional photography"
  • Quality: "ultra-detailed, 4K"
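The formula is easy to mechanize. A minimal sketch (the helper name is ours, not part of any Stable Diffusion tool):

```python
# Assemble a prompt following the [Subject] [Descriptors] [Style] [Quality] formula.
def build_prompt(subject, descriptors=(), style=(), quality=()):
    parts = [subject, *descriptors, *style, *quality]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "A serene mountain lake at sunset",
    descriptors=["crystal clear water", "golden hour lighting"],
    style=["professional photography", "Ansel Adams style"],
    quality=["ultra-detailed", "4K"],
)
print(prompt)
# A serene mountain lake at sunset, crystal clear water, golden hour lighting,
# professional photography, Ansel Adams style, ultra-detailed, 4K
```

Keeping the four slots as separate lists makes it easy to A/B test one slot (say, the style) while holding everything else constant.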

Negative Prompts (What NOT to Generate)

Negative prompts tell Stable Diffusion what to exclude:

Negative: blurry, low quality, distorted, ugly, poorly rendered, worst quality, text, watermark

Add these if you want to avoid common AI artifacts.

Best Prompt Practices

| Do | Don't |
|---|---|
| Be specific: "Renaissance oil painting of a merchant" | Vague: "old art" |
| Use style references: "by Greg Rutkowski" | Assume no style context |
| Describe lighting: "soft volumetric lighting" | Ignore lighting entirely |
| Chain descriptors: "detailed, intricate, sharp focus" | Single adjective: "good" |
| Use weight syntax: (subject:1.5) for emphasis | Treat all words equally |
| Describe composition: "symmetrical, centered" | Ignore framing |
| Add modifiers: "professional", "high quality" | Assume the model knows what's good |
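The (subject:1.5) weight syntax can be made concrete with a tiny parser, a simplified sketch of what UIs like Automatic1111 do internally (real parsers also handle nesting, bare parentheses, and square brackets):

```python
import re

# Minimal parser for the (text:weight) emphasis syntax.
# Simplified sketch: no nesting, no bare (...) or [...] handling.
WEIGHT_RE = re.compile(r"\(([^:()]+):([0-9.]+)\)")

def parse_weights(prompt: str):
    """Return a list of (text, weight) pairs; unweighted text defaults to 1.0."""
    pairs = []
    last = 0
    for m in WEIGHT_RE.finditer(prompt):
        plain = prompt[last:m.start()].strip(" ,")
        if plain:
            pairs.append((plain, 1.0))
        pairs.append((m.group(1), float(m.group(2))))
        last = m.end()
    tail = prompt[last:].strip(" ,")
    if tail:
        pairs.append((tail, 1.0))
    return pairs

print(parse_weights("a castle, (dramatic sky:1.5), sunset"))
# [('a castle', 1.0), ('dramatic sky', 1.5), ('sunset', 1.0)]
```

Each weighted span scales how strongly its tokens pull on the generation, which is why (subject:1.5) emphasizes and (subject:0.7) de-emphasizes.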

Core Parameters: The Settings That Matter

Every Stable Diffusion generation uses these settings. Tweak them to refine results:

| Parameter | Range | What It Does | Recommendation |
|---|---|---|---|
| Steps | 1-100 | Number of diffusion steps (higher = better quality, slower) | 25-40 for SDXL, 50-60 for SD 2.1 |
| CFG Scale | 1-20 | How strictly the model follows your prompt | 7-12 (higher = more prompt-adherent) |
| Sampler | Various | Noise-reduction algorithm (DPM++, Euler, DDIM) | DPM++ 2M Karras (fast + quality) |
| Seed | 0 to ~4.3 billion | Fixes the starting noise for reproducibility (-1 = random in Automatic1111) | Note good seeds, reuse them |
| Width/Height | Any | Image dimensions | 512x768 (portrait), 768x512 (landscape) |
| Denoising | 0.1-1.0 | Strength for img2img (lower = keeps more of the original) | 0.7-0.9 for variations |
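Why noting seeds matters: the seed determines the starting noise, and identical noise plus identical settings reproduces the identical image. A sketch of the principle using Python's PRNG (Stable Diffusion seeds its initial latent noise the same way):

```python
import random

# The same seed replays the exact same "random" starting noise,
# so a generation can be reproduced later; a new seed gives a new image.
def noise(seed, n=4):
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(n)]

assert noise(1234) == noise(1234)   # same seed -> identical noise -> same image
assert noise(1234) != noise(5678)   # new seed -> new noise -> different image
print(noise(1234))
```

This is also why seed plus small prompt edits is the standard workflow for controlled variations: everything random is held fixed while only the prompt changes.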

Understanding CFG Scale

  • CFG 7: close to the prompt, but with creative freedom
  • CFG 12: strict adherence to the prompt
  • CFG 18+: may ignore common sense to force in prompt words

Too high CFG can produce ugly, over-literal results. 7-12 is the sweet spot.
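Under the hood, classifier-free guidance blends two noise predictions, one unconditional and one conditioned on your prompt, and the CFG scale is literally the multiplier pushing the result toward the prompt. A toy sketch with scalar stand-ins for the predictions:

```python
# Classifier-free guidance: guided = uncond + cfg * (cond - uncond).
# Scalars stand in for the model's two noise-prediction tensors.
def cfg_guidance(uncond: float, cond: float, cfg_scale: float) -> float:
    return uncond + cfg_scale * (cond - uncond)

uncond, cond = 0.0, 1.0
print(cfg_guidance(uncond, cond, 1.0))  # 1.0  (cfg=1: just the conditioned prediction)
print(cfg_guidance(uncond, cond, 7.0))  # 7.0  (extrapolated far past it)
```

The extrapolation explains the failure mode above: at very high scales the guided prediction overshoots what the model was trained on, which is where the ugly, over-literal artifacts come from.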

Advanced Techniques

img2img: Transform Existing Images

Instead of generating from scratch, start with an image and transform it:

  1. Upload an image
  2. Set Denoising to 0.5-0.8 (lower = preserve original more)
  3. Change the prompt: "Make this a watercolor painting"
  4. Generate

This is how you create variations, apply art styles, or reimagine compositions.
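The denoising setting maps directly to how much diffusion work is done: img2img skips the early steps and only runs the final fraction given by the strength, which is why low values preserve the original. A sketch mirroring the timestep math used by libraries like diffusers:

```python
# How denoising strength maps to work actually done in img2img:
# only the last `strength` fraction of the schedule is run, so low
# strength = few steps = the original image mostly survives.
def effective_steps(num_inference_steps: int, strength: float) -> int:
    return min(int(num_inference_steps * strength), num_inference_steps)

for s in (0.3, 0.5, 0.8):
    print(s, effective_steps(40, s))
# 0.3 12
# 0.5 20
# 0.8 32
```

So at strength 0.3 on a 40-step schedule, only 12 denoising steps run, a light restyle; at 0.8, 32 steps run and far more of the image is reinvented.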

Inpainting: Edit Specific Regions

Want to change just one part of an image? Use inpainting:

  1. Generate or upload an image
  2. Mask (paint over) the area you want to change
  3. Set new prompt: "Replace this with a robot"
  4. Generate

Inpainting fixes mistakes or adds new elements without regenerating everything.
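Conceptually, the mask is a per-pixel switch between the original image and the newly generated content. A toy sketch on a 4-pixel "image":

```python
# The core of inpainting: a binary mask decides, per pixel, whether to
# keep the original image or take the newly generated content.
def blend(original, generated, mask):
    """mask value 1 = repaint this pixel, 0 = keep the original."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]

original  = [10, 20, 30, 40]   # toy 4-pixel "image"
generated = [99, 99, 99, 99]
mask      = [0, 1, 1, 0]       # only the middle two pixels were painted over
print(blend(original, generated, mask))  # [10, 99, 99, 40]
```

Real inpainting models also condition generation on the unmasked context (and feather the mask edge), so the repainted region blends with its surroundings instead of being pasted in.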

LoRA (Low-Rank Adaptation)

LoRA files are small add-ons (typically tens to a few hundred MB) that change the model's style or inject new concepts.

Popular LoRAs:

  • Anime styles (for SD to generate anime)
  • Celebrity faces (for photorealism)
  • 3D rendering (for Blender-like output)
  • Specific artists ("in the style of Studio Ghibli")

How to use (Automatic1111):

  1. Download LoRA from Civitai.com or Hugging Face
  2. Place in models/Lora/
  3. In prompt, reference it: <lora:anime-style:0.8> (0.8 = strength)

Example prompt with LoRA:

A girl in a café, <lora:anime-xl:0.9>, detailed anime style, soft lighting, trending on Pixiv
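Before generation, the UI has to split the prompt into plain text plus the <lora:name:weight> tags so it can load the right LoRA weights. A simplified sketch (real parsers also accept a missing weight, defaulting it to 1.0):

```python
import re

# Extract <lora:name:weight> tags from a prompt and return the cleaned
# prompt plus the list of LoRAs to load. Simplified sketch only.
LORA_RE = re.compile(r"<lora:([\w-]+):([0-9.]+)>")

def extract_loras(prompt: str):
    loras = [(m.group(1), float(m.group(2))) for m in LORA_RE.finditer(prompt)]
    parts = [p.strip() for p in LORA_RE.sub("", prompt).split(",")]
    clean = ", ".join(p for p in parts if p)
    return clean, loras

clean, loras = extract_loras("A girl in a café, <lora:anime-xl:0.9>, detailed anime style")
print(loras)   # [('anime-xl', 0.9)]
print(clean)   # A girl in a café, detailed anime style
```

The tag itself never reaches the text encoder; only the cleaned prompt does, while the named LoRA is merged into the model at the given strength.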

ControlNet: Precise Composition Control

ControlNet guides image generation using edge maps, poses, or depth information.

Use cases:

  • Generate characters in specific poses (use pose map)
  • Follow a sketch (use edge detection)
  • Match depth/perspective (use depth map)
  • Generate consistent hand poses

Example:

  1. Upload a reference image with a specific pose
  2. Use ControlNet's "Pose" mode
  3. Stable Diffusion generates a new image matching that pose
  4. Adjust prompt for style, clothing, etc.

ComfyUI supports ControlNet natively through its node system; Automatic1111 requires the sd-webui-controlnet extension. In both cases you must also download the ControlNet model files themselves.

Hardware & Performance

GPU Memory Requirements

| Model | Min VRAM | Comfortable | Fast |
|---|---|---|---|
| SD 2.1 | 4GB | 6GB | 8GB+ |
| SDXL | 6GB | 8GB | 12GB+ |
| SD 3 | 6GB | 8GB | 12GB+ |
| Multiple LoRAs | +1-2GB per LoRA | +0.5GB per LoRA | No penalty |

Don't have a GPU?

  • Use CPU mode (much slower, 5-10 min per image)
  • Use cloud platforms (Replicate, DreamStudio)
  • Reduce image resolution to 512x512

Stable Diffusion vs Competitors

How does Stable Diffusion stack up?

| Feature | Stable Diffusion | Midjourney | DALL-E 3 |
|---|---|---|---|
| Price | Free (local) or $0.02-0.04 per image | $10-120/month | $0.04 per image |
| Setup | Local install required | None (Discord) | None (web) |
| Image quality | Excellent (especially SDXL) | Excellent | Excellent |
| Speed | Depends on hardware | Fast | Fast |
| Customization | Extreme (LoRA, ControlNet) | Limited | Limited |
| Privacy | Full (if local) | None (Discord logs) | None (OpenAI logs) |
| Text-in-image | Good (SD3) | Excellent | Excellent |
| Consistency | Moderate (with seed control) | High (across variations) | High |

When to use each:

  • Stable Diffusion: Maximum control, privacy-sensitive work, low cost at scale
  • Midjourney: Discord-native workflow, fastest setup, aesthetic consistency
  • DALL-E 3: OpenAI integration, seamless ChatGPT usage, premium quality

Real-World Workflow Example

Goal: Generate concept art for a sci-fi character for a video game.

Step 1: Initial Generation (Automatic1111)

Prompt: A cyborg warrior, sleek metallic armor, neon blue accents, 
cinematic lighting, trending on ArtStation, by H.R. Giger
Negative: cartoon, blurry, disfigured
Steps: 40
CFG Scale: 11
Sampler: DPM++ 2M Karras

Generate 4 variations.

Step 2: Refine with Inpainting

  • The face looks odd → Use inpainting to fix it
  • New prompt: "detailed cyborg face, glowing eyes, battle-worn"

Step 3: Apply Art Style via LoRA

  • Add <lora:detailed-hyperrealistic:0.7> to prompt
  • Regenerate for photorealistic version

Step 4: Scale with Upscaler

  • Use Real-ESRGAN or Upscayl to 4x image size
  • Final result: 4K concept art

Total time: 15 minutes. Total cost: $0 (if local), $0.10 (if cloud).

Troubleshooting

Q: My NVIDIA GPU isn't being detected. A: Update your NVIDIA driver and let the web UI install the CUDA-enabled PyTorch build on first launch (deleting the venv folder forces a reinstall). On Apple Silicon, webui.sh uses PyTorch's MPS backend automatically; no CUDA is involved.

Q: Generated images have distorted hands or faces. A: Use negative prompt: bad hands, extra fingers, disfigured face. Or use ControlNet to guide hand/face generation.

Q: Prompt isn't being followed (CFG too high?). A: Lower CFG scale from 15 to 10. Increase steps from 25 to 40. Rewrite prompt more clearly.

Q: Runs very slowly on my CPU. A: This is normal. GPU strongly recommended. If CPU-only, reduce image size to 256x256 and steps to 10.

Q: Model download failed (slow internet). A: Resume the interrupted download with a download manager, or fetch models with the huggingface-cli tool, whose downloads can be resumed after a drop.

Q: Which LoRA is best for photorealism? A: Try detailed-hyperrealistic, photo-realistic, or dreamshaper from Civitai. Blend 2-3 for best results.

Related Reading