How to Use Stable Diffusion: From Setup to Stunning Images (2026)
What Is Stable Diffusion?
Stable Diffusion is an open-source AI image generation model created by Stability AI. Unlike Midjourney (cloud-only, subscription) or DALL-E (OpenAI's proprietary model), Stable Diffusion lets you download and run the model yourself on your computer or use cloud platforms.
This means:
- You own your images (no corporate usage rights shenanigans)
- Complete privacy (images never leave your machine if you run locally)
- Total creative control (adjust every parameter: sampler, steps, CFG scale, seeds)
- Zero subscription fees if you run locally
- Endless customization via LoRA, ControlNet, and model fine-tuning
Stable Diffusion 3.5 (released late 2024) and SDXL 1.0 (the versatile general-purpose model) represent the current standard. In 2026, the ecosystem has matured significantly, with robust tools like Automatic1111 and ComfyUI making setup accessible even to non-technical users.
Installation: Local Setup (Recommended for Control)
Option 1: Automatic1111 Web UI (Easiest)
Automatic1111 is the most user-friendly interface for running Stable Diffusion locally.
System Requirements
- GPU: NVIDIA (8GB+ VRAM), AMD, or Apple Silicon (M1/M2/M3)
- CPU: Any modern processor (but much slower than GPU)
- Storage: 20-30GB (for model + extras)
- RAM: 8GB minimum, 16GB+ recommended
NVIDIA RTX 3060 or 4060: ~$250-400, sufficient for high-quality image generation.
Installation (Windows/Mac/Linux)
1. Install Python 3.10+
   - Download from python.org
   - Add Python to PATH during installation
2. Clone the Automatic1111 repository
   ```bash
   git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
   cd stable-diffusion-webui
   ```
3. Download a model
   - Visit Hugging Face → search "Stable Diffusion"
   - Download `sd-v1-5-pruned.safetensors` (~4-8 GB depending on variant) or `sd_xl_base_1.0.safetensors` (~7 GB)
   - Place it in the `models/Stable-diffusion/` folder
4. Run the web UI
   - Windows: double-click `webui-user.bat`
   - Mac/Linux: `bash webui.sh`
   - Open `localhost:7860` in your browser
You're now running Stable Diffusion locally. No cloud, no subscription, no limits.
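Prefer scripting over clicking? Automatic1111 also exposes an HTTP API when launched with the `--api` flag (add it to `COMMANDLINE_ARGS` in `webui-user.bat`). Here's a minimal sketch; the payload fields mirror the UI settings, and the interactive API docs are served at `localhost:7860/docs`:

```python
import base64
import requests

# Automatic1111 must be launched with --api for this endpoint to exist.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "a serene mountain lake at sunset, golden hour lighting",
    "negative_prompt": "blurry, low quality, watermark",
    "steps": 30,
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
}

resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()

# The API returns generated images as base64-encoded strings.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```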
Option 2: ComfyUI (Advanced Control)
ComfyUI is a node-based interface for Stable Diffusion. More complex setup, but unmatched power.
Best for:
- Advanced workflows (img2img, inpainting, upscaling pipelines)
- Fine-tuning sampling parameters per step
- Professional production workflows
- Combining multiple models in one generation
Installation:
1. Clone: `git clone https://github.com/comfyanonymous/ComfyUI.git`
2. Install Python dependencies: `pip install -r requirements.txt`
3. Download models (same files as Automatic1111) and place them in `models/checkpoints/`
4. Run: `python main.py`
5. Open `localhost:8188` in your browser
ComfyUI's learning curve is steeper, but it's the industry standard for professional AI image work.
Cloud-Based Alternatives (No Setup Required)
If local installation seems intimidating:
Stability AI API (Official)
- Website: stability.ai
- Models: SDXL 1.0, SD 3.5
- Pricing: $0.02-0.04 per image (pay-as-you-go)
- Limits: Free tier limited, pro tier $10+/month
- Best for: Integration into apps, no local hardware needed
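For app integration, a call to the hosted API looks roughly like this. A minimal sketch assuming Stability's v2beta Stable Image endpoint; check stability.ai's API docs for current paths, model names, and pricing before relying on it:

```python
import os
import requests

# Assumed endpoint per Stability's v2beta REST API docs.
API_KEY = os.environ["STABILITY_API_KEY"]

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={"authorization": f"Bearer {API_KEY}", "accept": "image/*"},
    files={"none": ""},  # forces multipart/form-data encoding
    data={"prompt": "a cyborg warrior, cinematic lighting", "output_format": "png"},
)
resp.raise_for_status()

with open("result.png", "wb") as f:
    f.write(resp.content)
```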
Free Cloud Platforms
- DreamStudio (free credits, SDXL)
- Replicate (pay-per-use, $0.01-0.08 per image)
- RunwayML (free tier, but slow)
Trade-off: Cloud platforms are convenient but cost money and upload images to their servers.
Choosing Your Model
Stable Diffusion has multiple models. Pick based on your use case:
| Model | Release | Best For | Download Size |
|---|---|---|---|
| SDXL 1.0 | July 2023 | General quality, portraits, scenes | ~7 GB (base) |
| Stable Diffusion 3 / 3.5 | 2024 | Text-in-images, complex prompts | ~4-16 GB (by variant) |
| Stable Diffusion 2.1 | Dec 2022 | Budget option, basic generation | ~5 GB |
| SD Turbo / SDXL Turbo | Nov 2023 | Speed (1-4 steps), real-time generation | ~2-7 GB |
| Fine-tuned models | Various | Specific styles (anime, 3D, photorealism) | 4-10 GB |
Recommendation for beginners: Start with SDXL 1.0. It's the most versatile and produces stunning results.
Pro tip: You can use multiple models. Switch between them in Automatic1111's dropdown menu.
Image Generation Basics: The Prompting Formula
The best Stable Diffusion prompt has this structure:
[Subject] [Descriptors] [Style] [Quality Terms] [Negative Prompt]
Example: A Photography-Quality Landscape
Good prompt: "A serene mountain lake at sunset, crystal clear water, golden hour lighting, ultra-detailed, professional photography, 4K, Ansel Adams style"
Prompt breakdown:
- Subject: "mountain lake at sunset"
- Descriptors: "crystal clear water, golden hour lighting"
- Style: "Ansel Adams style, professional photography"
- Quality: "ultra-detailed, 4K"
Negative Prompts (What NOT to Generate)
Negative prompts tell Stable Diffusion what to exclude:
Negative: blurry, low quality, distorted, ugly, poorly rendered, worst quality, text, watermark
Add these if you want to avoid common AI artifacts.
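To see how positive and negative prompts fit together in code, here's a minimal sketch using Hugging Face's `diffusers` library (the model ID and prompt text are illustrative choices, not requirements of the formula):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL base; fp16 halves VRAM use on CUDA GPUs.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt=("A serene mountain lake at sunset, crystal clear water, "
            "golden hour lighting, ultra-detailed, professional photography"),
    negative_prompt="blurry, low quality, distorted, text, watermark",
).images[0]
image.save("lake.png")
```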
Best Prompt Practices
| Do | Don't |
|---|---|
| Be specific: "Renaissance oil painting of a merchant" | Vague: "old art" |
| Use style references: "by Greg Rutkowski" | Assume no style context |
| Describe lighting: "soft volumetric lighting" | Ignore lighting entirely |
| Chain descriptors: "detailed, intricate, sharp focus" | Single adjective: "good" |
| Use weight syntax: `(subject:1.5)` for emphasis | Treat all words equally |
| Describe composition: "symmetrical, centered" | Ignore framing |
| Add modifiers: "professional", "high quality" | Assume the model knows what's good |
Core Parameters: The Settings That Matter
Every Stable Diffusion generation uses these settings. Tweak them to refine results:
| Parameter | Range | What It Does | Recommendation |
|---|---|---|---|
| Steps | 1-100 | Number of diffusion steps (higher = better quality, slower) | 25-40 for SDXL, 50-60 for SD 2.1 |
| CFG Scale | 1-20 | How much the model follows your prompt (higher = stricter) | 7-12 (higher = more prompt-adherent) |
| Sampler | Various | Noise reduction algorithm (DPM++, Euler, DDIM) | DPM++ 2M Karras (fast + quality) |
| Seed | 0-4B | Random number for reproducibility (-1 = random) | Note good seeds, reuse them |
| Width/Height | Any | Image dimensions | 512x512-768 for SD 1.5/2.1; 1024x1024 for SDXL |
| Denoising | 0.1-1.0 | Strength for img2img (lower = keeps original) | 0.7-0.9 for variations |
Understanding CFG Scale
- CFG 7: close to the prompt, but with creative freedom
- CFG 12: strict adherence to the prompt
- CFG 18+: may ignore common sense to satisfy prompt words

Too high a CFG can produce ugly, over-literal results; 7-12 is the sweet spot.
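These settings map one-to-one onto `diffusers` arguments. A sketch, again assuming the SDXL base model:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# A fixed seed makes the generation reproducible.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    prompt="a lighthouse on a cliff, dramatic storm clouds",
    num_inference_steps=30,   # "Steps"
    guidance_scale=7.5,       # "CFG Scale"
    width=1024, height=1024,  # SDXL's native resolution
    generator=generator,
).images[0]
image.save("lighthouse.png")
```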
Advanced Techniques
img2img: Transform Existing Images
Instead of generating from scratch, start with an image and transform it:
- Upload an image
- Set Denoising to 0.5-0.8 (lower = preserve original more)
- Change the prompt: "Make this a watercolor painting"
- Generate
This is how you create variations, apply art styles, or reimagine compositions.
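In code, img2img is its own pipeline and `strength` is the denoising knob. A minimal sketch (`photo.png` is a placeholder for your starting image):

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("photo.png")  # your starting image

# strength plays the role of "Denoising": lower keeps more of the original.
image = pipe(
    prompt="a watercolor painting of this scene",
    image=init_image,
    strength=0.6,
).images[0]
image.save("watercolor.png")
```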
Inpainting: Edit Specific Regions
Want to change just one part of an image? Use inpainting:
- Generate or upload an image
- Mask (paint over) the area you want to change
- Set new prompt: "Replace this with a robot"
- Generate
Inpainting fixes mistakes or adds new elements without regenerating everything.
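Programmatically, inpainting takes the image plus a mask. A sketch using `diffusers` (file names are placeholders; a checkpoint fine-tuned for inpainting will give better edges than the base model used here):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("scene.png")
mask_image = load_image("mask.png")  # white = area to repaint, black = keep

image = pipe(
    prompt="a robot standing in the garden",
    image=init_image,
    mask_image=mask_image,
).images[0]
image.save("inpainted.png")
```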
LoRA (Low-Rank Adaptation)
LoRA files are small add-ons (~200MB) that change the model's style or inject new concepts.
Popular LoRAs:
- Anime styles (for SD to generate anime)
- Celebrity faces (for photorealism)
- 3D rendering (for Blender-like output)
- Specific artists ("in the style of Studio Ghibli")
How to use (Automatic1111):
- Download a LoRA from Civitai.com or Hugging Face
- Place it in `models/Lora/`
- Reference it in the prompt: `<lora:anime-style:0.8>` (0.8 = strength)
Example prompt with LoRA:
A girl in a café, <lora:anime-xl:0.9>, detailed anime style, soft lighting, trending on Pixiv
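Outside the web UI, `diffusers` can load the same `.safetensors` LoRA files. A sketch; `anime-xl.safetensors` is a placeholder for whatever file you downloaded:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Placeholder file name; point this at your downloaded LoRA.
pipe.load_lora_weights("anime-xl.safetensors", adapter_name="anime")
pipe.set_adapters(["anime"], adapter_weights=[0.9])  # strength, like :0.9

image = pipe(
    prompt="a girl in a café, detailed anime style, soft lighting"
).images[0]
image.save("cafe.png")
```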
ControlNet: Precise Composition Control
ControlNet guides image generation using edge maps, poses, or depth information.
Use cases:
- Generate characters in specific poses (use pose map)
- Follow a sketch (use edge detection)
- Match depth/perspective (use depth map)
- Generate consistent hand poses
Example:
- Upload a reference image with a specific pose
- Use ControlNet's "Pose" mode
- Stable Diffusion generates a new image matching that pose
- Adjust prompt for style, clothing, etc.
ControlNet nodes ship with ComfyUI out of the box; in Automatic1111 you need to install the sd-webui-controlnet extension.
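The same pose-guided flow in `diffusers`, as a sketch: it pairs an OpenPose ControlNet with an SD 1.5 checkpoint (`pose.png` is a placeholder for a pre-extracted pose map; substitute another SD 1.5 repo ID if this one has moved):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# OpenPose-conditioned ControlNet for SD 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in any SD 1.5 checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose_image = load_image("pose.png")  # pre-extracted pose map

image = pipe(
    prompt="a cyborg warrior, sleek metallic armor, cinematic lighting",
    image=pose_image,
).images[0]
image.save("posed.png")
```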
Hardware & Performance
GPU Memory Requirements
| Model | Min VRAM | Comfortable | Fast |
|---|---|---|---|
| SD 2.1 | 4GB | 6GB | 8GB+ |
| SDXL | 6GB | 8GB | 12GB+ |
| SD 3 | 6GB | 8GB | 12GB+ |
Each active LoRA adds a small VRAM overhead on top of these figures (typically a few hundred MB).
Don't have a GPU?
- Use CPU mode (much slower, 5-10 min per image)
- Use cloud platforms (Replicate, DreamStudio)
- Reduce image resolution to 512x512
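On a tight VRAM budget, `diffusers` also has built-in memory savers that trade generation speed for lower peak memory. A sketch:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# Stream model parts to the GPU only when needed (requires the
# accelerate package; do not call .to("cuda") when using this).
pipe.enable_model_cpu_offload()

# Compute attention in slices to cut peak VRAM further.
pipe.enable_attention_slicing()

image = pipe("a fox in the snow", width=768, height=768).images[0]
image.save("fox.png")
```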
Stable Diffusion vs Competitors
How does Stable Diffusion stack up?
| Feature | Stable Diffusion | Midjourney | DALL-E 3 |
|---|---|---|---|
| Price | Free (local) or $0.02-0.04 per image | $10-120/month | $0.04 per image |
| Setup | Local install required | None (Discord) | None (web) |
| Image quality | Excellent (especially SDXL) | Excellent | Excellent |
| Speed | Depends on hardware | Fast | Fast |
| Customization | Extreme (LoRA, ControlNet) | Limited | Limited |
| Privacy | Full (if local) | None (Discord logs) | None (OpenAI logs) |
| Text-in-image | Good (SD3) | Excellent | Excellent |
| Consistency | Moderate (with seed control) | High (across variations) | High |
When to use each:
- Stable Diffusion: Maximum control, privacy-sensitive work, low cost at scale
- Midjourney: Discord-native workflow, fastest setup, aesthetic consistency
- DALL-E 3: OpenAI integration, seamless ChatGPT usage, premium quality
Real-World Workflow Example
Goal: Generate concept art for a sci-fi character for a video game.
Step 1: Initial Generation (Automatic1111)
Prompt: A cyborg warrior, sleek metallic armor, neon blue accents,
cinematic lighting, trending on ArtStation, by H.R. Giger
Negative: cartoon, blurry, disfigured
Steps: 40
CFG Scale: 11
Sampler: DPM++ 2M Karras
Generate 4 variations.
Step 2: Refine with Inpainting
- The face looks odd → Use inpainting to fix it
- New prompt: "detailed cyborg face, glowing eyes, battle-worn"
Step 3: Apply Art Style via LoRA
- Add `<lora:detailed-hyperrealistic:0.7>` to the prompt
- Regenerate for a photorealistic version
Step 4: Scale with Upscaler
- Use Real-ESRGAN or Upscayl to 4x the image size
- Final result: 4K concept art
Total time: 15 minutes. Total cost: $0 (if local), $0.10 (if cloud).
Troubleshooting
Q: My NVIDIA GPU isn't being detected.
A: Install a recent NVIDIA driver and the CUDA toolkit, then relaunch webui-user.bat (delete the venv folder to force a clean PyTorch reinstall if needed). On Apple Silicon, run webui.sh; it uses PyTorch's MPS backend automatically.
Q: Generated images have distorted hands or faces.
A: Use negative prompt: bad hands, extra fingers, disfigured face. Or use ControlNet to guide hand/face generation.
Q: Prompt isn't being followed (CFG too high?).
A: Lower CFG scale from 15 to 10. Increase steps from 25 to 40. Rewrite the prompt more clearly.
Q: Runs very slowly on my CPU.
A: This is normal; a GPU is strongly recommended. If CPU-only, reduce image size to 256x256 and steps to 10.
Q: Model download failed (slow internet).
A: Resume instead of restarting: tools like `wget -c` or `huggingface-cli download` can pick up a partial download where it left off.
Q: Which LoRA is best for photorealism?
A: Try photorealism-focused LoRAs from Civitai (e.g., the detailed-hyperrealistic style used earlier), or switch to a photoreal checkpoint such as DreamShaper. Blending 2-3 LoRAs at moderate strengths often works best.
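If you're blending LoRAs programmatically, `diffusers` lets you load several adapters and weight each one independently. A sketch; the file names below are placeholders for your downloaded files:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Placeholder file names; use the .safetensors files you downloaded.
pipe.load_lora_weights("hyperrealistic.safetensors", adapter_name="real")
pipe.load_lora_weights("photo-style.safetensors", adapter_name="photo")

# Blend the two adapters with independent strengths.
pipe.set_adapters(["real", "photo"], adapter_weights=[0.7, 0.5])

image = pipe("portrait of an elderly fisherman, natural light").images[0]
image.save("portrait.png")
```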