How to Use Stable Diffusion: From Setup to Stunning Images (2026)
What Is Stable Diffusion?
Stable Diffusion is an open-source AI image generation model created by Stability AI. Unlike Midjourney (cloud-only, subscription) or DALL-E (OpenAI's proprietary model), Stable Diffusion lets you download and run the model yourself on your computer or use cloud platforms.
This means:
- You own your images (no corporate usage rights shenanigans)
- Complete privacy (images never leave your machine if you run locally)
- Total creative control (adjust every parameter: sampler, steps, CFG scale, seeds)
- Zero subscription fees if you run locally
- Endless customization via LoRA, ControlNet, and model fine-tuning
Stable Diffusion 3.5 (released late 2024) and SDXL 1.0 (the versatile general-purpose model) represent the current standard. In 2026, the ecosystem has matured significantly, with robust tools like Automatic1111 and ComfyUI making setup accessible even to non-technical users.
Installation: Local Setup (Recommended for Control)
Option 1: Automatic1111 Web UI (Easiest)
Automatic1111 is the most user-friendly interface for running Stable Diffusion locally.
System Requirements
- GPU: NVIDIA (8GB+ VRAM), AMD, or Apple Silicon (M1/M2/M3)
- CPU: Any modern processor (but much slower than GPU)
- Storage: 20-30GB (for model + extras)
- RAM: 8GB minimum, 16GB+ recommended
NVIDIA RTX 3060 or 4060: ~$250-400, sufficient for high-quality image generation.
Installation (Windows/Mac/Linux)
1. Install Python 3.10+
   - Download from python.org
   - Add Python to PATH during installation
2. Clone the Automatic1111 repository
   ```bash
   git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
   cd stable-diffusion-webui
   ```
3. Download a model
   - Visit Hugging Face → search "Stable Diffusion"
   - Download `sd-v1-5-pruned.safetensors` (~4-8 GB depending on variant) or `sd_xl_base_1.0.safetensors` (~7 GB)
   - Place it in the `models/Stable-diffusion/` folder
4. Run the web UI
   - Windows: double-click `webui-user.bat`
   - Mac/Linux: `bash webui.sh`
   - Open `localhost:7860` in your browser
You're now running Stable Diffusion locally. No cloud, no subscription, no limits.
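Prefer scripting over clicking? Automatic1111 also exposes an HTTP API when launched with the `--api` flag (add it to `COMMANDLINE_ARGS` in `webui-user.bat`). Here's a minimal sketch; the payload fields mirror the UI settings, and the interactive API docs are served at `localhost:7860/docs`:

```python
import base64
import requests

# Automatic1111 must be launched with --api for this endpoint to exist.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "a serene mountain lake at sunset, golden hour lighting",
    "negative_prompt": "blurry, low quality, watermark",
    "steps": 30,
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
}

resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()

# The API returns generated images as base64-encoded strings.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```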
Option 2: ComfyUI (Advanced Control)
ComfyUI is a node-based interface for Stable Diffusion. More complex setup, but unmatched power.
Best for:
- Advanced workflows (img2img, inpainting, upscaling pipelines)
- Fine-tuning sampling parameters per step
- Professional production workflows
- Combining multiple models in one generation
Installation:
1. Clone: `git clone https://github.com/comfyanonymous/ComfyUI.git`
2. Install Python dependencies: `pip install -r requirements.txt`
3. Download models (same files as Automatic1111) and place them in `models/checkpoints/`
4. Run: `python main.py`
5. Open `localhost:8188` in your browser
ComfyUI's learning curve is steeper, but it's the industry standard for professional AI image work.
Cloud-Based Alternatives (No Setup Required)
If local installation seems intimidating:
Stability AI API (Official)
- Website: stability.ai
- Models: SDXL 1.0, SD 3.5
- Pricing: $0.02-0.04 per image (pay-as-you-go)
- Limits: Free tier limited, pro tier $10+/month
- Best for: Integration into apps, no local hardware needed
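For app integration, a call to the hosted API looks roughly like this. A minimal sketch assuming Stability's v2beta Stable Image endpoint; check stability.ai's API docs for current paths, model names, and pricing before relying on it:

```python
import os
import requests

# Assumed endpoint per Stability's v2beta REST API docs.
API_KEY = os.environ["STABILITY_API_KEY"]

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={"authorization": f"Bearer {API_KEY}", "accept": "image/*"},
    files={"none": ""},  # forces multipart/form-data encoding
    data={"prompt": "a cyborg warrior, cinematic lighting", "output_format": "png"},
)
resp.raise_for_status()

with open("result.png", "wb") as f:
    f.write(resp.content)
```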
Free Cloud Platforms
- DreamStudio (free credits, SDXL)
- Replicate (pay-per-use, $0.01-0.08 per image)
- RunwayML (free tier, but slow)
Trade-off: Cloud platforms are convenient but cost money and upload images to their servers.
Choosing Your Model
Stable Diffusion has multiple models. Pick based on your use case:
| Model | Release | Best For | Download Size |
|---|---|---|---|
| SDXL 1.0 | July 2023 | General quality, portraits, scenes | ~7 GB (base) |
| Stable Diffusion 3 / 3.5 | 2024 | Text-in-images, complex prompts | ~4-16 GB (by variant) |
| Stable Diffusion 2.1 | Dec 2022 | Budget option, basic generation | ~5 GB |
| SD Turbo / SDXL Turbo | Nov 2023 | Speed (1-4 steps), real-time generation | ~2-7 GB |
| Fine-tuned models | Various | Specific styles (anime, 3D, photorealism) | 4-10 GB |
Recommendation for beginners: Start with SDXL 1.0. It's the most versatile and produces stunning results.
Pro tip: You can use multiple models. Switch between them in Automatic1111's dropdown menu.
Image Generation Basics: The Prompting Formula
The best Stable Diffusion prompt has this structure:
[Subject] [Descriptors] [Style] [Quality Terms] [Negative Prompt]
Example: A Photography-Quality Landscape
Good prompt: "A serene mountain lake at sunset, crystal clear water, golden hour lighting, ultra-detailed, professional photography, 4K, Ansel Adams style"
Prompt breakdown:
- Subject: "mountain lake at sunset"
- Descriptors: "crystal clear water, golden hour lighting"
- Style: "Ansel Adams style, professional photography"
- Quality: "ultra-detailed, 4K"
Negative Prompts (What NOT to Generate)
Negative prompts tell Stable Diffusion what to exclude:
Negative: blurry, low quality, distorted, ugly, poorly rendered, worst quality, text, watermark
Add these if you want to avoid common AI artifacts.
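To see how positive and negative prompts fit together in code, here's a minimal sketch using Hugging Face's `diffusers` library (the model ID and prompt text are illustrative choices, not requirements of the formula):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load SDXL base; fp16 halves VRAM use on CUDA GPUs.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt=("A serene mountain lake at sunset, crystal clear water, "
            "golden hour lighting, ultra-detailed, professional photography"),
    negative_prompt="blurry, low quality, distorted, text, watermark",
).images[0]
image.save("lake.png")
```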
Best Prompt Practices
| Do | Don't |
|---|---|
| Be specific: "Renaissance oil painting of a merchant" | Vague: "old art" |
| Use style references: "by Greg Rutkowski" | Assume no style context |
| Describe lighting: "soft volumetric lighting" | Ignore lighting entirely |
| Chain descriptors: "detailed, intricate, sharp focus" | Single adjective: "good" |
| Use weight syntax: `(subject:1.5)` for emphasis | Treat all words equally |
| Describe composition: "symmetrical, centered" | Ignore framing |
| Add modifiers: "professional", "high quality" | Assume the model knows what's good |
Core Parameters: The Settings That Matter
Every Stable Diffusion generation uses these settings. Tweak them to refine results:
| Parameter | Range | What It Does | Recommendation |
|---|---|---|---|
| Steps | 1-100 | Number of diffusion steps (higher = better quality, slower) | 25-40 for SDXL, 50-60 for SD 2.1 |
| CFG Scale | 1-20 | How much the model follows your prompt (higher = stricter) | 7-12 (higher = more prompt-adherent) |
| Sampler | Various | Noise reduction algorithm (DPM++, Euler, DDIM) | DPM++ 2M Karras (fast + quality) |
| Seed | 0-4B | Random number for reproducibility (-1 = random) | Note good seeds, reuse them |
| Width/Height | Any | Image dimensions | 512x512-768 for SD 1.5/2.1; 1024x1024 for SDXL |
| Denoising | 0.1-1.0 | Strength for img2img (lower = keeps original) | 0.7-0.9 for variations |
Understanding CFG Scale
- CFG 7: close to the prompt, but with creative freedom
- CFG 12: strict adherence to the prompt
- CFG 18+: may ignore common sense to satisfy prompt words

Too high a CFG can produce ugly, over-literal results; 7-12 is the sweet spot.
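These settings map one-to-one onto `diffusers` arguments. A sketch, again assuming the SDXL base model:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# A fixed seed makes the generation reproducible.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    prompt="a lighthouse on a cliff, dramatic storm clouds",
    num_inference_steps=30,   # "Steps"
    guidance_scale=7.5,       # "CFG Scale"
    width=1024, height=1024,  # SDXL's native resolution
    generator=generator,
).images[0]
image.save("lighthouse.png")
```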
Advanced Techniques
img2img: Transform Existing Images
Instead of generating from scratch, start with an image and transform it:
- Upload an image
- Set Denoising to 0.5-0.8 (lower = preserve original more)
- Change the prompt: "Make this a watercolor painting"
- Generate
This is how you create variations, apply art styles, or reimagine compositions.
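In code, img2img is its own pipeline and `strength` is the denoising knob. A minimal sketch (`photo.png` is a placeholder for your starting image):

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("photo.png")  # your starting image

# strength plays the role of "Denoising": lower keeps more of the original.
image = pipe(
    prompt="a watercolor painting of this scene",
    image=init_image,
    strength=0.6,
).images[0]
image.save("watercolor.png")
```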
Inpainting: Edit Specific Regions
Want to change just one part of an image? Use inpainting:
- Generate or upload an image
- Mask (paint over) the area you want to change
- Set new prompt: "Replace this with a robot"
- Generate
Inpainting fixes mistakes or adds new elements without regenerating everything.
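Programmatically, inpainting takes the image plus a mask. A sketch using `diffusers` (file names are placeholders; a checkpoint fine-tuned for inpainting will give better edges than the base model used here):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("scene.png")
mask_image = load_image("mask.png")  # white = area to repaint, black = keep

image = pipe(
    prompt="a robot standing in the garden",
    image=init_image,
    mask_image=mask_image,
).images[0]
image.save("inpainted.png")
```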
LoRA (Low-Rank Adaptation)
LoRA files are small add-ons (~200MB) that change the model's style or inject new concepts.
Popular LoRAs:
- Anime styles (for SD to generate anime)
- Celebrity faces (for photorealism)
- 3D rendering (for Blender-like output)
- Specific artists ("in the style of Studio Ghibli")
How to use (Automatic1111):
- Download a LoRA from Civitai.com or Hugging Face
- Place it in `models/Lora/`
- Reference it in the prompt: `<lora:anime-style:0.8>` (0.8 = strength)
Example prompt with LoRA:
A girl in a café, <lora:anime-xl:0.9>, detailed anime style, soft lighting, trending on Pixiv
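Outside the web UI, `diffusers` can load the same `.safetensors` LoRA files. A sketch; `anime-xl.safetensors` is a placeholder for whatever file you downloaded:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Placeholder file name; point this at your downloaded LoRA.
pipe.load_lora_weights("anime-xl.safetensors", adapter_name="anime")
pipe.set_adapters(["anime"], adapter_weights=[0.9])  # strength, like :0.9

image = pipe(
    prompt="a girl in a café, detailed anime style, soft lighting"
).images[0]
image.save("cafe.png")
```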
ControlNet: Precise Composition Control
ControlNet guides image generation using edge maps, poses, or depth information.
Use cases:
- Generate characters in specific poses (use pose map)
- Follow a sketch (use edge detection)
- Match depth/perspective (use depth map)
- Generate consistent hand poses
Example:
- Upload a reference image with a specific pose
- Use ControlNet's "Pose" mode
- Stable Diffusion generates a new image matching that pose
- Adjust prompt for style, clothing, etc.
ControlNet nodes ship with ComfyUI out of the box; in Automatic1111 you need to install the sd-webui-controlnet extension.
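The same pose-guided flow in `diffusers`, as a sketch: it pairs an OpenPose ControlNet with an SD 1.5 checkpoint (`pose.png` is a placeholder for a pre-extracted pose map; substitute another SD 1.5 repo ID if this one has moved):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# OpenPose-conditioned ControlNet for SD 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in any SD 1.5 checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose_image = load_image("pose.png")  # pre-extracted pose map

image = pipe(
    prompt="a cyborg warrior, sleek metallic armor, cinematic lighting",
    image=pose_image,
).images[0]
image.save("posed.png")
```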
Hardware & Performance
GPU Memory Requirements
| Model | Min VRAM | Comfortable | Fast |
|---|---|---|---|
| SD 2.1 | 4GB | 6GB | 8GB+ |
| SDXL | 6GB | 8GB | 12GB+ |
| SD 3 | 6GB | 8GB | 12GB+ |
Each active LoRA adds a small VRAM overhead on top of these figures (typically a few hundred MB).
Don't have a GPU?
- Use CPU mode (much slower, 5-10 min per image)
- Use cloud platforms (Replicate, DreamStudio)
- Reduce image resolution to 512x512
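On a tight VRAM budget, `diffusers` also has built-in memory savers that trade generation speed for lower peak memory. A sketch:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# Stream model parts to the GPU only when needed (requires the
# accelerate package; do not call .to("cuda") when using this).
pipe.enable_model_cpu_offload()

# Compute attention in slices to cut peak VRAM further.
pipe.enable_attention_slicing()

image = pipe("a fox in the snow", width=768, height=768).images[0]
image.save("fox.png")
```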
Stable Diffusion vs Competitors
How does Stable Diffusion stack up?
| Feature | Stable Diffusion | Midjourney | DALL-E 3 |
|---|---|---|---|
| Price | Free (local) or $0.02-0.04 per image | $10-120/month | $0.04 per image |
| Setup | Local install required | None (Discord) | None (web) |
| Image quality | Excellent (especially SDXL) | Excellent | Excellent |
| Speed | Depends on hardware | Fast | Fast |
| Customization | Extreme (LoRA, ControlNet) | Limited | Limited |
| Privacy | Full (if local) | None (Discord logs) | None (OpenAI logs) |
| Text-in-image | Good (SD3) | Excellent | Excellent |
| Consistency | Moderate (with seed control) | High (across variations) | High |
When to use each:
- Stable Diffusion: Maximum control, privacy-sensitive work, low cost at scale
- Midjourney: Discord-native workflow, fastest setup, aesthetic consistency
- DALL-E 3: OpenAI integration, seamless ChatGPT usage, premium quality
Real-World Workflow Example
Goal: Generate concept art for a sci-fi character for a video game.
Step 1: Initial Generation (Automatic1111)
Prompt: A cyborg warrior, sleek metallic armor, neon blue accents,
cinematic lighting, trending on ArtStation, by H.R. Giger
Negative: cartoon, blurry, disfigured
Steps: 40
CFG Scale: 11
Sampler: DPM++ 2M Karras
Generate 4 variations.
Step 2: Refine with Inpainting
- The face looks odd → Use inpainting to fix it
- New prompt: "detailed cyborg face, glowing eyes, battle-worn"
Step 3: Apply Art Style via LoRA
- Add `<lora:detailed-hyperrealistic:0.7>` to the prompt
- Regenerate for a photorealistic version
Step 4: Scale with Upscaler
- Use Real-ESRGAN or Upscayl to 4x the image size
- Final result: 4K concept art
Total time: 15 minutes. Total cost: $0 (if local), $0.10 (if cloud).
Troubleshooting
Q: My NVIDIA GPU isn't being detected.
A: Install a recent NVIDIA driver and the CUDA toolkit, then relaunch webui-user.bat (delete the venv folder to force a clean PyTorch reinstall if needed). On Apple Silicon, run webui.sh; it uses PyTorch's MPS backend automatically.
Q: Generated images have distorted hands or faces.
A: Use negative prompt: bad hands, extra fingers, disfigured face. Or use ControlNet to guide hand/face generation.
Q: Prompt isn't being followed (CFG too high?).
A: Lower CFG scale from 15 to 10. Increase steps from 25 to 40. Rewrite the prompt more clearly.
Q: Runs very slowly on my CPU.
A: This is normal; a GPU is strongly recommended. If CPU-only, reduce image size to 256x256 and steps to 10.
Q: Model download failed (slow internet).
A: Resume instead of restarting: tools like `wget -c` or `huggingface-cli download` can pick up a partial download where it left off.
Q: Which LoRA is best for photorealism?
A: Try photorealism-focused LoRAs from Civitai (e.g., the detailed-hyperrealistic style used earlier), or switch to a photoreal checkpoint such as DreamShaper. Blending 2-3 LoRAs at moderate strengths often works best.
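If you're blending LoRAs programmatically, `diffusers` lets you load several adapters and weight each one independently. A sketch; the file names below are placeholders for your downloaded files:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Placeholder file names; use the .safetensors files you downloaded.
pipe.load_lora_weights("hyperrealistic.safetensors", adapter_name="real")
pipe.load_lora_weights("photo-style.safetensors", adapter_name="photo")

# Blend the two adapters with independent strengths.
pipe.set_adapters(["real", "photo"], adapter_weights=[0.7, 0.5])

image = pipe("portrait of an elderly fisherman, natural light").images[0]
image.save("portrait.png")
```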