Diffusion Models - 배움 에이아이

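Learning Objectives

Understand the forward and reverse processes of diffusion models
Explain how DDPM is trained by predicting noise
Know the role of the noise schedule and its main variants
Understand the latent diffusion idea behind Stable Diffusion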

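Why It Matters

Diffusion models are currently among the highest-quality approaches to image generation. They train more stably than GANs and produce sharper images than VAEs. They are the core technology behind text-to-image systems such as DALL-E, Stable Diffusion, and Midjourney.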

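Core Idea

A diffusion model is built around two processes: a **forward process** that gradually adds noise to the data, and a **reverse process** that recovers data from noise. In DDPM the forward process is fixed by the noise schedule, and only the reverse process is learned.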

Forward Process

Gaussian noise is added gradually to the original data $x_0$ over $T$ steps.

q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1 - \beta_t} \, x_{t-1}, \, \beta_t I)

$\beta_t$: the noise schedule ($0 < \beta_1 < \beta_2 < \cdots < \beta_T < 1$)

Jumping Directly to an Arbitrary Timestep (Key Property)

Define

\alpha_t = 1 - \beta_t

\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

Then:

q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t} \, x_0, \, (1 - \bar{\alpha}_t) I)

Thanks to this property, $x_t$ can be sampled directly from $x_0$ for any $t$ without stepping through the intermediate states, which makes training efficient.

x_t = \sqrt{\bar{\alpha}_t} \, x_0 + \sqrt{1 - \bar{\alpha}_t} \, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)

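import torch

def forward_diffusion(x_0, t, sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod):
    """Forward process: sample x_t directly from x_0.

    Args:
        x_0: original images, shape (batch, C, H, W)
        t: integer timesteps used as 0-based indices, shape (batch,)
        sqrt_alphas_cumprod: precomputed sqrt(ᾱ_t) values, shape (T,)
        sqrt_one_minus_alphas_cumprod: precomputed sqrt(1 - ᾱ_t) values, shape (T,)

    Returns:
        (x_t, noise): the noised images and the noise that was added
    """
    noise = torch.randn_like(x_0)

    # Look up each sample's coefficients and reshape them to broadcast over (C, H, W)
    sqrt_alpha = sqrt_alphas_cumprod[t].view(-1, 1, 1, 1)
    sqrt_one_minus_alpha = sqrt_one_minus_alphas_cumprod[t].view(-1, 1, 1, 1)

    # x_t = sqrt(ᾱ_t) * x_0 + sqrt(1 - ᾱ_t) * ε
    x_t = sqrt_alpha * x_0 + sqrt_one_minus_alpha * noise
    return x_t, noise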
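The closed-form property can be checked numerically. The sketch below is only an illustration: it assumes a linear schedule with T = 1000 and the DDPM default range, and it tracks the mean coefficient and variance of $x_t$ one step at a time, then compares them with $\sqrt{\bar{\alpha}_t}$ and $1 - \bar{\alpha}_t$.

```python
import torch

# Assumed schedule for illustration: linear betas, T = 1000
T = 1000
betas = torch.linspace(1e-4, 0.02, T, dtype=torch.float64)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)  # ᾱ_t for t = 1..T (stored at index t-1)

# Write x_t = m_t * x_0 + (zero-mean Gaussian with variance v_t) and apply
# q(x_t | x_{t-1}) repeatedly: m_t = sqrt(1 - β_t) * m_{t-1}, v_t = (1 - β_t) * v_{t-1} + β_t
mean_coef = torch.ones((), dtype=torch.float64)  # m_0 = 1
variance = torch.zeros((), dtype=torch.float64)  # v_0 = 0
mean_coefs, variances = [], []
for beta in betas:
    mean_coef = torch.sqrt(1.0 - beta) * mean_coef
    variance = (1.0 - beta) * variance + beta
    mean_coefs.append(mean_coef)
    variances.append(variance)
mean_coefs = torch.stack(mean_coefs)
variances = torch.stack(variances)

# Both errors should be at the level of floating-point round-off
max_mean_err = (mean_coefs - alphas_cumprod.sqrt()).abs().max().item()
max_var_err = (variances - (1.0 - alphas_cumprod)).abs().max().item()
print(max_mean_err, max_var_err)
```

The step-by-step recursion and the closed form agree, which is why training can draw a random $t$ and noise $x_0$ in a single operation.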

Reverse Process

A neural network $\epsilon_\theta$ predicts the noise $\epsilon$ that was added to $x_t$, and this prediction is used to recover the data step by step.

p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \sigma_t^2 I)

Training Objective

\mathcal{L}_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right]

The model takes $x_t$ and the timestep $t$ as input and predicts the added noise $\epsilon$.
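The objective trains only $\epsilon_\theta$, so it may help to see how the mean $\mu_\theta$ of $p_\theta$ is recovered from it. The following is the parameterization from the DDPM paper, with the variance choice $\sigma_t^2 = \beta_t$ assumed here (the paper also discusses an alternative). It matches the coefficients used in the sampling code later on this page.

```latex
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \, \epsilon_\theta(x_t, t) \right), \qquad \sigma_t^2 = \beta_t

x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I) \ \text{for } t > 1, \quad z = 0 \ \text{for } t = 1
```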

Noise Prediction Model (U-Net)

DDPM uses a U-Net architecture and injects the timestep through a sinusoidal embedding.

import math

import torch
import torch.nn as nn

class SinusoidalPositionEmbedding(nn.Module):
    """Convert timesteps into sinusoidal embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim  # should be even and at least 4 (half_dim - 1 is used as a divisor)

    def forward(self, t):
        device = t.device
        half_dim = self.dim // 2
        # Geometrically spaced frequencies from 1 down to 1/10000
        embeddings = math.log(10000) / (half_dim - 1)
        embeddings = torch.exp(torch.arange(half_dim, device=device) * -embeddings)
        # Outer product: (batch, 1) * (1, half_dim) -> (batch, half_dim)
        embeddings = t[:, None] * embeddings[None, :]
        embeddings = torch.cat([embeddings.sin(), embeddings.cos()], dim=-1)
        return embeddings


class SimpleUNetBlock(nn.Module):
    """U-Net block with a time-embedding input."""
    def __init__(self, in_ch, out_ch, time_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.time_mlp = nn.Linear(time_dim, out_ch)
        # GroupNorm with 8 groups requires out_ch to be divisible by 8
        self.norm1 = nn.GroupNorm(8, out_ch)
        self.norm2 = nn.GroupNorm(8, out_ch)
        self.act = nn.SiLU()

    def forward(self, x, t_emb):
        h = self.act(self.norm1(self.conv1(x)))
        # Inject the time embedding, broadcast over the spatial dimensions
        h = h + self.time_mlp(t_emb)[:, :, None, None]
        h = self.act(self.norm2(self.conv2(h)))
        return h
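To show how a time embedding and a conditioned convolution combine into something callable as `model(x_t, t)`, here is a minimal sketch. `TinyNoisePredictor` and `sinusoidal_embedding` are hypothetical names introduced for illustration. The network has no down-sampling, up-sampling, or skip connections, so it is a stand-in for a U-Net rather than an actual one.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_embedding(t, dim):
    """Map integer timesteps (batch,) to features (batch, dim); dim must be even and >= 4."""
    half_dim = dim // 2
    freqs = torch.exp(
        torch.arange(half_dim, device=t.device) * -(math.log(10000) / (half_dim - 1))
    )
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([args.sin(), args.cos()], dim=-1)


class TinyNoisePredictor(nn.Module):
    """Hypothetical stand-in for a U-Net: one time-conditioned conv layer plus an output conv."""

    def __init__(self, channels=3, hidden=32, time_dim=64):
        super().__init__()
        self.time_dim = time_dim
        self.time_mlp = nn.Sequential(
            nn.Linear(time_dim, hidden), nn.SiLU(), nn.Linear(hidden, hidden)
        )
        self.conv_in = nn.Conv2d(channels, hidden, 3, padding=1)
        self.norm = nn.GroupNorm(8, hidden)  # hidden must be divisible by 8
        self.act = nn.SiLU()
        self.conv_out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x_t, t):
        t_emb = self.time_mlp(sinusoidal_embedding(t, self.time_dim))
        h = self.act(self.norm(self.conv_in(x_t)))
        h = h + t_emb[:, :, None, None]  # broadcast the time features over H and W
        return self.conv_out(h)  # predicted noise, same shape as x_t


model = TinyNoisePredictor()
x_t = torch.randn(4, 3, 32, 32)
t = torch.randint(0, 1000, (4,))
noise_pred = model(x_t, t)
print(noise_pred.shape)  # torch.Size([4, 3, 32, 32])
```

The output has the same shape as the input because the model predicts one noise value per input value.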

Noise Schedules

Schedule	Formula	Notes
Linear	$\beta_t = \beta_1 + \frac{t-1}{T-1}(\beta_T - \beta_1)$	Original DDPM; simple
Cosine	$\bar{\alpha}_t = \frac{f(t)}{f(0)}$ , $f(t) = \cos^2(\frac{t/T + s}{1+s} \cdot \frac{\pi}{2})$	Improved DDPM; noise is added more gradually, reported to improve quality

def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule."""
    return torch.linspace(beta_start, beta_end, timesteps)

def cosine_beta_schedule(timesteps, s=0.008):
    """Cosine noise schedule (Improved DDPM)."""
    steps = timesteps + 1
    x = torch.linspace(0, timesteps, steps)
    # ᾱ_t = f(t) / f(0), evaluated at t = 0..T
    alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    # β_t = 1 - ᾱ_t / ᾱ_{t-1}
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    # Clamp to keep β_t away from 0 and 1
    return torch.clamp(betas, 0.0001, 0.999)
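The training and sampling code on this page reads precomputed tensors from a `noise_schedule` dictionary that is not defined anywhere. One plausible way to build it is sketched below. `make_noise_schedule` is a hypothetical helper, and the key names are taken from how the dictionary is indexed elsewhere on the page. The sketch also compares how much signal each schedule keeps halfway through.

```python
import math

import torch


def make_noise_schedule(betas):
    """Precompute per-timestep tensors from betas (hypothetical helper; index i holds step i + 1)."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    return {
        "betas": betas,
        "alphas": alphas,
        "alphas_cumprod": alphas_cumprod,
        "sqrt_alphas_cumprod": alphas_cumprod.sqrt(),
        "sqrt_one_minus_alphas_cumprod": (1.0 - alphas_cumprod).sqrt(),
    }


T = 1000

# Linear schedule with the DDPM default range
linear_betas = torch.linspace(1e-4, 0.02, T)

# Cosine schedule, same formula as in the table above (s = 0.008)
s = 0.008
x = torch.linspace(0, T, T + 1)
f = torch.cos(((x / T) + s) / (1 + s) * math.pi * 0.5) ** 2
cosine_betas = torch.clamp(1 - (f[1:] / f[:-1]), 0.0001, 0.999)

linear = make_noise_schedule(linear_betas)
cosine = make_noise_schedule(cosine_betas)

# ᾱ at the halfway point: the fraction of signal variance still present
mid = T // 2
linear_mid = linear["alphas_cumprod"][mid - 1].item()
cosine_mid = cosine["alphas_cumprod"][mid - 1].item()
print(linear_mid, cosine_mid)
```

With these settings the cosine schedule keeps noticeably more signal at the midpoint than the linear one. If the model runs on a GPU, the tensors in the dictionary would also need to be moved to that device before indexing them with `t`.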

DDPM Training Loop

def train_step(model, x_0, optimizer, timesteps, noise_schedule):
    """One DDPM training step."""
    batch_size = x_0.shape[0]

    # Pick a random timestep for each sample (0-based index into the schedule)
    t = torch.randint(0, timesteps, (batch_size,), device=x_0.device)

    # Forward process: x_0 -> x_t (schedule tensors must be on the same device as t)
    x_t, noise = forward_diffusion(
        x_0, t,
        noise_schedule['sqrt_alphas_cumprod'],
        noise_schedule['sqrt_one_minus_alphas_cumprod'],
    )

    # Predict the noise
    noise_pred = model(x_t, t)

    # Loss: true noise vs. predicted noise
    loss = nn.functional.mse_loss(noise_pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return loss.item()

Sampling (Reverse Process)

@torch.no_grad()
def sample(model, image_size, channels, timesteps, noise_schedule):
    """DDPM sampling: generate an image starting from pure noise."""
    device = next(model.parameters()).device
    b = 1  # batch size

    # x_T ~ N(0, I)
    x = torch.randn(b, channels, image_size, image_size, device=device)

    for t in reversed(range(timesteps)):
        t_batch = torch.full((b,), t, dtype=torch.long, device=device)

        # Predict the noise in the current sample
        predicted_noise = model(x, t_batch)

        # Reverse-process coefficients for this timestep
        alpha = noise_schedule['alphas'][t]
        alpha_cumprod = noise_schedule['alphas_cumprod'][t]
        beta = noise_schedule['betas'][t]

        # Mean: (x_t - β_t / sqrt(1 - ᾱ_t) * ε_θ) / sqrt(α_t)
        x = (1 / alpha.sqrt()) * (
            x - (beta / (1 - alpha_cumprod).sqrt()) * predicted_noise
        )

        # Add noise with variance β_t (skipped at the final step)
        if t > 0:
            noise = torch.randn_like(x)
            x = x + beta.sqrt() * noise

    return x

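Stable Diffusion Overview

Stable Diffusion is a **Latent Diffusion Model (LDM)**: it runs diffusion in a **latent space** rather than in pixel space, which improves computational efficiency.

Component	Role
VAE	Maps between images and the latent space (512×512 → 64×64)
U-Net	Predicts noise in the latent space (text conditioning is injected via cross-attention)
CLIP text encoder	Converts the text prompt into a conditioning embedding

Stable Diffusion combines the VAE (latent space structure), U-Net (CNN architecture), and CLIP (contrastive-learning embeddings) covered in this tab. Understanding how each component works gives a deeper understanding of how Stable Diffusion operates.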
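A rough sense of the efficiency gain comes from counting how many values the denoiser processes per image. The spatial sizes below come from the table above. The channel counts (3 for RGB pixels, 4 for the latent) are assumptions based on Stable Diffusion v1 and are not stated on this page.

```python
# Values per image in pixel space vs. latent space.
# Assumption: 3 RGB channels in pixel space and 4 latent channels (Stable Diffusion v1).
pixel_shape = (3, 512, 512)
latent_shape = (4, 64, 64)


def num_values(shape):
    """Number of scalar values in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


pixel_values = num_values(pixel_shape)    # 786432
latent_values = num_values(latent_shape)  # 16384
compression = pixel_values / latent_values
print(pixel_values, latent_values, compression)  # 786432 16384 48.0
```

Under these assumptions, every denoising step operates on 48 times fewer values than it would in pixel space.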

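Comparison of Generative Model Paradigms

Model	Training stability	Sample quality	Inference speed	Diversity
VAE	High	Moderate (blurry)	Fast	High
GAN	Low	High (sharp)	Fast	Low (mode collapse)
Diffusion	High	Very high	Slow	High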

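Speeding Up Diffusion Model Inference

The original DDPM sampler needs hundreds to thousands of steps, which makes it very slow. DDIM (Denoising Diffusion Implicit Models) reduces the step count with deterministic sampling, and fast solvers such as DPM-Solver reach high-quality results in 10-25 steps. Distillation-based methods (Consistency Models, LCM) aim to reduce this further, to 1-4 steps, and are an active area of research.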
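To make the DDIM idea concrete, here is a minimal sketch of one deterministic update (the η = 0 case). `ddim_step` is a hypothetical name, and the ᾱ values are arbitrary numbers chosen for illustration. The step first estimates $x_0$ from the predicted noise, then re-noises that estimate to an earlier noise level using the same predicted noise. Because the target level can be any earlier timestep, sampling can skip steps.

```python
import torch


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from noise level ᾱ_t to ᾱ_prev."""
    # Estimate the clean sample implied by the current noise prediction
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    # Move to the earlier noise level, reusing the same predicted noise
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_pred


# Sanity check with an "oracle" that returns the true noise instead of a trained model
torch.manual_seed(0)
x_0 = torch.randn(2, 3, 8, 8, dtype=torch.float64)
eps = torch.randn_like(x_0)
alpha_bar_t = torch.tensor(0.3, dtype=torch.float64)     # assumed noisier level
alpha_bar_prev = torch.tensor(0.7, dtype=torch.float64)  # assumed cleaner level

x_t = alpha_bar_t.sqrt() * x_0 + (1 - alpha_bar_t).sqrt() * eps
x_prev = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev)

# With a perfect noise prediction, the step lands exactly on the forward-process
# sample at the earlier noise level
expected = alpha_bar_prev.sqrt() * x_0 + (1 - alpha_bar_prev).sqrt() * eps
print(torch.allclose(x_prev, expected))
```

In practice `eps_pred` would come from the trained model, so each step is only as accurate as the noise prediction.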

References

Paper	Venue/Year	Key contribution
Denoising Diffusion Probabilistic Models (DDPM) (Ho et al.)	NeurIPS 2020	Foundation of modern diffusion models
Improved Denoising Diffusion Probabilistic Models (Nichol & Dhariwal)	ICML 2021	Cosine schedule, learned variance
High-Resolution Image Synthesis with Latent Diffusion Models (Rombach et al.)	CVPR 2022	Latent Diffusion / Stable Diffusion

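Checklist

I can explain the roles of the forward and reverse processes
I understand how DDPM is trained by predicting noise
I know the role of the noise schedule and how the linear and cosine schedules differ
I can explain the latent diffusion idea behind Stable Diffusion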

Next Steps

Vision tab

Image generation applications of diffusion models

Fine-Tuning tab

Fine-tuning Stable Diffusion (LoRA, DreamBooth)
