JSON and YAML - 배움 에이아이

Learning Objectives

Read and write JSON files with the json module
Parse YAML files with PyYAML
Implement the configuration file pattern
Perform custom serialization and deserialization

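Why It Matters

JSON and YAML are widely used for configuration files, API responses, and experiment logs. In machine learning and deep learning, model configuration (config), hyperparameter management (Hydra, wandb), and API communication (REST) all rely on these formats.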

JSON Processing

Reading and Writing Files

import json

# Write a JSON file
config = {
    "model": "bert-base",
    "learning_rate": 0.001,
    "epochs": 50,
    "layers": [128, 64, 32],
    "use_gpu": True,
}

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)

# Read the JSON file back
with open("config.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["model"])         # "bert-base"
print(loaded["learning_rate"]) # 0.001

String Conversion

# Python -> JSON string
json_str = json.dumps(config, ensure_ascii=False, indent=2)
print(json_str)

# JSON string -> Python
data = json.loads('{"name": "test", "value": 42}')
print(data["name"])  # "test"
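
A round trip through JSON does not always return exactly the object that went in, and malformed input raises an exception. The sketch below illustrates both points with a made-up dictionary; the keys and values are illustrative assumptions, not part of the original config.

```python
import json

# Round-trip a dict containing types JSON cannot represent exactly.
original = {"shape": (3, 224, 224), 1: "int key", "dropout": None, "use_gpu": True}
text = json.dumps(original)
restored = json.loads(text)

print(text)
print(restored["shape"])  # the tuple comes back as a list
print(list(restored))     # the int key 1 comes back as the string "1"

# Malformed input raises json.JSONDecodeError (a subclass of ValueError).
try:
    json.loads("{'model': 'bert'}")  # single quotes are not valid JSON
    decode_failed = False
except json.JSONDecodeError as exc:
    decode_failed = True
    print(f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
```

Because tuples become lists and non-string keys become strings, code that compares a loaded config against the in-memory original should normalize both sides first.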

Custom Serialization

import json
from datetime import datetime
from dataclasses import dataclass, asdict

# Handle types that JSON does not support natively, such as datetime
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if hasattr(obj, "__dict__"):
            return obj.__dict__
        return super().default(obj)

experiment = {
    "name": "실험 1",
    "timestamp": datetime.now(),
    "results": {"accuracy": 0.95},
}

json_str = json.dumps(experiment, cls=CustomEncoder, ensure_ascii=False, indent=2)
print(json_str)
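
The encoder above writes a datetime as a plain ISO string, so json.loads() returns it as a str rather than a datetime. One possible way to get the original type back is to tag the value on the way out and restore it with object_hook on the way in. This is a minimal sketch: the "__datetime__" tag and the two helper functions are hypothetical conventions chosen for illustration, not something the json module defines.

```python
import json
from datetime import datetime

def encode_datetime(obj):
    """Fallback for json.dumps: tag datetimes so they can be recognised later."""
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_datetime(dct):
    """object_hook for json.loads: turn tagged dicts back into datetime objects."""
    if "__datetime__" in dct:
        return datetime.fromisoformat(dct["__datetime__"])
    return dct

experiment = {"name": "run-1", "timestamp": datetime(2024, 1, 15, 9, 30)}

text = json.dumps(experiment, default=encode_datetime)
restored = json.loads(text, object_hook=decode_datetime)

print(text)
print(restored["timestamp"] == experiment["timestamp"])
```

The default= argument is a function-based alternative to subclassing json.JSONEncoder; either approach can be paired with object_hook for decoding.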

YAML Processing

YAML is easier for people to read than JSON and is widely used for machine learning experiment configuration.

# Install PyYAML
pip install pyyaml

import yaml

# Read a YAML file (assumes config.yaml already exists)
with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Write a YAML file
config = {
    "model": {
        "name": "bert-base",
        "hidden_size": 768,
        "num_layers": 12,
    },
    "training": {
        "epochs": 50,
        "learning_rate": 0.001,
        "batch_size": 32,
    },
}

with open("config.yaml", "w", encoding="utf-8") as f:
    yaml.dump(config, f, allow_unicode=True, default_flow_style=False)

Generated YAML file (yaml.dump() sorts mapping keys alphabetically by default):

model:
  hidden_size: 768
  name: bert-base
  num_layers: 12
training:
  batch_size: 32
  epochs: 50
  learning_rate: 0.001
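
The keys in the output above appear in alphabetical order, not in the order the dictionary was written. If the original order matters for readability, passing sort_keys=False to yaml.dump() is one option (available in PyYAML 5.1 and later). A small sketch with a shortened version of the config:

```python
import yaml

config = {
    "model": {"name": "bert-base", "hidden_size": 768},
    "training": {"epochs": 50, "batch_size": 32},
}

# Default behaviour: mapping keys are sorted alphabetically.
sorted_text = yaml.dump(config, default_flow_style=False)
print(sorted_text)

# sort_keys=False keeps the dict's insertion order instead.
ordered_text = yaml.dump(config, default_flow_style=False, sort_keys=False)
print(ordered_text)
```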

Always use yaml.safe_load(). yaml.load() with an unsafe loader can construct arbitrary Python objects, which is a security risk when the input is untrusted.
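
To illustrate the difference, the sketch below feeds safe_load() a document that uses a Python-specific tag. The untrusted string is a made-up example; safe_load() has no constructor for such tags, so it raises an error without calling anything.

```python
import yaml

# A document that asks the loader to call a Python function via a tag.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    rejected = True
    print(f"safe_load refused the document: {type(exc).__name__}")

# Plain data (mappings, lists, numbers, strings) loads as usual.
plain = yaml.safe_load("epochs: 50\nlearning_rate: 0.001")
print(plain)
```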

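JSON vs YAML Comparison

Feature	JSON	YAML
Readability	Moderate	High
Comments	Not supported	Supported (`#`)
Standard library	Built in	Requires PyYAML
API communication	Standard	Rarely used
Configuration files	Adequate	Well suited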

Applications in AI/ML

# Save experiment results as JSON
import json
from datetime import datetime

import yaml  # used by load_config below

def save_experiment(filepath, config, metrics):
    """Save experiment results to a JSON file."""
    result = {
        "timestamp": datetime.now().isoformat(),
        "config": config,
        "metrics": metrics,
    }
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)

save_experiment(
    "experiment_001.json",
    config={"model": "bert", "lr": 0.001},
    metrics={"accuracy": 0.95, "f1": 0.93}
)

# YAML configuration file loading pattern
def load_config(filepath):
    """Load a YAML configuration file."""
    with open(filepath, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)
    return config

config = load_config("experiment.yaml")
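
A common extension of this loading pattern is to keep default values in code and let the file override only what it mentions. The sketch below shows one way to do that with a recursive merge. DEFAULTS, merge_config, and the loaded dictionary are hypothetical names and values for illustration; loaded stands in for whatever load_config() returns.

```python
from copy import deepcopy

# Hypothetical defaults; a real project would choose its own.
DEFAULTS = {
    "model": {"name": "bert-base", "hidden_size": 768},
    "training": {"epochs": 10, "learning_rate": 0.001, "batch_size": 32},
}

def merge_config(defaults, overrides):
    """Return a new dict: defaults updated recursively with overrides."""
    merged = deepcopy(defaults)
    # An empty YAML file loads as None, so treat None like an empty dict.
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

# Stand-in for the dict that load_config("experiment.yaml") would return.
loaded = {"training": {"epochs": 50}}

config = merge_config(DEFAULTS, loaded)
print(config["training"])
```

Merging into a deep copy leaves DEFAULTS untouched, so the same defaults can be reused across several experiment files.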

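What Is TOML?

TOML is the configuration file format used by pyproject.toml. Since Python 3.11, tomllib is part of the standard library, so TOML files can be read without installing anything. tomllib only parses TOML; writing it requires a third-party package.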

Checklist

I can read and write JSON files with json.load()/json.dump()
I can parse YAML files with yaml.safe_load()
I can write a custom JSON encoder
I can tell which situations suit JSON and which suit YAML

Next Documents

pathlib

Path objects and file path management

Text and CSV

Review of basic file handling
