Logging and Debugging

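Learning Objectives

Configure log levels and formats with the logging module
Write logs to a file and the console at the same time
Step through code with pdb and breakpoint()
Apply an effective logging strategy to machine learning training runs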

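Why It Matters

Debugging with print() is fine for small scripts, but a machine learning project needs its training logs, error traces, and experiment records managed systematically. The logging module provides per-level filtering, file output, and configurable formats, while pdb is an interactive debugger that pauses execution so you can inspect variable state.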

Basic logging Configuration

Quick Start

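import logging

# Basic configuration (call once, before the first log call)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Create a module-level logger
logger = logging.getLogger(__name__)

# One call per log level
logger.debug("Debug message (detailed information during development)")  # not shown: below the INFO threshold
logger.info("Info message (record of normal operation)")
logger.warning("Warning message (potential problem)")
logger.error("Error message (a feature failed)")
logger.critical("Critical message (the system cannot continue)")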

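Understanding Log Levels

Level	Value	Purpose	Examples
`DEBUG`	10	Detailed information during development	Variable values, function call tracing
`INFO`	20	Record of normal operation	Training started, epoch finished
`WARNING`	30	Potential problem	High memory usage, learning rate too large
`ERROR`	40	A feature failed	File not found, API call failed
`CRITICAL`	50	The system cannot continue	GPU out of memory, disk full

A common convention is INFO and above in production and DEBUG during development.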
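The threshold behaviour described above can be seen in a minimal sketch. The logger name `level_demo` and the in-memory buffer are assumptions made only for this illustration; they let the output be inspected without touching the console.

```python
import io
import logging

# Hypothetical logger name, used only for this sketch.
demo_logger = logging.getLogger("level_demo")
demo_logger.propagate = False  # keep the demo output away from the root logger

buffer = io.StringIO()  # capture output in memory instead of the console
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
demo_logger.addHandler(handler)

# "Production" setting: INFO and above pass, DEBUG is dropped.
demo_logger.setLevel(logging.INFO)
demo_logger.debug("batch shape checked")
demo_logger.info("epoch finished")

# "Development" setting: lower the threshold so DEBUG gets through.
demo_logger.setLevel(logging.DEBUG)
demo_logger.debug("batch shape checked again")

captured = buffer.getvalue().splitlines()
print(captured)
```

Only the second and third calls reach the handler, because the first DEBUG call is below the INFO threshold that was active at the time.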

Format Strings

# Example format string
format_str = "%(asctime)s [%(levelname)-8s] %(name)s:%(lineno)d - %(message)s"
# Output (with datefmt="%Y-%m-%d %H:%M:%S"):
# 2024-06-15 14:30:25 [INFO    ] __main__:42 - Training started

# Commonly used format codes
# %(asctime)s    - timestamp (2024-06-15 14:30:25)
# %(levelname)s  - level name (INFO, WARNING, ...)
# %(name)s       - logger name
# %(filename)s   - file name
# %(lineno)d     - line number
# %(funcName)s   - function name
# %(message)s    - log message
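To see how these codes are filled in, here is a small sketch that formats a hand-built record. The logger name `trainer`, the path `train.py`, and the message are made-up values for illustration; building the `LogRecord` directly just keeps the result independent of where the code runs.

```python
import logging

formatter = logging.Formatter(
    "%(asctime)s [%(levelname)-8s] %(name)s:%(lineno)d - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Build a record by hand (all values here are illustrative).
record = logging.LogRecord(
    name="trainer",
    level=logging.INFO,
    pathname="train.py",
    lineno=42,
    msg="training started (lr=%s)",
    args=(0.001,),
    exc_info=None,
)

line = formatter.format(record)
print(line)
# After the timestamp, the line reads:
# [INFO    ] trainer:42 - training started (lr=0.001)
```

The `-8s` in `%(levelname)-8s` left-aligns the level name in an 8-character column, which keeps messages vertically aligned across levels.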

Handlers and Formatters

import logging

def setup_logger(
    name: str,
    log_file: str | None = None,
    level: int = logging.INFO,
) -> logging.Logger:
    """Set up a logger that writes to the console and, optionally, a file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Define the formatter
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)-8s] %(name)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setLevel(level)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    # File handler (optional)
    if log_file:
        file_handler = logging.FileHandler(log_file, encoding="utf-8")
        file_handler.setLevel(level)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger

# Usage
logger = setup_logger("ml_experiment", "experiment.log", logging.DEBUG)
logger.info("Experiment started")
logger.debug("Hyperparameters: lr=0.001, batch=32")
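One thing to watch for with a setup helper like this: `logging.getLogger(name)` returns the same logger object for the same name, so calling the helper twice attaches a second set of handlers and every message is emitted twice. The sketch below shows one possible guard; the function name `get_logger_once` and the logger name `guard_demo` are hypothetical, and the in-memory stream is used only so the result can be inspected.

```python
import io
import logging


def get_logger_once(name: str, stream: io.StringIO) -> logging.Logger:
    """Attach a handler only the first time this logger name is configured."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not forward records to the root logger
    if not logger.handlers:  # guard against stacking duplicate handlers
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


stream = io.StringIO()
first = get_logger_once("guard_demo", stream)
second = get_logger_once("guard_demo", stream)  # same object, no second handler

second.info("experiment started")
lines = stream.getvalue().splitlines()
print(first is second, len(first.handlers), lines)
```

Checking `logger.handlers` is a simple approach that fits small scripts and notebooks, where setup code is often re-run; larger applications may prefer to configure logging once at program start.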

Log File Rotation

import logging
from logging.handlers import RotatingFileHandler, TimedRotatingFileHandler

# The two handlers below are alternatives; use one or the other for a given file.

# Size-based rotation (roll over at about 5 MB, keep up to 3 backups)
rotating_handler = RotatingFileHandler(
    "app.log",
    maxBytes=5 * 1024 * 1024,  # 5 MB
    backupCount=3,
    encoding="utf-8",
)

# Time-based rotation (every day at midnight)
timed_handler = TimedRotatingFileHandler(
    "app.log",
    when="midnight",
    interval=1,
    backupCount=7,  # keep 7 days of backups
    encoding="utf-8",
)
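A rotating handler does nothing until it is attached to a logger. The following sketch attaches a `RotatingFileHandler` and forces a few rollovers; the 200-byte limit, the temporary directory, and the logger name `rotation_demo` are assumptions chosen only to make rotation easy to observe.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    log_file = Path(tmp_dir) / "app.log"

    handler = RotatingFileHandler(
        log_file,
        maxBytes=200,   # tiny limit so rotation happens after a few lines
        backupCount=2,  # keep app.log.1 and app.log.2
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    rotation_logger = logging.getLogger("rotation_demo")
    rotation_logger.setLevel(logging.INFO)
    rotation_logger.propagate = False
    rotation_logger.addHandler(handler)

    for step in range(50):
        rotation_logger.info("step %03d finished", step)

    # Close the handler before the temporary directory is removed.
    rotation_logger.removeHandler(handler)
    handler.close()

    created = sorted(p.name for p in Path(tmp_dir).iterdir())

print(created)
```

The listing should contain the active `app.log` plus the backups `app.log.1` and `app.log.2`; older data beyond `backupCount` is discarded.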

Structured Logging

import logging
import json

# JSON formatter (for feeding a log collection system)
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        log_data = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include the extra field if the caller supplied one
        if hasattr(record, "extra_data"):
            log_data["data"] = record.extra_data
        return json.dumps(log_data, ensure_ascii=False)

# Setup
logger = logging.getLogger("structured")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Log with additional data
logger.info(
    "Training finished",
    extra={"extra_data": {"epoch": 10, "loss": 0.05, "accuracy": 0.95}},
)
# Printed on one line (the default time format includes milliseconds):
# {"timestamp": "2024-06-15 14:30:25,123", "level": "INFO", "logger": "structured",
#  "message": "Training finished", "data": {"epoch": 10, "loss": 0.05, "accuracy": 0.95}}

pdb - Python Debugger

# Option 1: breakpoint() (Python 3.7+, recommended)
def calculate_loss(predictions, targets):
    total_loss = 0
    for pred, target in zip(predictions, targets):
        diff = pred - target
        breakpoint()  # execution pauses here and the debugger starts
        total_loss += diff ** 2
    return total_loss / len(predictions)

# Option 2: call pdb directly
import pdb

def find_bug(data):
    result = []
    for item in data:
        pdb.set_trace()  # same effect as breakpoint() with default settings
        processed = item * 2
        result.append(processed)
    return result

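Common pdb Commands

Command	Short form	Description
`next`	`n`	Run the next line (step over function calls)
`step`	`s`	Run the next line (step into function calls)
`continue`	`c`	Resume execution until the next breakpoint
`p expr`	-	Print the value of an expression
`pp expr`	-	Pretty-print the value of an expression
`list`	`l`	Show the source code around the current line
`where`	`w`	Show the call stack
`up`	`u`	Move one frame up the call stack
`down`	`d`	Move one frame down the call stack
`quit`	`q`	Quit the debugger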

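# Conditional breakpoint
def transform(item):
    """Placeholder for your own per-item processing (hypothetical)."""
    return item

def process_batch(batch):
    for i, item in enumerate(batch):
        if i == 50:
            breakpoint()  # pause only at index 50 (the 51st item)
        result = transform(item)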

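Do not leave breakpoint() or pdb.set_trace() in production code. Setting the environment variable PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op; direct pdb.set_trace() calls are not affected by it.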
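A minimal sketch of the PYTHONBREAKPOINT=0 behaviour follows. The function `scale` is a made-up example. The variable is normally set in the shell before starting Python; setting it inside the process, as done here, relies on the default hook reading the variable each time breakpoint() is called. The line that restores `sys.__breakpointhook__` is a precaution, since some tools (test runners, IDEs) replace the hook.

```python
import os
import sys


def scale(values, factor):
    """Toy function with a leftover breakpoint() call."""
    scaled = []
    for value in values:
        breakpoint()  # does nothing while PYTHONBREAKPOINT=0
        scaled.append(value * factor)
    return scaled


# Make sure the default hook is active; some tools install their own.
sys.breakpointhook = sys.__breakpointhook__

# Usually set in the shell; done in-process here to keep the sketch self-contained.
os.environ["PYTHONBREAKPOINT"] = "0"

result = scale([1, 2, 3], 10)
print(result)  # [10, 20, 30]
```

Because the debugger never starts, the loop runs to completion without waiting for input. This is a safety net, not a substitute for removing the calls before release.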

Debugging Strategies

import functools
import logging
import time

logger = logging.getLogger(__name__)

def process(data):
    """Placeholder for the real processing step (hypothetical)."""
    return data

# 1. Log exceptions together with their traceback
def safe_operation(data):
    try:
        result = process(data)
        return result
    except Exception as e:
        logger.error("Processing failed: %s", e, exc_info=True)
        # exc_info=True records the full stack trace
        return None

# 2. Log function entry and exit (decorator)
def log_function_call(func):
    """Log every call to the wrapped function."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.debug(
            "%s called: args=%s, kwargs=%s",
            func.__name__, args, kwargs,
        )
        try:
            result = func(*args, **kwargs)
            logger.debug("%s returned: %s", func.__name__, result)
            return result
        except Exception as e:
            logger.error("%s raised: %s", func.__name__, e, exc_info=True)
            raise
    return wrapper

@log_function_call
def divide(a: float, b: float) -> float:
    return a / b

# 3. Log timing with a context manager
class LogTimer:
    """Log how long a block of code takes to run."""

    def __init__(self, name: str, logger: logging.Logger | None = None):
        self.name = name
        self.logger = logger or logging.getLogger(__name__)

    def __enter__(self):
        self.start = time.perf_counter()  # monotonic clock, suited to measuring durations
        self.logger.info("%s started", self.name)
        return self

    def __exit__(self, exc_type, exc_value, tb):
        elapsed = time.perf_counter() - self.start
        status = "failed" if exc_type else "finished"
        self.logger.info("%s %s: %.2fs", self.name, status, elapsed)

# Usage
with LogTimer("data preprocessing"):
    # preprocessing code goes here
    pass

Using Logging in AI/ML

import logging
from pathlib import Path
from datetime import datetime

def setup_experiment_logger(
    experiment_name: str,
    log_dir: str = "logs",
) -> logging.Logger:
    """Set up a logger for an ML experiment."""
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_file = log_path / f"{experiment_name}_{timestamp}.log"

    logger = logging.getLogger(experiment_name)
    logger.setLevel(logging.DEBUG)

    # Formatter
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)-8s] %(message)s",
        datefmt="%H:%M:%S",
    )

    # Console: INFO and above
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)
    logger.addHandler(console)

    # File: DEBUG and above (detailed record)
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger

# Using the logger in a training loop
def train_model(config: dict):
    logger = setup_experiment_logger(config.get("name", "experiment"))

    logger.info("=" * 50)
    logger.info("Experiment started: %s", config.get("name"))
    logger.info("Hyperparameters: %s", config)
    logger.info("=" * 50)

    epochs = config.get("epochs", 10)
    for epoch in range(epochs):
        train_loss = 1.0 / (epoch + 1)  # simulated values
        val_loss = 1.2 / (epoch + 1)

        # Log every epoch
        logger.info(
            "Epoch %d/%d - train_loss: %.4f, val_loss: %.4f",
            epoch + 1, epochs, train_loss, val_loss,
        )

        # Warning condition (never triggers with the simulated values above;
        # shown for the pattern)
        if val_loss > train_loss * 1.5:
            logger.warning(
                "Possible overfitting: val_loss(%.4f) > train_loss(%.4f) * 1.5",
                val_loss, train_loss,
            )

        # Debug details (written to the file only)
        logger.debug("Learning rate: %.6f, batches: %d", config.get("lr", 0.001), 100)

    logger.info("Training finished! Final train_loss: %.4f", train_loss)

# Run
train_model({
    "name": "bert_finetuning",
    "epochs": 5,
    "lr": 0.00005,
    "batch_size": 16,
})

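What is the decisive difference between print and logging?

print writes to stdout by default and has no notion of severity, so its output cannot be filtered or routed without editing the code. logging offers per-level filtering, multiple destinations (file, console, network, and so on), configurable formats, and a level that can be changed in production. print calls have to be deleted after debugging, whereas logging calls can stay in place and be silenced by raising the level.

How do I use the debugger in VS Code?

The Python extension for VS Code provides a visual debugger. Click in the gutter next to a line number to set a breakpoint, then press F5 to start debugging; variable inspection, the call stack, conditional breakpoints, and more are available through the GUI. Arguments and environment variables are configured in launch.json.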

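Checklist

Configure logging with logging.basicConfig
Use the log levels (DEBUG, INFO, WARNING, ERROR) appropriately
Set up handlers that write logs to the console and a file at the same time
Pause execution with breakpoint() and inspect variables
Apply systematic logging to machine learning training runs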
