Instance Segmentation - 배움 에이아이

Instance segmentation recognizes each object in an image individually and generates a pixel-level mask for each one. Even objects of the same class are kept separate from one another.

Mask R-CNN (torchvision)

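import torch
from torchvision.models.detection import (
    maskrcnn_resnet50_fpn_v2,
    MaskRCNN_ResNet50_FPN_V2_Weights,
)

# Load a Mask R-CNN pretrained on COCO
weights = MaskRCNN_ResNet50_FPN_V2_Weights.DEFAULT
model = maskrcnn_resnet50_fpn_v2(weights=weights)
model.eval()

# Inference: the model takes a list of [3, H, W] float tensors with values in [0, 1]
image = torch.rand(3, 800, 800)
with torch.no_grad():
    predictions = model([image])

# Output structure: one dict per input image
for pred in predictions:
    print(f"boxes: {pred['boxes'].shape}")    # [N, 4] as (x1, y1, x2, y2) in pixels
    print(f"labels: {pred['labels'].shape}")  # [N] COCO category ids
    print(f"scores: {pred['scores'].shape}")  # [N]
    print(f"masks: {pred['masks'].shape}")    # [N, 1, H, W] soft masks in [0, 1]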
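Mask R-CNN's `masks` are soft per-pixel probabilities, and torchvision only drops detections below a low internal score threshold (`box_score_thresh`, 0.05 by default), so predictions usually need a score filter and a binarization step before use. The sketch below assumes the tensors of one `pred` dict were first converted with `.cpu().numpy()`; the helper name `postprocess_maskrcnn` and the 0.5 thresholds are illustrative choices, not part of torchvision.

```python
import numpy as np


def postprocess_maskrcnn(boxes, labels, scores, masks, score_thresh=0.5, mask_thresh=0.5):
    """Keep confident detections and binarize their soft masks.

    Hypothetical helper. Expects NumPy arrays shaped like torchvision's output:
    boxes [N, 4], labels [N], scores [N], masks [N, 1, H, W] with values in [0, 1].
    Returns boxes [K, 4], labels [K], scores [K] and boolean masks [K, H, W].
    """
    keep = scores >= score_thresh
    binary = masks[keep, 0] >= mask_thresh  # drop the channel axis, then threshold
    return boxes[keep], labels[keep], scores[keep], binary


# Synthetic stand-in for one prediction dict: two detections on a 4x4 image
boxes = np.array([[0, 0, 2, 2], [1, 1, 4, 4]], dtype=np.float32)
labels = np.array([1, 3])
scores = np.array([0.9, 0.3])
masks = np.zeros((2, 1, 4, 4), dtype=np.float32)
masks[0, 0, :2, :2] = 0.8
masks[1, 0, 1:, 1:] = 0.7

b, l, s, m = postprocess_maskrcnn(boxes, labels, scores, masks)
print(l, m.shape, int(m.sum()))  # [1] (1, 4, 4) 4
```

The second detection is dropped by the score filter, and the first keeps only the four pixels whose probability reaches 0.5.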

YOLO-Seg (Ultralytics)

YOLO-Seg performs instance segmentation through the same interface as YOLO detection.

from ultralytics import YOLO

# Load a segmentation model
model = YOLO('yolo11m-seg.pt')

# Inference
results = model('image.jpg')

for result in results:
    if result.masks is not None:
        masks = result.masks.data          # [N, H, W] binary masks (model input resolution by default)
        boxes = result.boxes               # bounding boxes
        for i, (mask, box) in enumerate(zip(masks, boxes)):
            cls = int(box.cls)
            conf = float(box.conf)
            cls_name = result.names[cls]
            print(f"Object {i}: {cls_name} ({conf:.2f})")
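Because every instance has its own mask, per-object measurements such as counts and pixel areas fall out directly. A minimal sketch, assuming the masks and class ids were pulled out as NumPy arrays with `result.masks.data.cpu().numpy()` and `result.boxes.cls.cpu().numpy()`, and that `names` is `result.names`; `summarize_instances` is a hypothetical helper. Areas are measured at the resolution of `masks.data`, which is not necessarily the original image size.

```python
from collections import defaultdict

import numpy as np


def summarize_instances(masks, class_ids, names):
    """Group per-instance mask areas (pixel counts) by class name.

    Hypothetical helper. masks: [N, H, W] array of 0/1 values,
    class_ids: [N] class indices, names: {class_id: class_name}.
    """
    areas = masks.sum(axis=(1, 2))  # pixels covered by each instance
    summary = defaultdict(list)
    for cid, area in zip(class_ids, areas):
        summary[names[int(cid)]].append(int(area))
    return dict(summary)


# Synthetic example: two people and one car on a 4x4 mask grid
names = {0: 'person', 2: 'car'}
masks = np.zeros((3, 4, 4), dtype=np.float32)
masks[0, :2, :2] = 1  # person, 4 px
masks[1] = 1          # car, 16 px
masks[2, 0, :] = 1    # person, 4 px
class_ids = np.array([0., 2., 0.])

out = summarize_instances(masks, class_ids, names)
print(out)  # {'person': [4, 4], 'car': [16]}
```

The number of instances of a class is then simply `len(out[name])`.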

YOLO-Seg Custom Training

# Train a segmentation model
model = YOLO('yolo11m-seg.pt')

results = model.train(
    data='dataset_seg.yaml',  # dataset.yaml for segmentation
    epochs=100,
    imgsz=640,
    task='segment',           # already implied by the -seg checkpoint
)

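Label file format for YOLO segmentation (one text line per object: the class id followed by the polygon's vertex coordinates, normalized to 0-1 by the image width and height):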

# class_id x1 y1 x2 y2 x3 y3 ... (normalized polygon coordinates)
0 0.25 0.30 0.45 0.28 0.50 0.60 0.30 0.65 0.20 0.45
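Converting between pixel polygons and this label format only requires dividing x by the image width and y by the image height (and multiplying back). The two helpers below are hypothetical names sketching both directions for a single line; with an assumed 200x100 pixel image, the vertices used here encode the same polygon as the example line above, written with six decimals.

```python
def polygon_to_yolo_line(class_id, polygon, img_w, img_h):
    """Encode a pixel-space polygon [(x, y), ...] as one YOLO segmentation label line."""
    coords = []
    for x, y in polygon:
        coords.append(f"{x / img_w:.6f}")  # x normalized by image width
        coords.append(f"{y / img_h:.6f}")  # y normalized by image height
    return " ".join([str(class_id)] + coords)


def yolo_line_to_polygon(line, img_w, img_h):
    """Decode one label line back into (class_id, [(x, y), ...]) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    values = [float(v) for v in parts[1:]]
    # A polygon needs paired coordinates and at least 3 vertices (sanity check of this sketch)
    if len(values) % 2 != 0 or len(values) < 6:
        raise ValueError("expected at least 3 (x, y) pairs")
    points = [(values[i] * img_w, values[i + 1] * img_h) for i in range(0, len(values), 2)]
    return class_id, points


polygon = [(50, 30), (90, 28), (100, 60), (60, 65), (40, 45)]
line = polygon_to_yolo_line(0, polygon, img_w=200, img_h=100)
print(line)
# 0 0.250000 0.300000 0.450000 0.280000 0.500000 0.600000 0.300000 0.650000 0.200000 0.450000

class_id, points = yolo_line_to_polygon(line, img_w=200, img_h=100)
```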

Visualizing Results (supervision)

import cv2
import supervision as sv
from ultralytics import YOLO

model = YOLO('yolo11m-seg.pt')
image = cv2.imread('image.jpg')
result = model(image)[0]  # a single Results object for one image

detections = sv.Detections.from_ultralytics(result)

# Visualize masks and labels
mask_annotator = sv.MaskAnnotator()
label_annotator = sv.LabelAnnotator()

labels = [
    f"{result.names[int(cls)]} {conf:.2f}"
    for cls, conf in zip(detections.class_id, detections.confidence)
]

annotated = mask_annotator.annotate(scene=image.copy(), detections=detections)
annotated = label_annotator.annotate(scene=annotated, detections=detections, labels=labels)

cv2.imwrite('segmentation_result.jpg', annotated)
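Conceptually, a mask overlay blends a color into the pixels covered by each mask, with an opacity controlling how much of the original image shows through. The sketch below shows that idea in plain NumPy for one boolean mask; `overlay_mask` and its `alpha` default are assumptions for illustration, not supervision's actual implementation.

```python
import numpy as np


def overlay_mask(image, mask, color, alpha=0.5):
    """Blend `color` into `image` wherever `mask` is True.

    image: [H, W, 3] uint8 (BGR if it came from cv2.imread), mask: [H, W] bool,
    color: 3 channel values in the same order as the image, alpha: overlay opacity.
    """
    out = image.astype(np.float32)  # astype copies, so the input image is untouched
    out[mask] = (1 - alpha) * out[mask] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(out.round(), 0, 255).astype(np.uint8)


image = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
blended = overlay_mask(image, mask, color=(0, 0, 200))  # red in BGR order
print(blended[0, 0].tolist(), blended[1, 1].tolist())  # [50, 50, 150] [100, 100, 100]
```

Drawing one instance after another with different colors gives the familiar multi-color instance overlay.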

Mask R-CNN vs YOLO-Seg Comparison

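Aspect	Mask R-CNN	YOLO-Seg
Architecture	Two-stage	One-stage
Speed	Slower	Faster
Mask quality	High	Medium to high
Ease of training	Complex (Detectron2)	Simple (ultralytics)
Small objects	Good	Fair
Best for	Accuracy-critical work	General-purpose production use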

Instance or semantic segmentation: which should I choose?

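Use instance segmentation when you need to count individual objects or analyze each object's attributes separately. When you only need to separate regions, such as roads or buildings, semantic segmentation is the better fit.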
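The difference becomes concrete when instance output is collapsed into a semantic map: two touching objects of the same class merge into one region, and the object count is lost. A minimal NumPy sketch, assuming boolean instance masks and integer class ids; `instances_to_semantic` is hypothetical, and using 0 as the background value is an assumption (it collides with class 0 in some label sets).

```python
import numpy as np


def instances_to_semantic(masks, class_ids, background=0):
    """Collapse boolean instance masks [N, H, W] into one class map [H, W].

    Where masks overlap, later instances overwrite earlier ones.
    """
    semantic = np.full(masks.shape[1:], background, dtype=np.int64)
    for mask, cid in zip(masks, class_ids):
        semantic[mask] = cid
    return semantic


# Two adjacent cars (class 3) filling a 2x4 image
masks = np.zeros((2, 2, 4), dtype=bool)
masks[0, :, :2] = True
masks[1, :, 2:] = True
semantic = instances_to_semantic(masks, class_ids=[3, 3])

print(len(masks))           # 2 -> instance view: two separate cars
print(np.unique(semantic))  # [3] -> semantic view: only "car pixels", no count
```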

Why is YOLO-Seg's mask quality lower than Mask R-CNN's?

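For speed, YOLO-Seg predicts masks at reduced resolution: it produces a shared set of prototype masks at 1/4 of the input resolution (160x160 for a 640x640 input) and builds every instance mask from them. Mask R-CNN instead predicts a dedicated 28x28 mask for each RoI, so the full mask resolution is spent on the object region, which makes its masks relatively precise. Recent YOLO versions have been steadily narrowing this gap.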
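The YOLACT-style mask assembly used by YOLO-Seg can be sketched in a few lines: the network outputs K shared prototype masks plus K coefficients per detection, and each instance mask is the sigmoid of their linear combination, cropped to the detection box. This NumPy sketch assumes boxes already expressed in prototype-grid coordinates and skips the upsampling to input resolution that real implementations perform; `assemble_masks` and the toy prototypes are hypothetical.

```python
import numpy as np


def assemble_masks(protos, coeffs, boxes, thresh=0.5):
    """YOLACT-style mask assembly (simplified sketch).

    protos: [K, Hp, Wp] prototype masks, coeffs: [N, K] per-instance coefficients,
    boxes: [N, 4] as (x1, y1, x2, y2) in prototype-grid coordinates.
    Returns boolean masks [N, Hp, Wp].
    """
    k, hp, wp = protos.shape
    logits = (coeffs @ protos.reshape(k, -1)).reshape(-1, hp, wp)  # linear combination
    probs = 1.0 / (1.0 + np.exp(-logits))                          # sigmoid

    # Zero out everything outside each instance's box
    ys = np.arange(hp)[None, :, None]
    xs = np.arange(wp)[None, None, :]
    x1, y1 = boxes[:, 0, None, None], boxes[:, 1, None, None]
    x2, y2 = boxes[:, 2, None, None], boxes[:, 3, None, None]
    inside = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)
    return (probs > thresh) & inside


protos = np.stack([np.ones((4, 4)), -np.ones((4, 4))])  # K=2 toy prototypes
coeffs = np.array([[2.0, 0.0], [0.0, 2.0]])             # instance 1 gets negative logits everywhere
boxes = np.array([[0, 0, 2, 2], [0, 0, 4, 4]], dtype=np.float64)
masks = assemble_masks(protos, coeffs, boxes)
print(masks.shape, masks.sum(axis=(1, 2)).tolist())  # (2, 4, 4) [4, 0]
```

Since every instance is read off the same low-resolution prototype grid, boundary detail is bounded by that grid, which is the trade-off described above.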