FLAML

FLAML (Fast and Lightweight AutoML) is a lightweight AutoML library developed by Microsoft Research. It uses a cost-frugal hyperparameter optimization algorithm (CFO) to find good models quickly with few resources.

Learning Objectives

Use FLAML AutoML to perform classification and regression tasks.
Set time and cost constraints for the search.
Integrate a custom learner into FLAML.

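FLAML vs. Other AutoML Tools

Feature	FLAML	Auto-sklearn	AutoGluon
Install footprint	Very light	Heavy	Heavy
Search strategy	CFO (cost-frugal)	SMAC + meta-learning	Stacked ensembles
Resource usage	Minimal	Moderate	High
Custom learners	Easy	Hard	Moderate
Best suited for	Fast prototyping, limited resources	scikit-learn ecosystem	Maximum accuracy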

FLAML Hands-On

Installation and Basic Classification

pip install "flaml[automl]"

from flaml import AutoML
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Prepare the data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Configure FLAML AutoML
automl = AutoML()

settings = {
    "time_budget": 60,           # total search time (seconds)
    "metric": "accuracy",        # metric to optimize
    "task": "classification",    # task type
    "log_file_name": "flaml.log",
    "seed": 42,
}

# Train (searches for the best model automatically)
automl.fit(X_train, y_train, **settings)

# Inspect the results
print(f"Best model: {automl.best_estimator}")
print(f"Best hyperparameters: {automl.best_config}")
print(f"Test accuracy: {automl.score(X_test, y_test):.4f}")
print(f"Training time of the best config: {automl.best_config_train_time:.2f} s")

Regression Task

from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

automl_reg = AutoML()

automl_reg.fit(
    X_train, y_train,
    task="regression",
    metric="r2",
    time_budget=120,         # 2-minute limit
    estimator_list=[         # models to search
        "lgbm",              # LightGBM
        "xgboost",           # XGBoost
        "rf",                # Random Forest
        "extra_tree",        # Extra Trees
    ],
    log_file_name="flaml_reg.log",  # separate log so the history below belongs to this run
    seed=42,
)

print(f"Best model: {automl_reg.best_estimator}")
print(f"R² score: {automl_reg.score(X_test, y_test):.4f}")

# Inspect the search history
from flaml.automl.data import get_output_from_log
import matplotlib.pyplot as plt

# Extract the search history from this run's log
time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \
    get_output_from_log(filename="flaml_reg.log", time_budget=120)

# Best validation loss over time (with metric="r2", FLAML minimizes the loss 1 - R²)
plt.plot(time_history, best_valid_loss_history)
plt.xlabel("Elapsed time (s)")
plt.ylabel("Best validation loss")
plt.title("FLAML search progress")
plt.show()
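The best-validation-loss curve is a running minimum over the trial losses. With metric="r2", each loss is 1 - R². The stdlib-only sketch below recomputes such a curve from a hypothetical list of (elapsed time, validation R²) trial results. The trial values are made up for illustration and are not FLAML output. The helper name best_loss_curve is also hypothetical.

```python
from itertools import accumulate

# Hypothetical trial results: (elapsed seconds, validation R²). Illustrative values only.
trials = [(1.0, 0.62), (2.5, 0.71), (4.0, 0.68), (7.5, 0.80), (9.0, 0.79)]

def best_loss_curve(trials):
    """Return (times, best-so-far losses), where loss = 1 - R² (lower is better)."""
    times = [t for t, _ in trials]
    losses = [1.0 - r2 for _, r2 in trials]
    best_so_far = list(accumulate(losses, min))  # running minimum of the loss
    return times, best_so_far

times, best = best_loss_curve(trials)
for t, loss in zip(times, best):
    print(f"{t:5.1f}s  best loss={loss:.2f}  best R²={1 - loss:.2f}")
```

Flat stretches in the curve are trials that did not beat the best result so far. A curve that is still dropping when the budget runs out suggests that a larger time_budget may help.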

Integrating a Custom Learner

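from flaml import AutoML, tune
from flaml.automl.model import SKLearnEstimator
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Reload classification data (X_train/y_train were overwritten by the regression example)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the custom learner
class MyKNN(SKLearnEstimator):
    """Adds KNN to FLAML as a custom learner (classification only)."""

    @classmethod
    def search_space(cls, data_size, task):
        """Define the hyperparameter search space."""
        space = {
            "n_neighbors": {
                "domain": tune.randint(lower=1, upper=50),
                "init_value": 5,
            },
            "weights": {
                "domain": tune.choice(["uniform", "distance"]),
            },
            "p": {
                "domain": tune.choice([1, 2]),  # 1: Manhattan, 2: Euclidean
            },
        }
        return space

    def __init__(self, task="binary", **config):
        super().__init__(task, **config)
        self.estimator_class = KNeighborsClassifier

# Register the custom learner under a name, then include it in the search
automl = AutoML()
automl.add_learner(learner_name="my_knn", learner_class=MyKNN)
automl.fit(
    X_train, y_train,
    task="classification",
    time_budget=60,
    estimator_list=["lgbm", "rf", "my_knn"],  # includes the custom learner
)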
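The stdlib-only sketch below shows which concrete configurations a search_space like the one above describes. It samples from a dict in the same {"domain": ..., "init_value": ...} shape. Randint, Choice, initial_config and sample_config are hypothetical stand-ins. Randint mimics tune.randint, whose upper bound is exclusive, and Choice mimics tune.choice. They are not FLAML's sampler. FLAML's own search does not sample uniformly like this: it starts from init_value where given and searches locally from there.

```python
import random
from dataclasses import dataclass

@dataclass
class Randint:
    """Hypothetical stand-in for tune.randint: an integer in [lower, upper)."""
    lower: int
    upper: int

    def sample(self, rng):
        return rng.randrange(self.lower, self.upper)

@dataclass
class Choice:
    """Hypothetical stand-in for tune.choice: one of the given categories."""
    categories: list

    def sample(self, rng):
        return rng.choice(self.categories)

# Same shape as MyKNN.search_space, built from the stand-in domains
space = {
    "n_neighbors": {"domain": Randint(1, 50), "init_value": 5},
    "weights": {"domain": Choice(["uniform", "distance"])},
    "p": {"domain": Choice([1, 2])},
}

def initial_config(space, rng):
    """Use init_value where given; otherwise sample from the domain."""
    return {
        name: spec["init_value"] if "init_value" in spec else spec["domain"].sample(rng)
        for name, spec in space.items()
    }

def sample_config(space, rng):
    """Draw every hyperparameter independently from its domain."""
    return {name: spec["domain"].sample(rng) for name, spec in space.items()}

rng = random.Random(0)
print(initial_config(space, rng))
for _ in range(3):
    print(sample_config(space, rng))
```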

Integrating FLAML into a Pipeline

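from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from flaml import AutoML

# Classification data (X_train/y_train were overwritten by the regression example)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Use FLAML as the final step of a Pipeline
automl = AutoML()

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", automl),
])

# Fit the whole Pipeline: the scaler runs first, then AutoML.
# Fit arguments for the AutoML step are routed with the "model__" prefix.
pipe.fit(
    X_train, y_train,
    model__task="classification",
    model__time_budget=60,
    model__metric="f1",
    model__n_jobs=-1,
)

# Predict (test data is scaled by the fitted scaler before reaching AutoML)
predictions = pipe.predict(X_test)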
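If the AutoML settings already live in a dict, as in the classification example, they can be routed to the Pipeline step. Prefix each key with the step name and a double underscore. Below is a minimal stdlib sketch. prefix_fit_params is a hypothetical helper name, and "model" is the step name used in the pipeline above.

```python
def prefix_fit_params(step_name, settings):
    """Prefix each key with '<step_name>__' so Pipeline.fit routes it to that step."""
    return {f"{step_name}__{key}": value for key, value in settings.items()}

settings = {"task": "classification", "time_budget": 60, "metric": "f1", "n_jobs": -1}
pipeline_settings = prefix_fit_params("model", settings)
print(pipeline_settings)
# Usage with the pipeline above: pipe.fit(X_train, y_train, **pipeline_settings)
```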

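By default (estimator_list="auto"), FLAML picks candidate learners based on the task type and the installed packages. These are typically tree-based models such as LightGBM, XGBoost, Random Forest, and Extra Trees, plus a linear model for classification. The exact list varies with the FLAML version. Limiting the search to specific models with the estimator_list parameter gets you results faster.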

Q: Why is FLAML faster than other AutoML tools?

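FLAML uses the CFO (Cost-Frugal Optimization) algorithm. CFO starts from low-cost configurations (for example, a small number of trees) and gradually moves toward more expensive ones as the search progresses. FLAML also offers BlendSearch, a strategy that efficiently combines this local search with global search.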
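The cost-frugal idea can be sketched as a heavily simplified randomized direct search, in the spirit of the local search (FLOW²) that CFO builds on. Start from a low-cost point and try a step in a random direction, then its opposite. Move only on improvement, and halve the step after repeated failed rounds. This is a stdlib-only toy, not FLAML's implementation. The objective, the cost model (cost grows with a hypothetical log2 n_estimators coordinate), and all function names are assumptions made for illustration.

```python
import math
import random

def objective(x):
    """Toy validation loss; x = (log2 of n_estimators, a learning-rate-like knob). Minimum at (6, 0.3)."""
    return (x[0] - 6.0) ** 2 + 10.0 * (x[1] - 0.3) ** 2

def cost(x):
    """Toy training cost: proportional to the number of trees, 2 ** x[0]."""
    return 2.0 ** x[0]

def random_unit_vector(dim, rng):
    """Direction sampled uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def cost_frugal_search(start, step=1.0, min_step=1e-3, max_iters=2000, seed=0):
    rng = random.Random(seed)
    x, fx = list(start), objective(start)
    total_cost = cost(x)
    fails = 0
    for _ in range(max_iters):
        if step < min_step:
            break
        u = random_unit_vector(len(x), rng)
        for sign in (1.0, -1.0):  # try +u first, then the opposite direction -u
            cand = [xi + sign * step * ui for xi, ui in zip(x, u)]
            f_cand = objective(cand)
            total_cost += cost(cand)  # every evaluated trial is paid for
            if f_cand < fx:
                x, fx = cand, f_cand  # move only on improvement
                fails = 0
                break
        else:
            fails += 1
            if fails >= 2 ** (len(x) - 1):  # repeated failed rounds: shrink the step
                step *= 0.5
                fails = 0
    return x, fx, total_cost

# Start from a low-cost point: 2 ** 2 = 4 trees
best_x, best_f, spent = cost_frugal_search([2.0, 0.0])
print(f"best x = {[round(c, 3) for c in best_x]}, loss = {best_f:.4f}, total cost = {spent:.0f}")
```

The search moves in small steps away from the cheap starting point. Expensive regions (large x[0]) are reached only if the loss keeps improving along the way. That is the intuition behind cost-frugal search, reduced here to a toy.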

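Q: When is FLAML the best fit?

FLAML is a good fit for resource-constrained environments (laptops, small servers), for rapid prototyping, or when you want to plug custom models into an AutoML pipeline easily. If you need the highest possible accuracy, consider AutoGluon.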

Checklist

I can run classification and regression tasks with FLAML
I can set time and model constraints
I can visualize and interpret the search history

Next Docs

Optuna

Learn flexible hyperparameter optimization.

Practical Project

Carry out a comprehensive project that includes AutoML.
