Nullable dtypes and the Arrow backend

In pandas 2.x, combining the nullable dtypes (Int64, boolean, string) with the Arrow backend can improve both the consistency of missing-value handling and memory efficiency.

Nullable dtype

import pandas as pd

s_int = pd.Series([1, None, 3], dtype="Int64")            # integers; None becomes <NA>
s_bool = pd.Series([True, None, False], dtype="boolean")  # True/False/<NA>
s_str = pd.Series(["a", None, "c"], dtype="string")       # strings; None becomes <NA>

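The plain int64 dtype cannot hold missing values, so an integer column that contains a missing value gets upcast to float64 (with NaN). Int64 avoids this: it keeps the integer type and represents the missing value as pd.NA.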
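A minimal sketch of the upcasting problem and how Int64 avoids it. The variable names are illustrative only; the calls are standard pandas.

```python
import pandas as pd

# Plain int64 cannot store a missing value, so pandas falls back to float64 + NaN
s_plain = pd.Series([1, None, 3])
print(s_plain.dtype)        # float64

# Nullable Int64 keeps the integer type and marks the gap with pd.NA
s_int = pd.Series([1, None, 3], dtype="Int64")
print(s_int.dtype)          # Int64
print(s_int[1] is pd.NA)    # True

# Arithmetic stays in Int64 and propagates pd.NA instead of converting to float
doubled = s_int * 2
print(doubled.dtype)        # Int64
```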

Applying convert_dtypes

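import pandas as pd
df = pd.DataFrame({"a": [1, None, 3], "b": ["x", None, "z"]})  # hypothetical sample data
df = df.convert_dtypes()  # infer the best nullable dtype for each column
print(df.dtypes)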

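Early in a pipeline, when the schema still needs cleaning up, applying convert_dtypes() first reduces the mix of dtypes: it converts each column to the best available nullable dtype, such as Int64, boolean, or string.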
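To make the effect concrete, here is a small before/after sketch. The `raw` frame and its columns are hypothetical, chosen to mimic the kind of dtype mix that often appears right after loading data.

```python
import pandas as pd

# Hypothetical raw frame with the kind of dtype mix seen right after loading
raw = pd.DataFrame({
    "user_id": [1.0, 2.0, None],      # float64 only because of the missing value
    "name": ["a", None, "c"],         # strings mixed with None
    "active": [True, None, False],    # bools mixed with None -> object
})
print(raw.dtypes)

clean = raw.convert_dtypes()
print(clean.dtypes)  # user_id: Int64, name: string, active: boolean
```

Because every value in `user_id` is integral, convert_dtypes chooses Int64 rather than Float64. In pandas 2.0 and later, `convert_dtypes(dtype_backend="pyarrow")` can produce Arrow-backed dtypes instead of the NumPy-based nullable ones.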

Arrow backend

import pandas as pd
df = pd.read_csv("events.csv", dtype_backend="pyarrow")  # requires pandas >= 2.0 and pyarrow

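When loading large files, this often lowers memory usage and speeds up string processing. However, some operations and third-party libraries do not fully support Arrow-backed dtypes, so check compatibility in advance.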

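Further reading

Type conversion

Covers the basic rules for numeric, string, and date conversion.

Memory optimization

Applies dtype optimization together with memory tuning.

String, date, and Categorical strategies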
