Valid values for the scoring parameter of cross_val_score

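When validating a machine learning model, cross_val_score lets you run cross-validation conveniently without writing the KFold or StratifiedKFold loop yourself. The function takes a scoring argument that selects the evaluation metric. The documentation describes it as follows: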

scoring
str or callable, default=None
A str (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y) which should return only a single value.
Similar to cross_validate but only a single metric is permitted.
If None, the estimator’s default scorer (if available) is used.
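
To make the callable form in the description above concrete, here is a minimal sketch of a custom scorer with the signature scorer(estimator, X, y). The function name error_rate_scorer and the choice of LogisticRegression are my own illustrative assumptions, not something taken from the scikit-learn docs. Scorers follow a "higher is better" convention, so the error rate is negated.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def error_rate_scorer(estimator, X, y):
    """Hypothetical scorer: negated misclassification rate (higher is better)."""
    predictions = estimator.predict(X)
    # Return a single float, as the scoring contract requires
    return -float(np.mean(predictions != y))


X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Pass the callable itself instead of a string name
scores = cross_val_score(model, X, y, cv=5, scoring=error_rate_scorer)
print(scores)
```

Each entry of scores is the value returned by the callable on one held-out fold.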

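So what should go into scoring? Since this is a function provided by sklearn, I assumed I could pass the metric functions from sklearn.metrics, but that did not work. Functions such as accuracy_score and precision_score cannot be passed directly, because they take (y_true, y_pred) rather than the scorer signature (estimator, X, y). They have to be given as a predefined string name or wrapped with make_scorer.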
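
As a rough sketch of the workaround: a metric function from sklearn.metrics cannot be handed to scoring as-is, but wrapping it with make_scorer produces a scorer that can. The variable names and the fixed random_state below are my own choices for illustration. With the same seed, the wrapped scorer and the string 'precision_macro' should give matching fold scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# Fixed seed (an assumption for this sketch) so both runs fit identical forests
model = RandomForestClassifier(random_state=0)

# make_scorer adapts a metric(y_true, y_pred) into a scorer(estimator, X, y);
# extra keyword arguments such as average="macro" are forwarded to the metric
macro_precision = make_scorer(precision_score, average="macro")

scores_callable = cross_val_score(model, X, y, cv=5, scoring=macro_precision)
scores_string = cross_val_score(model, X, y, cv=5, scoring="precision_macro")

print(np.mean(scores_callable), np.mean(scores_string))
```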

So below I look up the evaluation metrics that cross_val_score accepts and put together an example.

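The cross_val_score reference page itself does not list the available metrics. I found the accepted values at the link below. The classification names listed there, such as precision, recall, and f1, assume binary classification. For multiclass problems, add an averaging suffix such as _macro (for example, f1_macro).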

https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Now for an example. I used the iris data for the classification model and the Boston housing data for the regression model. I skip the explanation of each metric and only check the exact input strings.

import numpy as np
from sklearn.datasets import load_boston, load_iris
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Load the Boston Housing dataset
# (load_boston was removed in scikit-learn 1.2, so this needs an older version)
boston = load_boston()
X_boston = boston.data
y_boston = boston.target

# Load the Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Create a random forest regressor for the Boston Housing dataset
model_boston = RandomForestRegressor()

# Create a random forest classifier for the Iris dataset
model_iris = RandomForestClassifier()

# Evaluation indicators for regression
regression_indicators = ['neg_mean_squared_error', 'r2']

# Evaluation indicators for classification
classification_indicators = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro', 'roc_auc_ovr']

# Cross-validation with the Boston Housing dataset
for indicator in regression_indicators:
    scores = cross_val_score(model_boston, X_boston, y_boston, cv=5, scoring=indicator)
    print(f'Boston Housing - {indicator}: {np.mean(scores):.4f}')

# Cross-validation with the Iris dataset
for indicator in classification_indicators:
    scores = cross_val_score(model_iris, X_iris, y_iris, cv=5, scoring=indicator)
    print(f'Iris - {indicator}: {np.mean(scores):.4f}')

Output:

Boston Housing - neg_mean_squared_error: -22.1448
Boston Housing - r2: 0.6278
Iris - accuracy: 0.9600
Iris - precision_macro: 0.9707
Iris - recall_macro: 0.9600
Iris - f1_macro: 0.9598
Iris - roc_auc_ovr: 0.9930
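
The negative value for neg_mean_squared_error is expected: scikit-learn scorers follow a "higher is better" convention, so error metrics are reported negated. A small sketch of converting the mean above back to an ordinary MSE and an approximate RMSE, using the value copied from the output:

```python
import math

neg_mse = -22.1448  # mean 'neg_mean_squared_error' from the output above

mse = -neg_mse         # flip the sign to get the usual mean squared error
rmse = math.sqrt(mse)  # root of the averaged MSE, not the average of per-fold RMSEs

print(f"MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```

If per-fold RMSE is what you want, the string 'neg_root_mean_squared_error' can be passed to scoring instead.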
