Machine Learning / Code Review, 2021. 4. 24. 01:11

What is a Pipeline?

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below. A step’s estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to ‘passthrough’ or None.

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

sklearn.pipeline.Pipeline — scikit-learn 0.24.1 documentation


(First, I could not find anyone explaining the definition of Pipeline given on the scikit-learn site, so I will go over it here.)

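- The purpose of a pipeline is to assemble several steps that can be cross-validated together while setting different parameters. To make that possible, the parameters of each step can be set using the step name and the parameter name joined by __ (two underscores). A step's estimator can be replaced entirely by setting the parameter that carries its name to another estimator, and a transformer can be removed by setting it to 'passthrough' or None.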

- So, to sum up: you use a Pipeline to bundle several data preprocessing steps and a model into a single object, so that they can all be fit (and scored) with one call.
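As a rough illustration of the definition quoted above, the sketch below sets a nested parameter with the step__param syntax, swaps one step for a different estimator, and then disables that step with 'passthrough'. The step names scaler and svc and the use of MinMaxScaler as the replacement are assumptions made only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Two named steps: a transformer followed by the final estimator.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# <step name>__<parameter name> reaches into a step.
pipe.set_params(svc__C=0.5)

# Using the bare step name replaces the whole estimator in that step.
pipe.set_params(scaler=MinMaxScaler())
replaced_with = type(pipe.named_steps["scaler"]).__name__

# 'passthrough' skips the step; the data reaches SVC unscaled.
pipe.set_params(scaler="passthrough")
pipe.fit(X, y)

print(replaced_with, pipe.get_params()["svc__C"], pipe.named_steps["scaler"])
```

Because every change goes through set_params, the same names can later be used as keys in a parameter grid.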

1. Loading the data

In [1]:

import pandas as pd
from sklearn.datasets import load_iris


iris = load_iris()
df = pd.DataFrame(data = iris.data , columns = ['sepal_length','sepal_width', 'petal_length','petal_width'])
target = iris.target
df.head()

Out[1]:

	sepal_length	sepal_width	petal_length	petal_width
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2
2	4.7	3.2	1.3	0.2
3	4.6	3.1	1.5	0.2
4	5.0	3.6	1.4	0.2

2. Creating a Pipeline

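- Here I create a Pipeline that combines StandardScaler (a data scaler) with an SVC (support vector machine) model. No parameters are passed, so both steps use their defaults.

- Pipeline has a score method. It runs X_test through the fitted steps, compares the predictions with Y_test, and returns the result as a float (for a classifier such as SVC, the mean accuracy).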

In [2]:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X_train,X_test,Y_train,Y_test = train_test_split(df,target, test_size = 0.2, random_state = 11)

pipe = Pipeline([('scaler', StandardScaler()),('svc',SVC())])
pipe.fit(X_train,Y_train)
pipe.score(X_test,Y_test)

Out[2]:

0.9333333333333333
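To make the meaning of score a little more concrete: with a classifier as the final step, it should agree with the accuracy computed from predict. The sketch below repeats the setup in a self-contained form and brings in accuracy_score purely for comparison; that comparison is an addition for illustration and was not part of the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=11
)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
pipe.fit(X_train, Y_train)

# predict() scales X_test with the statistics learned from X_train,
# then hands the scaled data to the fitted SVC.
manual = accuracy_score(Y_test, pipe.predict(X_test))
builtin = pipe.score(X_test, Y_test)

print(manual, builtin)
```

The point of the design is that the scaler is never fitted on the test data; it only transforms it.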

3. Passing parameters to the Pipeline

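- The steps of a Pipeline can also be passed in as a list, like estimators below.

- PCA (dimensionality reduction) and SVC (support vector machine) are given the names 'reduce_dim' and 'clf' here. These names are what the two-underscore (__) syntax used later relies on to change the parameters of PCA and SVC. (You can name them however you like.)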

In [3]:

from sklearn.decomposition import PCA

# Each step is a (name, estimator) tuple; the names become parameter prefixes later.
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe1 = Pipeline(estimators)
pipe1.fit(X_train, Y_train)
pipe1.score(X_test, Y_test)

Out[3]:

0.9666666666666667
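Since the step names matter later, it may help to see where they live on the object. The sketch below lists the names of a pipeline built as above and, as an aside that is not part of the cells above, compares them with the names that make_pipeline generates automatically from the class names.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC

# Explicit names, as in the cell above.
pipe1 = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
explicit_names = [name for name, _ in pipe1.steps]

# make_pipeline derives each name from the lowercased class name.
auto_pipe = make_pipeline(PCA(), SVC())
auto_names = [name for name, _ in auto_pipe.steps]

print(explicit_names)
print(auto_names)

# Steps can be looked up by name as well as by position.
same_object = pipe1.named_steps["clf"] is pipe1[1]
```

With make_pipeline the later parameter keys would be pca__n_components and svc__C instead of reduce_dim__n_components and clf__C.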

4. Accessing parameters

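- With pipe1's set_params method you can set clf__C=10.

- To elaborate: the SVC() estimator was named 'clf' above, and appending two underscores (__) to 'clf' followed by the parameter name changes the SVC's C parameter to 10. (It sounds complicated, but it just means that to set a parameter of SVC() you take the name you gave the step earlier, add two underscores, and then the parameter name.)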

In [4]:

print(pipe1[0])
pipe1.set_params(clf__C=10)
print(pipe1[1])

PCA()
SVC(C=10)
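If you are unsure which names set_params accepts, get_params lists them. The following is a minimal sketch under the same step names as above; clf__not_a_param is a deliberately invalid, made-up key used only to show that unknown names are rejected.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe1 = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
pipe1.set_params(clf__C=10)

# Every key returned by get_params() is a valid argument to set_params().
clf_keys = sorted(k for k in pipe1.get_params() if k.startswith("clf__"))
print(clf_keys)

# The nested key and the attribute on the step itself show the same value.
current_C = pipe1.get_params()["clf__C"]
same_C = pipe1.named_steps["clf"].C

# A name that SVC does not have is rejected with a ValueError.
try:
    pipe1.set_params(clf__not_a_param=1)  # hypothetical, invalid on purpose
    rejected = False
except ValueError:
    rejected = True

print(current_C, same_C, rejected)
```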

5. Tuning parameters with GridSearchCV

In [5]:

from sklearn.model_selection import GridSearchCV
# n_components=0 would leave SVC with zero features, so the search starts at 1.
param_grid = dict(reduce_dim__n_components = [1,2],
                  clf__C=[0.1, 1, 10]
                 )
grid_search = GridSearchCV(pipe1,param_grid = param_grid)
grid_search.fit(X_train,Y_train)
grid_search.best_estimator_

Out[5]:

Pipeline(steps=[('reduce_dim', PCA(n_components=2)), ('clf', SVC(C=10))])
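Besides best_estimator_, the fitted search object keeps the winning parameter combination and the score of every candidate. The sketch below is a self-contained variant of the cell above. It assumes a grid of [1, 2] for n_components so that every candidate can actually be fitted, and it passes cv=5 explicitly; both choices are made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=11
)

pipe1 = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# Keys follow the <step name>__<parameter name> convention.
param_grid = {
    "reduce_dim__n_components": [1, 2],
    "clf__C": [0.1, 1, 10],
}

grid_search = GridSearchCV(pipe1, param_grid=param_grid, cv=5)
grid_search.fit(X_train, Y_train)

# Best combination and its mean cross-validated accuracy.
print(grid_search.best_params_)
print(grid_search.best_score_)

# One row per candidate: 2 values x 3 values = 6 combinations.
results = grid_search.cv_results_
for params, mean in zip(results["params"], results["mean_test_score"]):
    print(params, round(mean, 3))
```

Note that best_score_ is a cross-validation score on the training split, so it does not have to match the test-set score.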

6. Scoring again with best_estimator_

- Scoring with the best parameters found above gave 100% accuracy on the test split.

In [6]:

pipe2 = Pipeline(steps=[('reduce_dim', PCA(n_components=2)), ('clf', SVC(C=10))])
pipe2.fit(X_train,Y_train)
pipe2.score(X_test,Y_test)

Out[6]:

1.0
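Rebuilding the pipeline by hand works, but it should not be strictly necessary: with the default refit=True, GridSearchCV refits the best pipeline on the whole training split and exposes it as best_estimator_. The sketch below scores that object directly. It is self-contained and assumes the same split as above, with a grid of [1, 2] for n_components chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=11
)

pipe1 = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
param_grid = {"reduce_dim__n_components": [1, 2], "clf__C": [0.1, 1, 10]}

grid_search = GridSearchCV(pipe1, param_grid=param_grid, cv=5)
grid_search.fit(X_train, Y_train)

# best_estimator_ is an already fitted Pipeline; no manual rebuild needed.
best_pipe = grid_search.best_estimator_
direct = best_pipe.score(X_test, Y_test)

# The search object delegates score() to the same refitted pipeline.
via_search = grid_search.score(X_test, Y_test)

print(direct, via_search)
```

Using best_estimator_ avoids copying parameter values by hand, which is where typos tend to creep in.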
