  • ID3 Model Implementation in Python (2): The Full Model
    Machine Learning / Theory in Brief · 2021. 4. 26. 15:16

    0. Full Code Implementation

    -All comments have been stripped from this listing; the annotated version appears in the code analysis below.

    This is the final post in the ID3 model implementation (Tree) series.

    https://guru.tistory.com/entry/ID3-%EB%AA%A8%EB%8D%B8-%EA%B5%AC%ED%98%84Python

     


    import numpy as np
    import pandas as pd
    from numpy import log2 as log
    eps = np.finfo(float).eps
    
    outlook = 'overcast,overcast,overcast,overcast,rainy,rainy,rainy,rainy,rainy,sunny,sunny,sunny,sunny,sunny'.split(',')
    temp = 'hot,cool,mild,hot,mild,cool,cool,mild,mild,hot,hot,mild,cool,mild'.split(',')
    humidity = 'high,normal,high,normal,high,normal,normal,normal,high,high,high,high,normal,normal'.split(',')
    windy = 'FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE'.split(',')
    play = 'yes,yes,yes,yes,yes,yes,no,yes,no,no,no,no,yes,yes'.split(',')
    
    dataset = {'outlook': outlook, "temp": temp, "humidity":humidity, "windy":windy, "play":play}
    df = pd.DataFrame(dataset, columns = ['outlook','temp','humidity','windy','play'])
    
    df
    
    def find_entropy(df):
        Class = df.keys()[-1]   
        entropy = 0
        values = df[Class].unique()
        for value in values:
            fraction = df[Class].value_counts()[value]/len(df[Class])
            entropy += -fraction*np.log2(fraction)
        return entropy
    
    def find_entropy_attribute(df,attribute):
        Class = df.keys()[-1]
        target_variables = df[Class].unique()
        variables = df[attribute].unique()
        entropy2 = 0
        for variable in variables:
            entropy = 0
            for target_variable in target_variables:
                num = len(df[(df[attribute] == variable) & (df[Class] == target_variable)])
                den = len(df[df[attribute] == variable])
                fraction = num/(den+eps)
                entropy += -fraction*log(fraction+eps)
            fraction2 = den/len(df)
            entropy2 += -fraction2*entropy
        return entropy2
    
    def find_winner(df):
        IG = []
        for key in df.keys()[:-1]:
            IG.append(find_entropy(df)-find_entropy_attribute(df,key))
        return df.keys()[:-1][np.argmax(IG)]
    
    def get_subtable(df,node,value):
        return df[df[node] == value].reset_index(drop=True)
    
    def buildTree(df,tree=None): 
        Class = df.keys()[-1] 
        node = find_winner(df)
        attValue = np.unique(df[node])
        
        #Create an empty dictionary to create tree    
        if tree is None:                    
            tree={}
            tree[node] = {}
        
        for value in attValue:
            
            subtable = get_subtable(df,node,value)
            clValue,counts = np.unique(subtable[Class],return_counts=True)                        
            
            if len(counts)==1:
                tree[node][value] = clValue[0]                                                    
            else:        
                tree[node][value] = buildTree(subtable) 
                       
        return tree
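`buildTree` above returns the tree as nested dictionaries: each internal node is `{attribute: {value: subtree}}`, and each leaf is a plain class label. As an illustration of how such a tree can be queried, here is a small traversal helper; the `predict` name and the hand-written tree below are not from the original post, just a sketch of the structure `buildTree` produces on this dataset.

```python
# Hypothetical helper (not part of the original post): classify one sample
# by walking a nested-dict tree of the shape buildTree() returns.
def predict(tree, sample):
    # Internal nodes are dicts; leaves are plain label strings.
    while isinstance(tree, dict):
        attribute = next(iter(tree))              # attribute this node splits on
        tree = tree[attribute][sample[attribute]] # descend into the matching branch
    return tree

# A hand-built tree matching the weather data above
tree = {'outlook': {'overcast': 'yes',
                    'rainy': {'windy': {'FALSE': 'yes', 'TRUE': 'no'}},
                    'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}}}}

print(predict(tree, {'outlook': 'sunny', 'humidity': 'normal', 'windy': 'FALSE'}))  # yes
```

Note that this simple traversal raises a `KeyError` for attribute values never seen during training, which is a known limitation of plain ID3.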

     

    1. Data Loading

    -Load the same data as before into a pandas DataFrame.
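Once the data is loaded, the first quantity ID3 needs is the entropy of the `play` column, which here holds 9 "yes" and 5 "no" labels. A minimal standalone sanity check of that number, mirroring what `find_entropy` computes (the `entropy` helper below is illustrative, not from the post):

```python
import numpy as np

# 'play' column from the dataset above: 9 "yes", 5 "no"
play = 'yes,yes,yes,yes,yes,yes,no,yes,no,no,no,no,yes,yes'.split(',')

def entropy(labels):
    # Shannon entropy of a label list: -sum p * log2(p)
    values, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

print(round(entropy(play), 4))  # 0.9403
```

-(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.9403, so `find_entropy(df)` on the DataFrame above should return the same value.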
