Linear_Regression implemented with Gradient_descent

Machine Learning / Brief Theory  2021. 4. 22. 14:17

https://www.boostcourse.org/ai222/lecture/24517

This post follows the Linear_Regression lecture of "Python for Machine Learning" (머신러닝을 위한 파이썬), a free course on Naver Boostcourse (www.boostcourse.org).

1. Importing modules

In [1]:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd

2. Loading the dataset (read here with pd.read_excel())

In the following data:

X = number of claims
Y = total payment for all the claims, in thousands of Swedish Kronor, for geographical zones in Sweden

Reference: Swedish Committee on Analysis of Risk Premium in Motor Insurance

dataset from - http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html

In [2]:

df = pd.read_excel("./slr06.xls")
df.head()

Out[2]:

	X	Y
0	108	392.5
1	19	46.2
2	13	15.7
3	124	422.2
4	40	119.4

In [3]:

raw_X = df["X"].values.reshape(-1,1)
y = df["Y"].values

3. Plotting the data (X, y)

In [4]:

plt.figure(figsize=(10,5))
plt.plot(raw_X,y,'o',alpha=0.5)

Out[4]:

[<matplotlib.lines.Line2D at 0x22815d95e80>]

In [5]:

# raw_X is a 2-D array (one column); y is a 1-D NumPy array.
# raw_X is kept 2-D so that a column for the intercept can be concatenated to it.
raw_X[:5],y[:5]

Out[5]:

(array([[108],
        [ 19],
        [ 13],
        [124],
        [ 40]], dtype=int64),
 array([392.5,  46.2,  15.7, 422.2, 119.4]))
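To make the shape difference concrete, here is a minimal sketch that applies reshape(-1, 1) to the first three X values shown above. The names x_flat and x_col are illustrative and do not appear in the original notebook.

```python
import numpy as np

# First three X values from the dataset preview; x_flat is an illustrative name.
x_flat = np.array([108, 19, 13])

# -1 lets NumPy infer the number of rows; 1 fixes a single column.
x_col = x_flat.reshape(-1, 1)

print(x_flat.shape)  # 1-D: one axis of length 3
print(x_col.shape)   # 2-D: 3 rows, 1 column
```

A 2-D column like x_col can be joined side by side with another column, which a 1-D array cannot do directly with axis=1.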

In [6]:

np.ones((len(raw_X),1))[:3]

Out[6]:

array([[1.],
       [1.],
       [1.]])

Here, np.ones((len(raw_X), 1)) is concatenated to raw_X. This adds a column of ones so that the intercept (w0) can be learned as one of the weights.

In [7]:

# Prepend a column of ones to X for the intercept term.
X = np.concatenate((np.ones((len(raw_X),1)), raw_X),axis = 1)
X[:5]

Out[7]:

array([[  1., 108.],
       [  1.,  19.],
       [  1.,  13.],
       [  1., 124.],
       [  1.,  40.]])
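The reason for the column of ones can be checked numerically. The sketch below, assuming made-up weights w_demo, shows that multiplying the augmented matrix by [w0, w1] gives the same result as computing w0 + w1 * x by hand. X_design and w_demo are hypothetical names used only for this illustration.

```python
import numpy as np

# First three X values from the preview, as a 2-D column.
x = np.array([[108.0], [19.0], [13.0]])

# Same construction as in the post: a column of ones, then the raw feature.
X_design = np.concatenate((np.ones((len(x), 1)), x), axis=1)

# Hypothetical weights [w0, w1], chosen only for the demonstration.
w_demo = np.array([2.0, 0.5])

pred_matrix = X_design.dot(w_demo)             # matrix form
pred_manual = w_demo[0] + w_demo[1] * x[:, 0]  # intercept + slope * x

print(pred_matrix)
print(np.allclose(pred_matrix, pred_manual))
```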

In [8]:

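# Draw two initial weights (intercept, slope) from a standard normal distribution.
# Note: size= sets the shape; a bare tuple such as np.random.normal((2,1)) is read as loc (the means).
w = np.random.normal(size=2)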
w

Out[8]:

array([1.9471694, 0.6742761])
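Because the starting weights are random, every run begins from a different line. For reproducible runs, the weights could be drawn from a seeded generator. This is a sketch that assumes a standard normal start is intended; the seed 0 and the name w_init are arbitrary choices, not from the original notebook.

```python
import numpy as np

# Seed 0 is an arbitrary choice; any fixed integer gives repeatable draws.
rng = np.random.default_rng(0)

# size=2 requests two values: one for the intercept, one for the slope.
w_init = rng.normal(size=2)

# A second generator with the same seed reproduces the same weights.
w_again = np.random.default_rng(0).normal(size=2)

print(w_init.shape)
print(np.array_equal(w_init, w_again))
```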

4. Plotting the line predicted by the random weights

In [9]:

# Compute the predicted y values as X.dot(w), using the random w.
plt.figure(figsize=(10,5))
y_predict = np.dot(X,w)
plt.plot(raw_X,y,'o',alpha= 0.5)
plt.plot(raw_X,y_predict)

Out[9]:

[<matplotlib.lines.Line2D at 0x22815e866d0>]

5. Model functions: Gradient Descent

In [10]:

# Multiply X by theta (the weights) and return the predicted y values.
def hypothesis_function(X, theta):
    return X.dot(theta)

In [11]:

def cost_function(h, y):
    return (1/(2*len(y))) * np.sum((h-y)**2)

In [12]:

h = hypothesis_function(X,w)
cost_function(h, y)

Out[12]:

5903.2075804735705
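As a quick check of what cost_function computes, the sketch below evaluates it on three made-up points where only the last prediction is off by 2. The arrays h_demo and y_demo are illustrative values, not taken from the dataset.

```python
import numpy as np

def cost_function(h, y):
    # Half of the mean squared error, as defined in the post.
    return (1 / (2 * len(y))) * np.sum((h - y) ** 2)

# Made-up predictions and targets; only the third one differs (by 2).
h_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([1.0, 2.0, 5.0])

# Squared errors are 0, 0, 4, so the cost is 4 / (2 * 3).
demo_cost = cost_function(h_demo, y_demo)
print(demo_cost)
```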

Note that t0 and t1 must not be updated one after the other. Compute both new values first, hold them, and then update them together (simultaneously).

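- Computing t0

- Computing t1

The update for w0 comes from differentiating J with respect to w0 (w0 and w1 are the same parameters as t0 and t1). Note that w1 does not vanish in this derivative: it still appears inside the prediction error.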
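The formulas for these two steps are not preserved in this text. The following is a reconstruction from the cost_function and gradient_descent code in this post, so the symbols (m for the number of samples, alpha for the learning rate) follow the code rather than the lost originals.

```latex
h(x_i) = t_0 + t_1 x_i
\qquad
J(t_0, t_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr)^2

\frac{\partial J}{\partial t_0} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr)
\qquad
\frac{\partial J}{\partial t_1} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr) x_i

t_0 \leftarrow t_0 - \alpha \, \frac{\partial J}{\partial t_0}
\qquad
t_1 \leftarrow t_1 - \alpha \, \frac{\partial J}{\partial t_1}
```

Both right-hand sides are evaluated with the old values of t_0 and t_1, which is what "simultaneously" means here.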

In [13]:

def gradient_descent(X, y, w, alpha, iterations):
    theta = w
    m = len(y)
    
    theta_list = [theta.tolist()]
    cost = cost_function(hypothesis_function(X, theta), y)
    cost_list = [cost]

    for i in range(iterations):
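        # Differentiating J with respect to t0 leaves no x factor, because the intercept column of X is all ones.
        t0 = theta[0] - (alpha / m) * np.sum(np.dot(X, theta) - y)
        # Differentiating J with respect to t1 leaves the x factor, so each error is multiplied by x.
        t1 = theta[1] - (alpha / m) * np.sum((np.dot(X, theta) - y) * X[:,1])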
        theta = np.array([t0, t1])
        
        if i % 10== 0:
            theta_list.append(theta.tolist())
            cost = cost_function(hypothesis_function(X, theta), y)
            cost_list.append(cost)


    return theta, theta_list, cost_list
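The two per-parameter updates can also be written as a single matrix expression. The sketch below is one possible vectorized form, not the post's own code: it compares one step of the component-wise update against theta - (alpha / m) * X.T.dot(error) on three rows from the dataset preview. The function names and the starting point theta_demo are hypothetical.

```python
import numpy as np

def gradient_step_componentwise(X, y, theta, alpha):
    # One step written as in the post: t0 and t1 both use the old theta.
    m = len(y)
    error = X.dot(theta) - y
    t0 = theta[0] - (alpha / m) * np.sum(error)
    t1 = theta[1] - (alpha / m) * np.sum(error * X[:, 1])
    return np.array([t0, t1])

def gradient_step_vectorized(X, y, theta, alpha):
    # Same step in matrix form: X.T.dot(error) stacks both partial sums.
    m = len(y)
    return theta - (alpha / m) * X.T.dot(X.dot(theta) - y)

# First three rows of the dataset preview, with the intercept column added.
X_demo = np.array([[1.0, 108.0], [1.0, 19.0], [1.0, 13.0]])
y_demo = np.array([392.5, 46.2, 15.7])
theta_demo = np.array([0.0, 0.0])  # arbitrary starting point for the demo

step_a = gradient_step_componentwise(X_demo, y_demo, theta_demo, 0.0001)
step_b = gradient_step_vectorized(X_demo, y_demo, theta_demo, 0.0001)
print(np.allclose(step_a, step_b))
```

The matrix form updates all parameters from the same old theta by construction, so the simultaneous-update requirement is met automatically.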

6. Model prediction (setting the learning rate and the number of iterations)

In [14]:

iterations = 10000
alpha = 0.001

theta, theta_list, cost_list = gradient_descent(X, y, w, alpha, iterations)
cost = cost_function(hypothesis_function(X, theta), y)

print("theta:", theta)
print('cost:', cost)

theta: [19.87989048  3.41629794]
cost: 625.3742849235086
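One way to sanity-check a gradient descent result is to compare it with a closed-form least-squares fit. The sketch below does this on small synthetic data (y = 3 + 2x exactly), not on slr06.xls, and uses a compact vectorized loop rather than the gradient_descent function above. All names and the settings alpha=0.05 and 5000 iterations are assumptions made for this example.

```python
import numpy as np

# Synthetic data on an exact line y = 3 + 2x (not the insurance dataset).
x_syn = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_syn = 3.0 + 2.0 * x_syn
X_syn = np.column_stack((np.ones(len(x_syn)), x_syn))

# Plain batch gradient descent; alpha and the iteration count are demo choices.
theta_gd = np.zeros(2)
alpha_demo = 0.05
for _ in range(5000):
    error = X_syn.dot(theta_gd) - y_syn
    theta_gd = theta_gd - (alpha_demo / len(y_syn)) * X_syn.T.dot(error)

# Closed-form least-squares solution for comparison.
theta_ls, _, _, _ = np.linalg.lstsq(X_syn, y_syn, rcond=None)

print(theta_gd)
print(theta_ls)
```

If the two results disagree noticeably on real data, the usual suspects are too few iterations or a learning rate that is too large for the scale of X.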

7. Plotting the fitted lines

In [15]:

theta_list=np.array(theta_list)

In [16]:

plt.figure(figsize=(10,5))
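# Each column of y_predict_step holds the predictions for one saved theta.
y_predict_step = np.dot(X,theta_list.transpose())
plt.plot(raw_X,y,"o",alpha = 0.5)
# Draw the line for every 100th saved theta.
for i in range(0,len(cost_list),100):
    plt.plot(raw_X,y_predict_step[:,i], label='Line %d'%i)
# legend() draws the legend; bbox_to_anchor places it outside the axes.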
plt.legend(bbox_to_anchor = (1.05,1), loc= 2, borderaxespad=0.)
plt.show()
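The number in each 'Line %d' label is an index into theta_list, not an iteration count. Since gradient_descent stores the initial theta first and then one snapshot whenever i % 10 == 0, the mapping can be sketched as below. The helper updates_at_snapshot is a hypothetical name introduced only for this illustration.

```python
def updates_at_snapshot(k, log_every=10):
    """Number of gradient updates applied before snapshot k was saved.

    Snapshot 0 is the initial theta. Snapshot k >= 1 is saved right after
    the update at loop index i = (k - 1) * log_every, i.e. after i + 1 updates.
    """
    if k == 0:
        return 0
    return (k - 1) * log_every + 1

# Labels drawn by the plotting loop are 0, 100, 200, ..., 1000.
for k in (0, 100, 1000):
    print(k, updates_at_snapshot(k))
```

So 'Line 100' in the legend is the fit after 991 updates, and 'Line 1000' is the fit after 9991 updates.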
