Implementing Binary Classification from Scratch

We build the model in the following order.

1. Generate the data

2. Build the model

3. Train the model

4. Test the model

STEP 1. Generating the data

import torch

N = 20
x1 = torch.randn(int(N/2), 1)         # samples from N(0, 1)
x2 = torch.randn(int(N/2), 1) + 5     # samples shifted to be centered at 5
class1_data = torch.hstack([x1, x2])  # class 1: (small, large) feature pairs
class2_data = torch.hstack([x2, x1])  # class 2: (large, small) feature pairs
class1_label = torch.ones(int(N/2),1)
class2_label = torch.zeros(int(N/2),1)

X = torch.vstack([class1_data, class2_data])
y = torch.vstack([class1_label, class2_label])
print(X.shape, y.shape)
torch.Size([20, 2]) torch.Size([20, 1])
import matplotlib.pyplot as plt

plt.plot(class1_data[:,0], class1_data[:,1],'o')
plt.plot(class2_data[:,0], class2_data[:,1],'o')
plt.xlabel('x1')
plt.ylabel('x2')
plt.grid()

STEP 2. Building the model

from torch import nn
class MLP(nn.Module):
    def __init__(self):
        super().__init__()

        # 2 input features -> 100 hidden units -> 1 output probability
        self.linear = nn.Sequential(nn.Linear(2, 100),
                                    nn.Sigmoid(),
                                    nn.Linear(100, 1),
                                    nn.Sigmoid())
    def forward(self, x):
        x = self.linear(x)
        return x
model = MLP()
print(model)
print(model(torch.randn(5,2)))
MLP(
  (linear): Sequential(
    (0): Linear(in_features=2, out_features=100, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=100, out_features=1, bias=True)
    (3): Sigmoid()
  )
)
tensor([[0.5858],
        [0.6173],
        [0.6474],
        [0.6290],
        [0.6044]], grad_fn=<SigmoidBackward0>)

STEP 3. Training the model

The optimizer implements a particular gradient-descent update rule, and we hand it all of the parameters of the model we want to train.

optimizer.zero_grad(): gradients accumulate (each backward pass adds onto the stored values), so we reset them before every update.

If we skip this reset, the derivatives keep piling up across iterations, as the small check below shows.
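As a quick check (a toy snippet, separate from the model above), backpropagating the same loss twice without zeroing the gradient shows the accumulation directly:

import torch

w = torch.tensor([2.0], requires_grad=True)
loss = (w * 3).sum()

loss.backward(retain_graph=True)
print(w.grad)    # tensor([3.])

loss.backward()  # second backward WITHOUT zeroing first
print(w.grad)    # tensor([6.])  <- the gradient from the first pass is still there

w.grad.zero_()   # optimizer.zero_grad() resets every parameter's gradient in the same spirit
print(w.grad)    # tensor([0.])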

 

The loss is BCE (binary cross entropy); more precisely, the mean BCE, since the averaging option (reduction='mean') is the default.

 

Checking the result:

F.binary_cross_entropy computes -(1/N) · Σ log( y_hat^y · (1 - y_hat)^(1 - y) ), i.e. the average of -[ y·log(y_hat) + (1 - y)·log(1 - y_hat) ] over the N samples.
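As a small sanity check (the numbers below are made-up probabilities, not values from the data above), writing the formula out by hand gives the same result as the library call, since reduction='mean' is the default:

import torch
import torch.nn.functional as F

y_hat = torch.tensor([[0.9], [0.2], [0.7]])  # predicted probabilities (made-up values)
y     = torch.tensor([[1.0], [0.0], [1.0]])  # target labels

# mean BCE written out by hand
manual = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()

print(manual)                            # tensor(0.2284)
print(F.binary_cross_entropy(y_hat, y))  # same value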

from torch import optim
import torch.nn.functional as F

LR = 1e-1
EPOCH = 100

optimizer = optim.SGD(model.parameters(), lr = LR)
model.train()  # switch to train mode

loss_history = []
for ep in range(EPOCH):

    y_hat = model(X)
    loss = F.binary_cross_entropy(y_hat, y)

    optimizer.zero_grad()
    loss.backward()

    optimizer.step()

    loss_history += [loss.item()]
    print(f'Epoch :{ep} train loss: {round(loss.item(), 3)}')
    print('-'*20)
Epoch :0 train loss: 0.439
--------------------
Epoch :1 train loss: 0.37
--------------------
Epoch :2 train loss: 0.318
--------------------
Epoch :3 train loss: 0.278
--------------------
Epoch :4 train loss: 0.247
--------------------
(... epochs 5-96 omitted; the loss keeps decreasing ...)
Epoch :97 train loss: 0.019
--------------------
Epoch :98 train loss: 0.019
--------------------
Epoch :99 train loss: 0.019
--------------------
plt.plot(loss_history)
plt.xlabel('EPOCH')
plt.ylabel('loss')
plt.show()

STEP 4. Testing the model

x1_test=torch.linspace(-10,10,30) # case 1
x2_test=torch.linspace(-10,10,30) # case 1

X1_test, X2_test=torch.meshgrid(x1_test,x2_test)
print(X1_test.shape)

X_test=torch.cat([X1_test.unsqueeze(dim=2), X2_test.unsqueeze(dim=2)], dim=2)
print(X_test.shape)

model.eval()  # layers such as dropout or BatchNorm behave differently in train and test mode, so switch to eval mode

with torch.no_grad():  # skip building the autograd graph (grad_fn); at inference it would only waste memory
    y_hat = model(X_test)

print(y_hat.shape)

Y_hat = y_hat.squeeze()

plt.figure(figsize=[10, 9])  # figsize=[width, height]
ax = plt.axes(projection="3d")
ax.view_init(elev=25,azim=-140)
ax.plot_surface(X1_test,X2_test, Y_hat.numpy(), cmap="viridis", alpha=0.2)
plt.plot(class1_data[:,0],class1_data[:,1],class1_label.squeeze(),'bo')
plt.plot(class2_data[:,0],class2_data[:,1],class2_label.squeeze(),'ro')
plt.xlabel("x1")
plt.ylabel("x2")
torch.Size([30, 30])
torch.Size([30, 30, 2])
torch.Size([30, 30, 1])
Text(0.5, 0.5, 'x2')

 

import plotly.graph_objects as go

fig = go.Figure(data=[go.Surface(x=X1_test, y=X2_test, z=Y_hat, colorscale="viridis", opacity=0.5)])
fig.update_traces(contours_z=dict(show=True, usecolormap=True, highlightcolor="limegreen", project_z=True))
fig.update_layout(title='binary classification', width=700, height=600)

 

What happens to training when the number of parameters becomes very large?

It means there are many more weights and biases on the path from the input to the output. In that case, lower the learning rate so the model learns slowly, a little at a time; this makes training more stable. A small sketch follows.
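For instance (the hidden width and learning rate below are just illustrative values, not the settings used above), a much wider model can be trained in exactly the same way with a smaller step size:

from torch import nn, optim

# a much wider MLP: many more weights and biases between input and output
big_model = nn.Sequential(nn.Linear(2, 1000),
                          nn.Sigmoid(),
                          nn.Linear(1000, 1),
                          nn.Sigmoid())
print(sum(p.numel() for p in big_model.parameters()))  # 4001 trainable parameters

# a smaller learning rate keeps each update small and the training stable
optimizer = optim.SGD(big_model.parameters(), lr=1e-3)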

In the loss plot for such a run, the loss does not go down; it shoots up and looks as if it is diverging.

So why does it level off around 50 instead of going to infinity? Because the BCE provided by PyTorch clamps its log term, a prediction that has diverged contributes a loss of 100 rather than inf.

In other words, a completely wrong prediction should mathematically give a loss of inf, but F.binary_cross_entropy caps the per-sample value at 100 (the log is clamped at -100); with roughly half the samples predicted completely wrong, the mean loss therefore sits near 50.
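This is easy to verify directly (a small check with hand-picked predictions): a completely wrong prediction yields a per-sample loss of 100, and when exactly half of the samples are completely wrong the mean comes out to 50.

import torch
import torch.nn.functional as F

# completely wrong predictions: mathematically -log(0) = inf, but the log term is clamped at -100
y_hat = torch.tensor([0.0, 1.0])
y     = torch.tensor([1.0, 0.0])
print(F.binary_cross_entropy(y_hat, y))  # tensor(100.)

# half completely wrong, half exactly right -> mean loss of (100 + 100 + 0 + 0) / 4 = 50
y_hat = torch.tensor([0.0, 1.0, 1.0, 0.0])
y     = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(F.binary_cross_entropy(y_hat, y))  # tensor(50.)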