[ML] Machine Learning Foundations (機器學習基石): Lecture 10 Logistic Regression

ML: Foundational Learning
Course: Machine Learning Foundations (機器學習基石)
Summary: Lecture 10 Logistic Regression

$$x = (x_0, x_1, x_2, \ldots, x_d)$$
$$s = \sum_{i=0}^{d} w_i x_i$$
$$\theta(s) = \frac{e^s}{1+e^s} = \frac{1}{1+e^{-s}}$$
$$h(x) = \frac{1}{1+e^{-w^T x}}$$
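A minimal Python sketch of the logistic hypothesis above (NumPy-based; the values of `w` and `x` below are made up for illustration):

```python
import numpy as np

def theta(s):
    """Logistic (sigmoid) function: theta(s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def h(w, x):
    """Logistic hypothesis: h(x) = theta(w^T x)."""
    return theta(np.dot(w, x))

# Illustrative values; x0 = 1 is the constant feature.
w = np.array([0.1, 0.5, -0.3])
x = np.array([1.0, 2.0, 1.5])
print(h(w, x))  # a value in (0, 1), read as an estimate of P(+1 | x)
```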

Comparison

All three linear models (linear classification, linear regression, logistic regression) compute the same score
$$s = w^T x$$
and differ only in how they use it: $\mathrm{sign}(s)$, $s$ itself, and $\theta(s)$ respectively.

Definition of $E_{in}(w)$

cross-entropy error:
$$\mathrm{err}(w, x_n, y_n) = \ln\left(1 + e^{-y_n w^T x_n}\right)$$
$$E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} \mathrm{err}(w, x_n, y_n)$$
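A NumPy sketch of this error measure (the data below is made up for illustration; features already include the constant $x_0 = 1$):

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n * w^T x_n))."""
    scores = X @ w                                  # w^T x_n for every sample
    return np.mean(np.log(1.0 + np.exp(-y * scores)))

# Illustrative data: three samples with x0 = 1 already prepended.
X = np.array([[1.0, -0.4,  0.3],
              [1.0,  0.9, -0.5],
              [1.0,  0.7,  0.3]])
y = np.array([-1.0, 1.0, 1.0])
print(cross_entropy_error(np.zeros(3), X, y))       # ln 2 ~ 0.693 when w = 0
```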
target function $f(x) = P(+1 \mid x)$
$$P(y \mid x) = \begin{cases} f(x) & \text{for } y = +1 \\ 1 - f(x) & \text{for } y = -1 \end{cases}$$
Since $f$ is the true target, the probability that $f$ generates the data set $D$ is large, possibly even 1.
So the $g$ with the largest likelihood among the hypotheses $h$ should be close to $f$.
Assume $\text{likelihood}(h) \approx \text{probability using } f \approx \text{large}$.
If $D = \{(x_1, \circ), (x_2, \times), \ldots\}$:
$$P(\circ) = P(x_1)\,P(\circ \mid x_1) = P(x_1)f(x_1) \approx P(x_1)h(x_1)$$
$$P(\times) = P(x_2)\,P(\times \mid x_2) = P(x_2)\big(1 - f(x_2)\big) \approx P(x_2)\big(1 - h(x_2)\big)$$
$$g = \mathop{\mathrm{argmax}}_{h}\ \text{likelihood}(h)$$
Because $h(x) = \theta(w^T x)$ is symmetric, $1 - h(x) = h(-x)$, so
$$\begin{aligned}
\text{likelihood}(h) &= P(x_1)h(x_1) \times P(x_2)\big(1 - h(x_2)\big) \times \cdots \times P(x_N)\big(1 - h(x_N)\big) \\
&= P(x_1)h(x_1) \times P(x_2)h(-x_2) \times \cdots \times P(x_N)h(-x_N)
\end{aligned}$$
The factors $P(x_1), P(x_2), \ldots, P(x_N)$ do not depend on $h$, so
$$\text{likelihood}(h) \propto \prod_{n=1}^{N} h(y_n x_n)$$
With $h$ being the logistic function:
$$\begin{aligned}
\max_h\ \text{likelihood}(\text{logistic } h) &\propto \max_h\ \prod_{n=1}^{N} h(y_n x_n) \\
\max_w\ \text{likelihood}(w) &\propto \max_w\ \prod_{n=1}^{N} \theta(y_n w^T x_n) \\
&\Rightarrow \max_w\ \ln \prod_{n=1}^{N} \theta(y_n w^T x_n) \\
&\Rightarrow \max_w\ \sum_{n=1}^{N} \ln \theta(y_n w^T x_n) \\
&\Rightarrow \min_w\ \sum_{n=1}^{N} -\ln \theta(y_n w^T x_n) \\
&\Rightarrow \min_w\ \frac{1}{N}\sum_{n=1}^{N} -\ln \theta(y_n w^T x_n)
\end{aligned}$$
Substituting $\theta(s) = \frac{1}{1+e^{-s}}$:
$$\begin{aligned}
\min_w\ \frac{1}{N}\sum_{n=1}^{N} -\ln\left(\frac{1}{1+e^{-y_n w^T x_n}}\right)
&= \min_w\ \frac{1}{N}\sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^T x_n}\right) \\
&= \min_w\ \frac{1}{N}\sum_{n=1}^{N} \mathrm{err}(w, x_n, y_n) \\
&= \min_w\ E_{in}(w)
\end{aligned}$$
So why is this related to the cross-entropy error?
First, cross-entropy is defined as $H(p, q) = -\sum_x p(x)\log q(x)$, where $p(x)$ is the target value, either 1 or 0.
$q(x)$ is the predicted probability; here the logistic function is used as the probability model, though any other sigmoid function could be substituted.
Cross-entropy then measures the difference between the two.
$$\begin{aligned}
E_{in} &= -\sum_x p(x)\log q(x) \\
&= -\sum_x \Big[ p(x)\log q(x) + \big(1 - p(x)\big)\log\big(1 - q(x)\big) \Big]
\end{aligned}$$
With $q(x) = \frac{1}{1+e^{-x}}$:
$$\begin{aligned}
&= -\sum_x \Big[ p(x)\log\frac{1}{1+e^{-x}} + \big(1 - p(x)\big)\log\Big(1 - \frac{1}{1+e^{-x}}\Big) \Big] \\
&= -\sum_x \Big[ -p(x)\log\big(1+e^{-x}\big) + \big(1 - p(x)\big)\log\frac{e^{-x}}{1+e^{-x}} \Big] \\
&= -\sum_x \Big[ -p(x)\log\big(1+e^{-x}\big) + \big(1 - p(x)\big)\big(-x - \log(1+e^{-x})\big) \Big] \\
&= \sum_x \Big[ p(x)\log\big(1+e^{-x}\big) + x + \log\big(1+e^{-x}\big) - p(x)\,x - p(x)\log\big(1+e^{-x}\big) \Big] \\
&= \sum_x \Big[ x + \log\big(1+e^{-x}\big) - p(x)\,x \Big]
\end{aligned}$$
In the earlier derivation, if we do not substitute $1 - h(x) = h(-x)$ but instead keep that form and express the targets with $f(x)$:
$$\begin{aligned}
\text{likelihood}(h) &= P(x_1)h(x_1) \times P(x_2)\big(1 - h(x_2)\big) \times \cdots \times P(x_N)\big(1 - h(x_N)\big) \\
&\propto h(x_1) \times \big(1 - h(x_2)\big) \times \cdots \times \big(1 - h(x_N)\big) \\
&= \prod_{n=1}^{N} \big(h(x_n)\big)^{f(x_n)} \big(1 - h(x_n)\big)^{1 - f(x_n)}
\end{aligned}$$
$$\begin{aligned}
\log \text{likelihood}(h) &\propto \log \prod_{n=1}^{N} \big(h(x_n)\big)^{f(x_n)} \big(1 - h(x_n)\big)^{1 - f(x_n)} \\
&= \sum_{n=1}^{N} \Big[ f(x_n)\log h(x_n) + \big(1 - f(x_n)\big)\log\big(1 - h(x_n)\big) \Big]
\end{aligned}$$
This is exactly the second line of the cross-entropy expansion above (up to the sign), so maximizing the likelihood is the same as minimizing the cross-entropy.
The value of $y$ determines $f(x)$:
$$\begin{cases} f(x) = 1 & \text{for } y = +1 \\ f(x) = 0 & \text{for } y = -1 \end{cases}$$
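The correspondence can also be checked numerically: with $f(x)=1$ for $y=+1$ and $f(x)=0$ for $y=-1$, the per-sample cross-entropy $-[f\log h + (1-f)\log(1-h)]$ equals $\ln(1+e^{-ys})$ with $s = w^T x$. A small sketch with a made-up score:

```python
import math

def theta(s):
    return 1.0 / (1.0 + math.exp(-s))

s = 0.7                                        # illustrative score w^T x
for y in (+1, -1):
    f = 1.0 if y == +1 else 0.0                # target probability from y
    h = theta(s)                               # predicted probability of +1
    ce = -(f * math.log(h) + (1 - f) * math.log(1 - h))
    err = math.log(1.0 + math.exp(-y * s))
    print(y, round(ce, 6), round(err, 6))      # the two numbers match
```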

Minimizing $E_{in}(w)$

Because $E_{in}(w)$ is continuously differentiable, twice differentiable, and convex, it can be minimized by finding where the gradient vanishes, $\nabla E_{in}(w) = 0$.
$$\nabla E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} \theta\!\left(-y_n w^T x_n\right)\left(-y_n x_n\right) = 0$$
Using the chain rule:
$$E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^T x_n}\right)$$
$$\begin{aligned}
\frac{\partial E_{in}(w)}{\partial w_i} &= \frac{1}{N}\sum_{n=1}^{N} \frac{\partial}{\partial w_i}\ln\left(1 + e^{-y_n w^T x_n}\right) \\
&= \frac{1}{N}\sum_{n=1}^{N} \frac{1}{1 + e^{-y_n w^T x_n}} \times e^{-y_n w^T x_n} \times \left(-y_n x_{n,i}\right) \\
&= \frac{1}{N}\sum_{n=1}^{N} \theta\!\left(-y_n w^T x_n\right) \times \left(-y_n x_{n,i}\right)
\end{aligned}$$
$$\nabla E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} \theta\!\left(-y_n w^T x_n\right)\left(-y_n x_n\right)$$
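A vectorized NumPy sketch of this gradient formula (the shapes are assumptions for illustration: `X` is N x (d+1) with $x_0 = 1$ prepended, `y` has length N):

```python
import numpy as np

def theta(s):
    return 1.0 / (1.0 + np.exp(-s))

def gradient_Ein(w, X, y):
    """grad E_in(w) = (1/N) * sum_n theta(-y_n w^T x_n) * (-y_n x_n)."""
    scores = X @ w                       # w^T x_n for every sample, shape (N,)
    weights = theta(-y * scores)         # theta(-y_n w^T x_n), shape (N,)
    return (weights * -y) @ X / len(y)   # average of theta(.) * (-y_n x_n)

# Illustrative usage with made-up data:
X = np.array([[1.0, -0.4,  0.3],
              [1.0,  0.9, -0.5],
              [1.0,  0.7,  0.3]])
y = np.array([-1.0, 1.0, 1.0])
print(gradient_Ein(np.zeros(3), X, y))
```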

Update rule

$$w_{t+1} \leftarrow w_t + \eta v$$
Take PLA as an example:
$$w_{t+1} \leftarrow w_t + \underbrace{1}_{\eta} \cdot \underbrace{[\![\,\mathrm{sign}(w_t^T x_n) \neq y_n\,]\!]\,(y_n x_n)}_{v}$$
Here $\eta$ is the step size and $v$ the update direction.

How to minimize

$$w_{t+1} \leftarrow w_t + \eta v \quad\Rightarrow\quad w_{t+1} \leftarrow w_t - \eta\,\frac{\nabla E_{in}(w_t)}{\left\|\nabla E_{in}(w_t)\right\|}$$
$$\min_{\|v\| = 1}\ E_{in}(w_t + \eta v) \quad \text{for } \eta > 0$$
Looking at a short line segment with a small enough $\eta$, the following approximation follows from the Taylor expansion (or directly from the definition of the derivative):
$$f'(x) = \lim_{dx \to 0} \frac{f(x + dx) - f(x)}{dx}$$
$$E_{in}(w_t + \eta v) \approx E_{in}(w_t) + \eta\, v^T \nabla E_{in}(w_t)$$
So how do we minimize the error?
Make $v^T \nabla E_{in}(w_t)$ as negative as possible:

$$v = -\frac{\nabla E_{in}(w_t)}{\left\|\nabla E_{in}(w_t)\right\|}$$
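A quick numerical check (with a made-up gradient vector) that this normalized negative gradient gives the most negative value of $v^T \nabla E_{in}(w_t)$ among unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
grad = np.array([0.3, -0.2, 0.5])          # illustrative gradient vector

v_best = -grad / np.linalg.norm(grad)      # steepest-descent direction
print(v_best @ grad)                       # equals -||grad||

for _ in range(3):                         # random unit directions never do better
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    print(v @ grad >= v_best @ grad)       # always True
```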

How to choose $\eta$

A better choice is to let $\eta$ shrink together with $\left\|\nabla E_{in}(w_t)\right\|$: take large steps when the gradient is large and smaller steps as it decreases.
Choosing the step size as $\eta' = \eta\,\left\|\nabla E_{in}(w_t)\right\|$ gives
$$w_{t+1} \leftarrow w_t - \eta'\,\frac{\nabla E_{in}(w_t)}{\left\|\nabla E_{in}(w_t)\right\|} = w_t - \eta\,\nabla E_{in}(w_t)$$
where $\eta$ is the fixed learning rate.
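A tiny check (illustrative numbers) that absorbing the gradient norm into the step size makes the normalized update identical to the fixed-learning-rate update $w_{t+1} = w_t - \eta\,\nabla E_{in}(w_t)$:

```python
import numpy as np

w = np.zeros(3)
grad = np.array([0.3, -0.2, 0.5])            # illustrative gradient at w_t
eta = 0.1                                    # fixed learning rate

eta_scaled = eta * np.linalg.norm(grad)      # eta' = eta * ||grad||
w_a = w - eta_scaled * grad / np.linalg.norm(grad)
w_b = w - eta * grad
print(np.allclose(w_a, w_b))                 # True: the two updates coincide
```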

Algorithm

  1. Initialize $w_0$
  2. For $t = 0, 1, \ldots$
    1. Compute the gradient $\nabla E_{in}(w_t) = \frac{1}{N}\sum_{n=1}^{N} \theta\!\left(-y_n w_t^T x_n\right)(-y_n x_n)$
    2. Update in the negative-gradient direction: $w_{t+1} \leftarrow w_t - \eta\,\nabla E_{in}(w_t)$
    3. Stop when $\nabla E_{in}(w_t) = 0$ (or approximately 0), or after enough iterations
  3. Return $g$

Python source code
```python
import matplotlib.pyplot as plt
import numpy as np
import operator
import math

# Dataset found online; it is linearly separable
rawData = [
    ((-0.4, 0.3), -1),
    ((-0.3, -0.1), -1),
    ((-0.2, 0.4), -1),
    ((-0.1, 0.1), -1),
    ((0.9, -0.5), 1),
    ((0.7, -0.9), 1),
    ((0.8, 0.2), 1),
    ((0.4, -0.6), 1),
    ((0.2, 0.6), -1),
    ((-0.5, -0.5), -1),
    ((0.7, 0.3), 1),
    ((0.9, -0.6), 1),
    ((-0.1, 0.2), -1),
    ((0.3, -0.6), 1),
    ((0.5, 0.1), -1),
]

# Prepend the constant feature x0 = 1
dataset = [((1,) + x, y) for x, y in rawData]


# Inner product
def dot(*v):
    return sum(map(operator.mul, *v))


# Euclidean length of a vector
def vlen(v):
    square = map(operator.mul, v, v)
    return sum(square) ** (1 / 2)


# Logistic function theta(s) = 1 / (1 + e^(-s))
def thita(v):
    return 1 / (1 + math.exp(-v))


# Gradient of E_in: sum_n theta(-y_n w^T x_n) * (-y_n x_n)
# (the 1/N factor is left out; it only rescales the gradient and is
# absorbed by the learning rate and the stopping threshold)
def gradient(w, dataset):
    setsV = []
    for x, y in dataset:
        setsV.append(map(operator.mul, [thita(-y * dot(w, x)) * (-y)] * len(x), x))
    total = [0] * len(x)
    for v in setsV:
        total = map(operator.add, total, v)
    return list(total)


# Update w: w <- w - eta * gradient
def update(w, n, g):
    u = map(operator.mul, [n] * len(g), g)
    w = map(operator.sub, w, u)
    return list(w)


# Logistic regression via gradient descent
def LogisticRegression(dataset):
    # Initialize w
    w = [0] * 3
    t = 0
    n = 0.1              # learning rate
    max_number = 100     # maximum number of iterations
    while True:
        g = gradient(w, dataset)
        print("g {:.5f} => {}: {}".format(vlen(g), t, tuple(w)))
        w = update(w, n, g)
        t += 1
        # Stop when the gradient is small enough or after enough iterations
        if vlen(g) < 0.2 or t > max_number:
            break
    return w


# Main program
def main():
    # Run logistic regression
    w = LogisticRegression(dataset)
    # Plot the data and the decision boundary
    fig = plt.figure()
    # numrows=1, numcols=1, fignum=1
    ax1 = fig.add_subplot(111)
    xx = list(filter(lambda d: d[1] == -1, dataset))
    ax1.scatter([x[0][1] for x in xx], [x[0][2] for x in xx],
                s=100, c='b', marker="x", label='-1')
    oo = list(filter(lambda d: d[1] == 1, dataset))
    ax1.scatter([x[0][1] for x in oo], [x[0][2] for x in oo],
                s=100, c='r', marker="o", label='1')
    l = np.linspace(-2, 2)
    # Decision boundary: w0 + w1*x + w2*y = 0  =>  y = -w0/w2 - (w1/w2)*x
    if w[2]:
        a, b = -w[1] / w[2], -w[0] / w[2]
        ax1.plot(l, a * l + b, 'b-')
    else:
        # Vertical boundary when w2 == 0
        ax1.plot([-w[0] / w[1]] * len(l), l, 'b-')
    plt.legend(loc='upper left', scatterpoints=1)
    plt.show()


if __name__ == '__main__':
    main()
```

References

如何通俗地解释泰勒公式?
損失函數
tf.nn.sigmoid_cross_entropy_with_logits
cross entropy的直觀理解
Cross entropy
