GBDT, XGBoost, and LightGBM Formula Derivations

Anonymous technical user · 2020-12-27 01:22

I. GBDT Formula Derivation

1. The initial base learner:
$$F_0(X)=\frac{1}{2}\log\frac{1+\overline{y}}{1-\overline{y}} \tag{1.1}$$
For the loss function $L(y,F)=\log(1+e^{-2yF})$, $y\in\{-1,1\}$, we want the value of $F$ that minimizes the total loss. The first derivative is:
$$L^{'}=\frac{-2ye^{-2yF}}{1+e^{-2yF}} \tag{1.2}$$

Let $m_1$ and $m_2$ count the two classes:
$$\begin{cases} \mathrm{count}(y=+1)=m_1\\ \mathrm{count}(y=-1)=m_2 \end{cases} \tag{1.3}$$
Then
$$\begin{cases} m_1+m_2=n\\ \frac{m_1-m_2}{n}=\overline{y} \end{cases} \tag{1.4}$$

$$\begin{cases} m_1=\frac{n}{2}(1+\overline{y})\\ m_2=\frac{n}{2}(1-\overline{y}) \end{cases} \tag{1.5}$$

Setting $\sum L^{'}=0$:
$$\sum L^{'}=\sum_{y=1}L^{'}+\sum_{y=-1}L^{'}=0 \tag{1.6}$$
Substituting $(1.5)$ into $(1.6)$ gives:
$$\begin{aligned} L^{'}&=\sum_{y=1}L^{'}+\sum_{y=-1}L^{'}\\ &=\frac{n}{2}(1+\overline{y})\cdot\frac{-2e^{-2F}}{1+e^{-2F}}+\frac{n}{2}(1-\overline{y})\cdot\frac{2e^{2F}}{1+e^{2F}}\\ &=\frac{n}{2}(1+\overline{y})\cdot\frac{-2}{1+e^{2F}}+\frac{n}{2}(1-\overline{y})\cdot\frac{2e^{2F}}{1+e^{2F}}\\ &=\frac{n}{1+e^{2F}}\left[-(1+\overline{y})+e^{2F}(1-\overline{y})\right] \end{aligned} \tag{1.7}$$
Solving $(1.7)=0$ yields $F_0=\frac{1}{2}\ln\frac{1+\overline{y}}{1-\overline{y}}$. For $F<F_0$ we have $L^{'}<0$, and for $F>F_0$ we have $L^{'}>0$, so $F_0$ is the global minimum. QED.
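As a sanity check, the closed form $(1.1)$ can be verified numerically. The 70/30 class split below is an arbitrary, hypothetical sample chosen for illustration; a grid search over the total loss $\sum_i\log(1+e^{-2y_iF})$ should bottom out at $F_0$:

```python
import numpy as np

# Hypothetical sample: 70 positives and 30 negatives, so y_bar = 0.4.
y = np.array([1.0] * 70 + [-1.0] * 30)
y_bar = y.mean()

def total_loss(F):
    """Total loss sum_i log(1 + exp(-2 * y_i * F)) at a constant score F."""
    return np.sum(np.log1p(np.exp(-2.0 * y * F)))

# Closed-form minimizer from (1.1).
F0 = 0.5 * np.log((1.0 + y_bar) / (1.0 - y_bar))

# Grid search around F0: the grid minimizer should coincide with F0.
grid = np.linspace(F0 - 1.0, F0 + 1.0, 2001)
F_best = grid[np.argmin([total_loss(F) for F in grid])]
```

Because the total loss is strictly convex in $F$, the grid minimizer lands exactly on $F_0$ (which is a grid point) rather than on a neighbor.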

2. Optimal leaf-node value:
$$\gamma_{jm}=\sum_{x_i\in R_{jm}}\widetilde{y}_i\Big/\sum_{x_i\in R_{jm}}|\widetilde{y}_i|(2-|\widetilde{y}_i|),\quad j=1,\dots,J \tag{1.8}$$
From Eq. (23) of [1], the objective to optimize is:
$$\gamma_{jm}=\arg\min_{\gamma}\sum_{x_i\in R_{jm}}\log(1+e^{-2y_i(F_{m-1}(x_i)+\gamma)}) \tag{1.9}$$
From Eq. (22) of [1], the negative gradient of the loss is:
$$\widetilde{y}_i=-\left[\frac{\partial L(y_i,F(x_i))}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)}=\frac{2y_i}{1+e^{2y_iF_{m-1}(x_i)}} \tag{1.10}$$
so that
$$e^{2y_iF_{m-1}(x_i)}=\frac{2y_i}{\widetilde{y}_i}-1 \tag{1.11}$$
Expanding $(1.9)$ to second order around $F_{m-1}$ gives:
$$\gamma_{jm}=\arg\min_{\gamma}\sum_{x_i\in R_{jm}}\left[L_{F_{m-1}}+L^{'}_{F_{m-1}}\gamma+\frac{1}{2}L^{''}_{F_{m-1}}\gamma^{2}\right] \tag{1.12}$$
where
$$\begin{cases} L_{F_{m-1}}=\log(1+e^{-2y_iF_{m-1}(x_i)})\\ L^{'}_{F_{m-1}}=-\widetilde{y}_i\\ L^{''}_{F_{m-1}}=e^{2y_iF_{m-1}(x_i)}\,\widetilde{y}_i^{2} \end{cases} \tag{1.13}$$

Since $L^{''}_{F_{m-1}}>0$ in $(1.12)$, the objective is convex and has a global minimum, attained where the derivative of $(1.12)$ with respect to $\gamma$ vanishes:
$$\begin{aligned} &\sum_{x_i\in R_{jm}}\left[L^{'}_{F_{m-1}}+L^{''}_{F_{m-1}}\gamma_{jm}\right]=0\\ \Longrightarrow\;&\gamma_{jm}=-\frac{\sum_{x_i\in R_{jm}}L^{'}_{F_{m-1}}}{\sum_{x_i\in R_{jm}}L^{''}_{F_{m-1}}}\\ \Longrightarrow\;&\gamma_{jm}=\frac{\sum_{x_i\in R_{jm}}\widetilde{y}_i}{\sum_{x_i\in R_{jm}}\left(\frac{2y_i}{\widetilde{y}_i}-1\right)\widetilde{y}_i^{2}}\\ \Longrightarrow\;&\gamma_{jm}=\frac{\sum_{x_i\in R_{jm}}\widetilde{y}_i}{\sum_{x_i\in R_{jm}}(2y_i-\widetilde{y}_i)\widetilde{y}_i} \end{aligned} \tag{1.14}$$
By $(1.10)$, $\widetilde{y}_i\in(0,2)$ when $y_i=1$ and $\widetilde{y}_i\in(-2,0)$ when $y_i=-1$; in both cases $(2y_i-\widetilde{y}_i)\widetilde{y}_i=(2-|\widetilde{y}_i|)|\widetilde{y}_i|$, so $(1.14)$ simplifies to
$$\gamma_{jm}=\frac{\sum_{x_i\in R_{jm}}\widetilde{y}_i}{\sum_{x_i\in R_{jm}}(2-|\widetilde{y}_i|)|\widetilde{y}_i|}$$
QED.
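The closed form $(1.8)$ can be exercised numerically. The leaf below is a randomly generated, hypothetical one: the pseudo-residuals follow $(1.10)$, the leaf value follows $(1.8)$, and a grid search over the exact leaf objective $(1.9)$ shows that this single-Newton-step value is a close (approximate, not exact) minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical leaf R_jm: labels in {-1, +1} and previous scores F_{m-1}(x_i).
y = rng.choice([-1.0, 1.0], size=50)
F_prev = rng.normal(scale=0.5, size=50)

# Pseudo-residuals, Eq. (1.10).
resid = 2.0 * y / (1.0 + np.exp(2.0 * y * F_prev))

# Leaf value, Eq. (1.8) -- one Newton step, not the exact argmin.
gamma = resid.sum() / np.sum(np.abs(resid) * (2.0 - np.abs(resid)))

def leaf_loss(g):
    """Exact leaf objective from Eq. (1.9)."""
    return np.sum(np.log1p(np.exp(-2.0 * y * (F_prev + g))))

# Grid search around gamma, for comparison with the exact minimizer.
grid = np.linspace(gamma - 0.5, gamma + 0.5, 2001)
g_best = grid[np.argmin([leaf_loss(g) for g in grid])]
```

Note that the denominator of $(1.8)$ equals $\sum e^{2y_iF_{m-1}}\widetilde{y}_i^2$ exactly, per the chain in $(1.14)$, which the code below can verify term by term.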

II. XGBoost Formula Derivation

1. Taylor expansion of the XGBoost objective:
$$\begin{aligned} L^{(t)}&=\sum_{i=1}^{n}l(y_i,\hat{y}_i^{(t-1)}+f_t(x_i))+\Omega(f_t)\\ &\approx\sum_{i=1}^{n}\left[l(y_i,\hat{y}_i^{(t-1)})+g_if_t(x_i)+\frac{1}{2}h_if_t^2(x_i)\right]+\Omega(f_t) \end{aligned} \tag{2.1}$$
Since $l(y_i,\hat{y}_i^{(t-1)})$ depends only on the previous trees' predictions, it is a constant in the current iteration and can be dropped. The objective to minimize in this round is therefore:
$$\begin{aligned} L^{'(t)}&=\sum_{i=1}^{n}\left[g_if_t(x_i)+\frac{1}{2}h_if_t^2(x_i)\right]+\Omega(f_t)\\ &=\sum_{i=1}^{n}\left[g_if_t(x_i)+\frac{1}{2}h_if_t^2(x_i)\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^2\\ &=\sum_{j=1}^{T}\left[\Big(\sum_{i\in I_j}g_i\Big)w_j+\frac{1}{2}\Big(\sum_{i\in I_j}h_i+\lambda\Big)w_j^2\right]+\gamma T \end{aligned} \tag{2.2}$$
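Equation $(2.2)$ is a sum of independent quadratics in the leaf weights $w_j$, so each leaf can be minimized separately. Writing $G_j=\sum_{i\in I_j}g_i$ and $H_j=\sum_{i\in I_j}h_i$, setting $\partial L^{'(t)}/\partial w_j=G_j+(H_j+\lambda)w_j=0$ gives the optimal leaf weight and the resulting structure score used in [2] to rank candidate tree structures:
$$w_j^{*}=-\frac{G_j}{H_j+\lambda},\qquad L^{'(t)}(w^{*})=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}+\gamma T$$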

2. The logit loss
① First form ($\hat{y}$ is a raw score):
$$L=y\ln(1+e^{-\hat{y}})+(1-y)\ln(1+e^{\hat{y}}) \tag{2.3}$$
$$\begin{cases} L^{'}=\frac{-y+(1-y)e^{\hat{y}}}{1+e^{\hat{y}}}\\ L^{''}=\frac{e^{\hat{y}}}{(1+e^{\hat{y}})^2} \end{cases} \tag{2.4}$$
Since the score $\hat{y}$ here ranges over all of $\mathbb{R}$, $L^{''}\in(0,0.25]$, with the maximum $0.25$ attained at $\hat{y}=0$.

② Second form ($\hat{y}$ is a probability):
$$L=-\left[y\ln(\hat{y})+(1-y)\ln(1-\hat{y})\right] \tag{2.5}$$
$$\begin{cases} L^{'}=\frac{-y}{\hat{y}}+\frac{1-y}{1-\hat{y}}\\ L^{''}=\frac{y}{\hat{y}^2}+\frac{1-y}{(1-\hat{y})^2} \end{cases} \tag{2.6}$$
Since $\hat{y}\in(0,1)$, $L^{''}\in(1,+\infty)$.
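The closed forms in $(2.4)$ and $(2.6)$ can be spot-checked with central finite differences. This is a hypothetical sketch (the names `grad1`, `hess1`, `grad2`, `hess2` are mine, not from the original): the first-form functions treat $\hat{y}$ as a raw score, the second-form functions treat it as a probability in $(0,1)$:

```python
import math

def num_diff(f, x, eps=1e-5):
    """Central finite difference, used to spot-check the closed forms."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# First form, Eq. (2.3): s is a raw score.
def loss1(y, s):
    return y * math.log(1 + math.exp(-s)) + (1 - y) * math.log(1 + math.exp(s))

def grad1(y, s):   # Eq. (2.4), first line
    return (-y + (1 - y) * math.exp(s)) / (1 + math.exp(s))

def hess1(s):      # Eq. (2.4), second line (independent of y)
    return math.exp(s) / (1 + math.exp(s)) ** 2

# Second form, Eq. (2.5): p is a probability in (0, 1).
def loss2(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad2(y, p):   # Eq. (2.6), first line
    return -y / p + (1 - y) / (1 - p)

def hess2(y, p):   # Eq. (2.6), second line
    return y / p ** 2 + (1 - y) / (1 - p) ** 2
```

Differentiating each gradient numerically and comparing against the Hessian formulas confirms the correction of the second form's $L^{''}$ denominator to $(1-\hat{y})^2$.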

References

[1] Jerome H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5):1189-1232.
[2] Tianqi Chen, Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. KDD '16.
[3] Guolin Ke, Qi Meng, Thomas Finley, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NIPS 2017.
[4] Li Hang. Statistical Learning Methods (统计学习方法), pp. 138-154.
[5] XGBoost入门系列第一讲. https://zhuanlan.zhihu.com/p/27816315
