Usually, when I give a course on GLMs, I try to insist on the fact that the link function is probably more important than the distribution. To illustrate, consider the following dataset, with 5 observations:
x = c(1,2,3,4,5)
y = c(1,2,4,2,6)
base = data.frame(x, y)
Then consider several models, with various distributions, and either an identity link, in which case
$$\mathbb{E}[Y\mid X=x]=\beta_0+\beta_1 x,$$
or a log link function, so that
$$\mathbb{E}[Y\mid X=x]=\exp(\beta_0+\beta_1 x).$$
regNId = glm(y ~ x, family = gaussian(link = "identity"), data = base)
regNlog = glm(y ~ x, family = gaussian(link = "log"), data = base)
regPId = glm(y ~ x, family = poisson(link = "identity"), data = base)
regPlog = glm(y ~ x, family = poisson(link = "log"), data = base)
regGId = glm(y ~ x, family = Gamma(link = "identity"), data = base)
regGlog = glm(y ~ x, family = Gamma(link = "log"), data = base)
regIGId = glm(y ~ x, family = inverse.gaussian(link = "identity"), data = base)
regIGlog = glm(y ~ x, family = inverse.gaussian(link = "log"), data = base)
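As a quick numerical check (my addition, not in the original post), one can compare the fitted coefficients of the identity-link models across the four families; the estimates should be quite close:
rbind(gaussian = coef(regNId),
      poisson = coef(regPId),
      Gamma = coef(regGId),
      inv.gaussian = coef(regIGId))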
One could also consider some Tweedie distribution, which is even more general:
library(statmod)
regTwId = glm(y ~ x, family = tweedie(var.power = 1.5, link.power = 1), data = base)
regTwlog = glm(y ~ x, family = tweedie(var.power = 1.5, link.power = 0), data = base)
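A note on statmod's parameterisation: link.power = 1 is the identity link and link.power = 0 the log link, while var.power is the power p in the variance function V(μ) = φμ^p, with p = 0, 1, 2, 3 corresponding to the Gaussian, Poisson, Gamma and inverse Gaussian cases, so p = 1.5 sits between Poisson and Gamma. As a small sanity check (my addition, not in the original code), var.power = 1 with a log link should reproduce the Poisson coefficients:
rbind(tweedie = coef(glm(y ~ x, family = tweedie(var.power = 1, link.power = 0), data = base)),
      poisson = coef(regPlog))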
Consider the predictions obtained in the first case, with the (linear) identity link function:
library(RColorBrewer)
darkcols = brewer.pal(8, "Dark2")
plot(x, y, pch = 19)
abline(regNId, col = darkcols[1])
abline(regPId, col = darkcols[2])
abline(regGId, col = darkcols[3])
abline(regIGId, col = darkcols[4])
abline(regTwId, lty = 2)
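To tell the fits apart in the figure, a legend along the following lines can help (my addition; the labels and placement are assumptions, not from the original code):
legend("topleft", legend = c("Gaussian", "Poisson", "Gamma", "inverse Gaussian", "Tweedie (p = 1.5)"),
       col = c(darkcols[1:4], "black"), lty = c(1, 1, 1, 1, 2), bty = "n")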

The predictions are quite close, aren't they? In the case of the exponential (log-link) predictions, we obtain:
plot(x, y, pch = 19)
u = seq(0.8, 5.2, by = 0.01)
lines(u, predict(regNlog, newdata = data.frame(x = u), type = "response"), col = darkcols[1])
lines(u, predict(regPlog, newdata = data.frame(x = u), type = "response"), col = darkcols[2])
lines(u, predict(regGlog, newdata = data.frame(x = u), type = "response"), col = darkcols[3])
lines(u, predict(regIGlog, newdata = data.frame(x = u), type = "response"), col = darkcols[4])
lines(u, predict(regTwlog, newdata = data.frame(x = u), type = "response"), lty = 2)

We can actually look even closer. For instance, in the linear case, consider the slope obtained with a Tweedie model (which would actually encompass all the parametric families mentioned here):
pente = function(gamma) summary(glm(y ~ x, family = tweedie(var.power = gamma, link.power = 1), data = base))$coefficients[2, 1:2]
Vgamma = seq(-0.5, 3.5, by = 0.05)
Vpente = Vectorize(pente)(Vgamma)   # row 1: slope estimates, row 2: standard errors
plot(Vgamma, Vpente[1,], type = "l", lwd = 3, ylim = c(0.965, 1.03), xlab = "power", ylab = "slope")

The slope here is always extremely close to one! Even more so if we add confidence intervals:
plot(Vgamma, Vpente[1,])
lines(Vgamma, Vpente[1,] + 1.96 * Vpente[2,], lty = 2)
lines(Vgamma, Vpente[1,] - 1.96 * Vpente[2,], lty = 2)

Heuristically, for the Gamma regression, or the inverse Gaussian one, since the variance is a power of the prediction, the variance should be small whenever the prediction is small (here, on the left). So, on the left-hand side of the graph, the error should be small when the power of the variance function is higher.
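To make this heuristic concrete, here is a small numerical illustration of my own (not in the original post): with variance function V(μ) = φμ^p, the variance at x = 1 relative to the variance at x = 5 is (μ(1)/μ(5))^p, which shrinks quickly as p grows, so high-power fits are pulled closer to the data on the left.
mu = predict(regTwId, newdata = data.frame(x = c(1, 5)), type = "response")
p = c(0, 1, 1.5, 2, 3)   # Gaussian, Poisson, Tweedie(1.5), Gamma, inverse Gaussian
rbind(power = p, relative.variance = (mu[1] / mu[2])^p)
And that is indeed what we observe here: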
erreur = function(gamma) predict(glm(y ~ x, family = tweedie(var.power = gamma, link.power = 1), data = base), newdata = data.frame(x = 1), type = "response") - y[x == 1]
Verreur = Vectorize(erreur)(Vgamma)
plot(Vgamma, Verreur, type = "l", lwd = 3, ylim = c(-.1, .04), xlab = "power", ylab = "error")
abline(h = 0, lty = 2)

Of course, we can do the same thing with the exponential (log-link) models:
pente = function(gamma) summary(glm(y ~ x, family = tweedie(var.power = gamma, link.power = 0), data = base))$coefficients[2, 1:2]
Vpente = Vectorize(pente)(Vgamma)
plot(Vgamma, Vpente[1,], type = "l", lwd = 3)

Or, if we add confidence intervals, we obtain:
plot(Vgamma, Vpente[1,], ylim = c(0, .8), type = "l", lwd = 3, xlab = "power", ylab = "slope")
lines(Vgamma, Vpente[1,] + 1.96 * Vpente[2,], lty = 2)
lines(Vgamma, Vpente[1,] - 1.96 * Vpente[2,], lty = 2)

So here again, the "slopes" are very similar... And if we look at the error we make on the left-hand part of the graph, we obtain:
erreur = function(gamma) predict(glm(y ~ x, family = tweedie(var.power = gamma, link.power = 0), data = base), newdata = data.frame(x = 1), type = "response") - y[x == 1]
Verreur = Vectorize(erreur)(Vgamma)
plot(Vgamma, Verreur, type = "l", lwd = 3, ylim = c(.001, .32), xlab = "power", ylab = "error")
