Claim Prediction

Anonymous technical user   2020-12-21 13:30


Analyzing the effect of each variable on claims:

import pandas as pd
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

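def anova(frame,qualitative):
    # For every feature, run a one-way ANOVA of Y across the feature's distinct values
    # and return the features sorted by p-value (smallest, i.e. most significant, first).
    anv=pd.DataFrame()
    anv['feature']=qualitative
    pvals=[]
    for c in qualitative:
        samples=[]
        # dropna(): a NaN level would otherwise produce an empty sample
        for cls in frame[c].dropna().unique():
            s=frame[frame[c]==cls]['Y'].values
            samples.append(s)
        # f_oneway returns (F statistic, p-value); keep the p-value
        pval=stats.f_oneway(*samples)[1]
        pvals.append(pval)
    anv['pval']=pvals
    return anv.sort_values('pval')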



# Local path to the training data; adjust it to your own environment
path_train='D:/compete/PINGAN-2018-train_demo.csv'
train_data=pd.read_csv(path_train)
train_data.columns = ["TERMINALNO", "TIME", "TRIP_ID", "LONGITUDE", "LATITUDE","DIRECTION","HEIGHT","SPEED","CALLSTATE", "Y"]
train_x=train_data.drop('Y',axis=1)
train_y=train_data['Y']
# Split the features into numeric (quantity) and non-numeric (quality) columns
quantity = [attr for attr in train_x.columns if train_x.dtypes[attr] != 'object']
quality = [attr for attr in train_x.columns if train_x.dtypes[attr] == 'object']

# Rank the numeric features by their ANOVA p-value against Y
a=anova(train_data,quantity)
print(a['pval'].values)
# disparity = log(1/p): the smaller the p-value, the taller the bar
# (a p-value of exactly 0 gives inf)
a['disparity']=np.log(1./a['pval'].values)
fig,ax=plt.subplots(figsize=(16,8))
sns.barplot(data=a,x='feature',y='disparity',ax=ax)
plt.xticks(rotation=90)
plt.show()
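
The script above depends on a local CSV file, so here is a minimal sketch of the same ranking idea on synthetic data. The helper name anova_pvalues, the NOISE column, and every generated value are assumptions made for illustration; none of them come from the PINGAN data set. It only shows how a feature that really drives Y ends up with a small p-value and a large log(1/p) score.

```python
import numpy as np
import pandas as pd
from scipy import stats

def anova_pvalues(frame, features, target="Y"):
    """One-way ANOVA p-value of `target` grouped by each feature's values.

    Hypothetical helper: a compact variant of the post's anova() function.
    """
    rows = []
    for col in features:
        # One sample of target values per distinct level of the feature.
        samples = [g[target].to_numpy() for _, g in frame.groupby(col)]
        pval = stats.f_oneway(*samples).pvalue
        rows.append({"feature": col, "pval": pval})
    return pd.DataFrame(rows).sort_values("pval").reset_index(drop=True)

# Synthetic stand-in data (NOT the PINGAN data set).
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({
    "CALLSTATE": rng.integers(0, 3, size=n),  # 3 levels, shifts Y below
    "NOISE": rng.integers(0, 3, size=n),      # 3 levels, unrelated to Y
})
demo["Y"] = 0.5 * demo["CALLSTATE"] + rng.normal(0.0, 1.0, size=n)

result = anova_pvalues(demo, ["NOISE", "CALLSTATE"])
# Same "disparity" score as in the post: larger means a stronger association.
result["disparity"] = np.log(1.0 / result["pval"])
print(result)
```

On data like this, CALLSTATE should sort to the top because it was built to shift Y, while NOISE should get a much larger p-value. One caveat: for continuous columns such as LONGITUDE, nearly every value is its own group, so the ANOVA p-value is hard to interpret there.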

Analyzing the effect of call state on claims:
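
The post does not include code for this step, so the following is only a sketch of one way to look at call state against Y, not the author's method. It assumes CALLSTATE is a small integer code (five levels here) and uses made-up synthetic records; the helper name callstate_summary and the effect size are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats

def callstate_summary(frame, state_col="CALLSTATE", target="Y"):
    """Record count and mean/std of the target for each call state."""
    return (frame.groupby(state_col)[target]
                 .agg(count="count", mean_y="mean", std_y="std")
                 .reset_index())

# Synthetic stand-in records; the levels and the effect size are made up.
rng = np.random.default_rng(1)
n = 900
records = pd.DataFrame({"CALLSTATE": rng.integers(0, 5, size=n)})
records["Y"] = 0.2 * (records["CALLSTATE"] == 1) + rng.normal(0.0, 0.5, size=n)

summary = callstate_summary(records)
print(summary)

# One-way ANOVA across call states, the same test that anova() applies.
groups = [g["Y"].to_numpy() for _, g in records.groupby("CALLSTATE")]
pval = stats.f_oneway(*groups).pvalue
print("p-value:", pval)

# Each bar is the mean of Y for one call state, with seaborn's default error bar.
fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=records, x="CALLSTATE", y="Y", ax=ax)
ax.set_title("Mean Y by call state (synthetic data)")
```

For interactive use, drop the matplotlib.use("Agg") line and call plt.show() at the end. To run it on the real data, pass train_data in place of records.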




I found a very handy data visualization tool, seaborn. It is a higher-level wrapper built on top of matplotlib and produces better-looking plots.
