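ML-FE: Using feature engineering (analyzing pairwise correlations between numeric features) on the AllstateClaimsSeverity dataset (Kaggle 2016 competition) for regression prediction of claim cost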



Output

1. Dataset overview

Dataset: AllstateClaimsSeverity: introduction, download, and application cases of the AllstateClaimsSeverity dataset (Kaggle 2016 competition), a detailed guide
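The correlation scan below relies on a correlation matrix `data_corr`, the column names `cols` and their count `size`, which the article does not show being built. A minimal setup sketch, assuming the Kaggle `train.csv` layout of an `id` column, categorical columns `cat1` to `cat116`, continuous columns `cont1` to `cont14` and the target `loss`; the helper name `numeric_corr_setup` is hypothetical.

```python
import pandas as pd

def numeric_corr_setup(df: pd.DataFrame, target: str = "loss"):
    """Hypothetical helper: return (data_corr, cols, size) for the numeric features."""
    # Keep numeric columns only (the cont* features), then drop the id and target
    data = df.select_dtypes(include="number").drop(columns=["id", target], errors="ignore")
    cols = data.columns                        # feature names, indexed by position
    data_corr = data.corr(method="pearson")    # pairwise Pearson correlation matrix
    size = len(cols)                           # number of features scanned
    return data_corr, cols, size

# Assumed usage, with the competition file saved locally as train.csv:
# train = pd.read_csv("train.csv")
# data_corr, cols, size = numeric_corr_setup(train)
```

Selecting by dtype rather than by column position keeps the sketch independent of how many categorical columns precede the continuous ones.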

2. Data visualization

T1. Plotting a heatmap
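The heatmap gives an overview of all pairwise correlations at once before individual pairs are inspected. A minimal sketch with seaborn; the styling choices (masking the redundant upper triangle, a diverging colormap fixed to [-1, 1], two-decimal annotations) are assumptions rather than the original article's settings, and `plot_corr_heatmap` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def plot_corr_heatmap(data: pd.DataFrame, title: str = "Correlation heatmap"):
    """Draw the lower triangle of the Pearson correlation matrix of `data`."""
    corr = data.corr()
    # Mask the upper triangle and the diagonal: they repeat the lower triangle or are all 1
    mask = np.triu(np.ones_like(corr, dtype=bool))
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, center=0,
                cmap="coolwarm", annot=True, fmt=".2f", square=True, ax=ax)
    ax.set_title(title)
    return fig, ax, corr
```

Assumed usage: `fig, ax, corr = plot_corr_heatmap(train[cols])`. Fixing `vmin`/`vmax` to the full correlation range keeps colors comparable across datasets.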

T2. Plotting scatter plots

Design approach

Core code

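import matplotlib.pyplot as plt
import seaborn as sns

# Assumes `train` (training DataFrame), `data_corr` (correlation matrix of its
# numeric feature columns), `cols` (those column names) and `size` (their count)
# were defined beforehand.
threshold = 0.5  # minimum absolute correlation to report
corr_list = []
for i in range(0, size):
    for j in range(i + 1, size):  # upper triangle only: each pair is visited once
        v = data_corr.iloc[i, j]
        # keep strong positive (excluding r == 1) or strong negative correlations
        if (threshold <= v < 1) or (v <= -threshold):
            corr_list.append([v, i, j])

# sort by absolute correlation, strongest first
s_corr_list = sorted(corr_list, key=lambda x: -abs(x[0]))
for v, i, j in s_corr_list:
    print("%s and %s = %.2f" % (cols[i], cols[j], v))

# one scatter plot per highly correlated pair (seaborn 0.9 renamed `size` to `height`)
for v, i, j in s_corr_list:
    sns.pairplot(train, height=6, x_vars=[cols[i]], y_vars=[cols[j]])
    plt.title('AllstateClaimsSeverity: Scatter plot of only the highly correlated pairs')
    plt.show()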
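The nested loop scans the upper triangle of the correlation matrix one cell at a time. A minimal vectorized sketch of the same selection rule (keep 0.5 <= r < 1 or r <= -0.5, then sort by |r| descending), assuming only pandas and NumPy; the function name `high_corr_pairs` is hypothetical and not part of the original article.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(data: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: feature pairs passing the same filter as the loop above."""
    corr = data.corr()
    vals = corr.to_numpy()
    # Indices of the strict upper triangle (k=1), in the same row-major order as the loop
    i, j = np.triu_indices(len(corr), k=1)
    pairs = pd.DataFrame({
        "feature_1": corr.index[i],
        "feature_2": corr.columns[j],
        "corr": vals[i, j],
    })
    c = pairs["corr"]
    # Same rule as the loop: strong positive (below 1) or strong negative
    strong = pairs[((c >= threshold) & (c < 1)) | (c <= -threshold)]
    # Stable sort on |r| so ties keep loop order, matching sorted(..., key=-abs)
    order = strong["corr"].abs().sort_values(ascending=False, kind="stable").index
    return strong.loc[order].reset_index(drop=True)
```

Assumed usage: `print(high_corr_pairs(train[cols]))`. Pairs whose correlation is NaN (for example a constant column) fail every comparison and are dropped, as in the loop.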