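ML / FE: Using feature engineering (analyzing pairwise correlations between numeric features) to predict claim cost by regression on the AllstateClaimsSeverity dataset (Kaggle 2016 competition)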
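Output
1. Dataset overview
Dataset / AllstateClaimsSeverity: a detailed guide to the AllstateClaimsSeverity dataset (Kaggle 2016 competition), covering its introduction, download, and example applications
2. Data visualization
T1. Plot a heatmap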
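The heatmap itself is not reproduced here, so the following is a minimal sketch of how such a plot might be produced, not the author's exact code. The helper names `numeric_corr` and `plot_corr_heatmap`, the output file name, and the small `demo` frame are all hypothetical; with the real data, the training DataFrame would be passed in place of `demo`.

```python
import numpy as np
import pandas as pd


def numeric_corr(df):
    """Pearson correlation matrix of the numeric columns only (hypothetical helper)."""
    return df.select_dtypes(include="number").corr()


def plot_corr_heatmap(corr, out_path="corr_heatmap.png"):
    """Draw a correlation matrix as an annotated heatmap and save it to out_path."""
    # Plotting libraries are imported here so numeric_corr can be used without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 6))
    # Fix the colour scale to [-1, 1] so that 0 sits at the neutral colour.
    sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm",
                annot=True, fmt=".2f", square=True, ax=ax)
    ax.set_title("Correlation heatmap of numeric features")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


# Synthetic stand-in data (assumption: NOT the Allstate columns).
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "num_a": base,
    "num_b": base + 0.2 * rng.normal(size=200),
    "num_c": rng.normal(size=200),
    "label": ["x"] * 200,  # non-numeric column, dropped by numeric_corr
})
corr = numeric_corr(demo)
```

Calling `plot_corr_heatmap(corr)` would then write the figure to disk. Restricting the matrix to numeric columns matters because a correlation coefficient is only defined for numeric features.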
T2. Plot scatter plots
Design approach
Core code
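import matplotlib.pyplot as plt
import seaborn as sns

# Assumes these already exist: train (training DataFrame), data_corr (correlation
# matrix of its numeric columns), cols (those column names), size (len(cols)).
threshold = 0.5
corr_list = []
for i in range(size):
    for j in range(i + 1, size):  # upper triangle only: each pair once
        v = data_corr.iloc[i, j]
        # Keep strong positive (below 1) and strong negative correlations
        if threshold <= v < 1 or v <= -threshold:
            corr_list.append([v, i, j])
# Sort by absolute correlation, strongest first
s_corr_list = sorted(corr_list, key=lambda x: -abs(x[0]))
for v, i, j in s_corr_list:
    print("%s and %s = %.2f" % (cols[i], cols[j], v))
# One scatter plot per strongly correlated pair
for v, i, j in s_corr_list:
    # 'size' was renamed to 'height' in seaborn 0.9
    sns.pairplot(train, height=6, x_vars=[cols[i]], y_vars=[cols[j]])
    plt.title('AllstateClaimsSeverity: Scatter plot of only the highly correlated pairs')
    plt.show()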
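The pair-selection step above can also be written without the nested loop. The sketch below is one possible vectorized variant, run on synthetic data so that it is self-contained. The function name `strong_pairs` and the columns `x1` to `x4` are hypothetical, and unlike the loop above it also keeps pairs whose correlation is exactly 1.

```python
import numpy as np
import pandas as pd


def strong_pairs(df, threshold=0.5):
    """Return (col_a, col_b, r) for column pairs with |r| >= threshold, strongest first."""
    corr = df.corr()
    names = corr.columns
    # Strict upper triangle: every unordered pair once, diagonal excluded.
    row_idx, col_idx = np.triu_indices(len(names), k=1)
    values = corr.to_numpy()[row_idx, col_idx]
    keep = np.abs(values) >= threshold  # NaN correlations compare False and are dropped
    pairs = [
        (names[i], names[j], float(v))
        for i, j, v in zip(row_idx[keep], col_idx[keep], values[keep])
    ]
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Synthetic stand-in data (assumption: NOT the Allstate columns).
rng = np.random.default_rng(0)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "x1": a,
    "x2": a + 0.1 * rng.normal(size=500),   # strongly positively correlated with x1
    "x3": -a + 0.1 * rng.normal(size=500),  # strongly negatively correlated with x1
    "x4": rng.normal(size=500),             # independent noise
})

pairs = strong_pairs(demo)
for name_a, name_b, r in pairs:
    print("%s and %s = %.2f" % (name_a, name_b, r))
```

Returning column names instead of positional indices means the result can be passed directly to a plotting call such as `sns.pairplot(df, x_vars=[name_a], y_vars=[name_b])`.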