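ML - FE: Using feature engineering (visualizing individual features and their relationship to the label) on the RentListingInquries (Kaggle competition) dataset for multi-class prediction of rental listing interest level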
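Output results
About the RentListingInquries (Kaggle competition) dataset
Reference: Dataset - RentListingInquries: a detailed guide to the RentListingInquries (Kaggle competition) dataset, covering its introduction, download, and example applications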
2.0. Histogram of the target variable [interest_level]
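Before plotting the target, it can help to tabulate the class counts. The following is a minimal sketch, assuming a DataFrame named `train` with an `interest_level` column that takes the three competition classes `low`, `medium`, and `high`. The toy rows are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for the training data (hypothetical values, not the real dataset).
train = pd.DataFrame(
    {"interest_level": ["low", "low", "medium", "high", "low", "medium"]}
)

# Count listings per class in a fixed order so charts are comparable across runs.
class_order = ["low", "medium", "high"]
counts = train["interest_level"].value_counts().reindex(class_order, fill_value=0)

# Share of each class, useful for spotting class imbalance.
shares = counts / counts.sum()

print(counts.to_dict())
print(shares.round(3).to_dict())
```

Calling `counts.plot(kind="bar")` would then draw the chart with matplotlib.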
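2.2.1. [bathrooms] column (needs filtering)
T1.1. Use np.percentile() with ulimit set to the 99.5th percentile: keep only the points within that percentile and drop the outliers
T1.2. Direct fixed-threshold method, ulimit = 4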
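The two outlier treatments listed for `bathrooms` can be sketched as follows. This is an illustration under stated assumptions: the values are invented, and this section does not show whether the limits are applied by dropping rows or by capping values, so the sketch drops rows for T1.1 and caps values for T1.2 to show both options.

```python
import numpy as np
import pandas as pd

# Hypothetical bathroom counts with one extreme value (not real data).
bathrooms = pd.Series([1.0, 1.0, 1.5, 2.0, 1.0, 2.0, 3.0, 10.0])

# T1.1: upper limit taken from the 99.5th percentile of the observed values.
ulimit_pct = np.percentile(bathrooms.values, 99.5)
kept = bathrooms[bathrooms <= ulimit_pct]  # assumption: rows above the limit are dropped

# T1.2: fixed upper limit chosen by hand.
ulimit_fixed = 4
capped = bathrooms.clip(upper=ulimit_fixed)  # assumption: larger values are capped

print(len(bathrooms), len(kept), capped.max())
```

Dropping rows changes the sample size, while capping keeps every listing but flattens the tail; which one suits a plot depends on whether the extreme listings should still be counted.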
2.2.2. [bedrooms] column
2.2.3. [price] column
2.2.4. [listing_id] column
2.2.5. [Latitude & Longitude] columns
2.2.6. [display_address] column
2.2.7. [building_id] column
2.2.8. [manager_id] column
2.3. Inspect the date-type features:
[created], [hour], [month]
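One plausible way to obtain `hour` and `month` from `created` is to parse the timestamp and read its components. This is a sketch that assumes `created` is stored as a string timestamp. The sample timestamps are invented.

```python
import pandas as pd

# Hypothetical listing timestamps (not real data).
train = pd.DataFrame(
    {"created": ["2016-06-24 07:54:24", "2016-04-12 12:19:05", "2016-05-01 23:10:00"]}
)

# Parse the strings into datetimes, then derive calendar components.
train["created"] = pd.to_datetime(train["created"])
train["hour"] = train["created"].dt.hour
train["month"] = train["created"].dt.month

print(train[["created", "hour", "month"]])
```

The derived integer columns can then be plotted against `interest_level` in the same way as the other numeric features.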
2.4. Inspect the image-type feature: [photos]
2.5. Inspect the ~~-type feature: [features]
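List-valued columns such as `photos` and `features` are often summarized by their length before being plotted. The sketch below assumes each cell holds a Python list; the column names `num_photos` and `num_features` are hypothetical and do not appear in this section.

```python
import pandas as pd

# Hypothetical list-valued columns (not real data).
train = pd.DataFrame(
    {
        "photos": [["a.jpg", "b.jpg"], [], ["c.jpg"]],
        "features": [["Doorman", "Elevator"], ["Dogs Allowed"], []],
    }
)

# Number of items per listing; the new column names are illustrative only.
train["num_photos"] = train["photos"].apply(len)
train["num_features"] = train["features"].apply(len)

print(train[["num_photos", "num_features"]])
```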
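2.6. Inspect the word-count features
2.6.1. [description]
T1.0. No outlier removal
T1.1. Use np.percentile() with ulimit set to the 99th percentile: keep only the points within that percentile and drop the outliers
2.6.2. [num_description_words]
T1.0. No outlier removal
T1.1. Use np.percentile() with ulimit set to the 99th percentile: keep only the points within that percentile and drop the outliers
T1.2. Direct fixed-threshold method, ulimit = 500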
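The word-count feature can be sketched as below, assuming `num_description_words` is the number of whitespace-separated tokens in `description`. That definition is an assumption, since the section does not show how the column is built. The sample descriptions are invented, and the cap at 500 mirrors the T1.2 threshold.

```python
import pandas as pd

# Hypothetical descriptions, including an empty one and a very long one.
train = pd.DataFrame(
    {
        "description": [
            "Sunny two bedroom near the park",
            "",
            "Renovated studio   with doorman  ",
            " ".join(["word"] * 600),
        ]
    }
)

# str.split() with no argument ignores repeated and trailing whitespace,
# so an empty description counts as 0 words.
train["num_description_words"] = train["description"].apply(lambda s: len(s.split()))

# T1.2-style fixed limit: cap counts at 500 so a few long texts do not stretch the axis.
capped_words = train["num_description_words"].clip(upper=500)

print(train["num_description_words"].tolist(), capped_words.tolist())
```

Splitting on a single space with `split(" ")` would instead count an empty string as one word, so the choice of tokenizer slightly shifts the low end of the distribution.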
2.7. Word cloud visualization:
[display_address], [street_address], [features]
2.8. Correlation between features: [bathrooms], [bedrooms], [price]
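The pairwise correlation among the three numeric columns can be computed with `DataFrame.corr()`, which uses Pearson correlation by default. This is a minimal sketch on invented values, not the real dataset.

```python
import pandas as pd

# Hypothetical listings (not real data).
train = pd.DataFrame(
    {
        "bathrooms": [1, 1, 2, 2, 3],
        "bedrooms": [1, 2, 2, 3, 4],
        "price": [2000, 2500, 3200, 4000, 5500],
    }
)

# 3x3 symmetric matrix of Pearson correlation coefficients.
corr = train[["bathrooms", "bedrooms", "price"]].corr()

print(corr.round(3))
```

The resulting matrix can be passed to a heatmap function for display. On the real data it is worth limiting outliers in `price` first, because Pearson correlation is sensitive to extreme values.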
Design approach
190606 update
190607 update
Core code
To be updated later……
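import matplotlib.pyplot as plt
from wordcloud import WordCloud

# `train` is assumed to be the training DataFrame loaded earlier.
# Build one space-separated string per column; spaces inside a value are
# replaced with "_" so each multi-word value stays a single word-cloud token.
text = ''
text_da = ''
text_street = ''
#text_desc = ''
for ind, row in train.iterrows():
    for feature in row['features']:
        text = " ".join([text, "_".join(feature.strip().split(" "))])
    text_da = " ".join([text_da,"_".join(row['display_address'].strip().split(" "))])
    text_street = " ".join([text_street,"_".join(row['street_address'].strip().split(" "))])
    #text_desc = " ".join([text_desc, row['description']])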
text = text.strip()
text_da = text_da.strip()
text_street = text_street.strip()
#text_desc = text_desc.strip()
plt.figure(figsize=(12,6))
wordcloud = WordCloud(background_color='white', width=600, height=300, max_font_size=50, max_words=40).generate(text)
wordcloud.recolor(random_state=0)
plt.imshow(wordcloud)
plt.title("features: Wordcloud for features", fontsize=30)
plt.axis("off")
plt.show()
# wordcloud for display address
plt.figure()
wordcloud = WordCloud(background_color='white', width=600, height=300, max_font_size=50, max_words=40).generate(text_da)
wordcloud.recolor(random_state=0)
plt.imshow(wordcloud)
plt.title("display_address: Wordcloud for Display Address", fontsize=30)
plt.axis("off")
plt.show()
# wordcloud for street address
plt.figure()
wordcloud = WordCloud(background_color='white', width=600, height=300, max_font_size=50, max_words=40).generate(text_street)
wordcloud.recolor(random_state=0)
plt.imshow(wordcloud)
plt.title("street_address: Wordcloud for Street Address", fontsize=30)
plt.axis("off")
plt.show()