WordCloud Word Cloud Library in Practice (Part 2)

Before we start

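Yesterday we covered drawing an English word cloud. Today let's try a Chinese one. First, we need a copy of the Tao Te Ching (道德经) saved as a text file.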

Reading the file

# -*- coding: utf-8 -*-
# Read the whole text file into a single string.
with open('C:\\Users\\Administrator\\Desktop\\daode.txt', errors='ignore') as read_file:
    data = read_file.read()
print(data)

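The raw output, line breaks and all, is awkward to work with, so let's read the file line by line and join everything into one string instead.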

data = ''
# Read the text line by line and join it into one str.
with open('C:\\Users\\Administrator\\Desktop\\daode.txt', errors='ignore') as f:
    for line in f:
        line = line.strip()
        data += line
print(data)
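
Because open() is called without an encoding argument, Python falls back to the platform default, and errors='ignore' silently drops any bytes it cannot decode. A more explicit reader might look like the sketch below. The helper name read_text_joined and the candidate encodings (utf-8, gb18030) are assumptions for illustration, not something the original specifies.

```python
from pathlib import Path


def read_text_joined(path, encodings=("utf-8", "gb18030")):
    """Read a text file, strip each line, and join the lines into one string.

    Tries each candidate encoding in order and uses the first one that
    decodes the whole file without errors. The encoding list is an assumption.
    """
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return "".join(line.strip() for line in text.splitlines())
    raise ValueError(f"could not decode {path!r} with any of {encodings}")
```

Called as read_text_joined('C:\\Users\\Administrator\\Desktop\\daode.txt'), it would return the same kind of joined string as the loop above, but it fails loudly instead of quietly discarding characters.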

Removing punctuation

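from string import punctuation

text = data
# Extra (mostly full-width) characters to strip in addition to ASCII punctuation.
# Note: this list also contains the characters 擅长于的, so they are removed too.
add_punc = ",。、【】“”:;()《》‘’{}?!⑦()、%^>℃:.”“^-——=擅长于的&#@¥"
all_punc = punctuation + add_punc
# Keep only the characters that are not in the punctuation set.
temp = []
for c in text:
    if c not in all_punc:
        temp.append(c)
newText = ''.join(temp)
print(newText)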
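
The character-by-character loop above can also be expressed with str.translate, which removes every listed character in one pass. A minimal sketch, assuming a shortened illustrative punctuation set (CJK_PUNC below is not the article's full list) and a hypothetical helper name strip_punctuation:

```python
from string import punctuation

# Illustrative subset of full-width punctuation; an assumption, not the
# complete list used in the article.
CJK_PUNC = ",。、;:?!“”‘’()《》【】"


def strip_punctuation(text, extra=CJK_PUNC):
    """Return text with ASCII punctuation and the extra characters removed."""
    table = str.maketrans("", "", punctuation + extra)
    return text.translate(table)


print(strip_punctuation("道可道,非常道;名可名,非常名。"))
```

Building the table once and reusing it is usually cheaper than testing membership for every character, which starts to matter on longer texts.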

Removing digits

from string import digits

s = newText
# Build a translation table that deletes the ASCII digits 0-9.
remove_digits = str.maketrans('', '', digits)
res = s.translate(remove_digits)
print(res)
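
Note that string.digits only contains the ASCII characters 0-9, so full-width digits such as 1 survive the translate call above. If the text may contain them, one option is to filter with str.isdigit, which is also true for full-width digits. A minimal sketch; the helper name remove_all_digits and the sample string are hypothetical:

```python
def remove_all_digits(text):
    """Drop every character for which str.isdigit() is true.

    This covers ASCII digits and full-width digits, but leaves Chinese
    numerals such as 一, 二, 三 untouched.
    """
    return "".join(ch for ch in text if not ch.isdigit())


sample = "第1章 道可道 第2章 三生万物"
print(remove_all_digits(sample))
```

Chinese numerals are deliberately left alone here, since in a text like the Tao Te Ching they are part of the content rather than numbering noise.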

Word segmentation with jieba

import jieba

# Segment the text with jieba and join the tokens with spaces.
mytext = " ".join(jieba.cut(res))
print(mytext)
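
The space-joined string produced here is what WordCloud will later split into words and count. To preview which tokens are likely to dominate the picture, the counts can be approximated with collections.Counter. A minimal sketch, using a hand-written sample string in place of real jieba output; the helper name top_words and the choice to skip single-character tokens are assumptions, and the counts WordCloud computes may differ because it applies its own tokenization and stopword handling:

```python
from collections import Counter


def top_words(segmented, n=3, min_len=2):
    """Count space-separated tokens and return the n most common.

    Tokens shorter than min_len characters are skipped (an illustrative
    choice, not something the article prescribes).
    """
    tokens = [t for t in segmented.split() if len(t) >= min_len]
    return Counter(tokens).most_common(n)


# Hand-written stand-in for the output of " ".join(jieba.cut(...)).
sample = "圣人 无为 天下 圣人 不争 天下 圣人 之 道"
print(top_words(sample))
```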

Visualization

import wordcloud

# 1. Configure the WordCloud object; set the background color to white.
c = wordcloud.WordCloud(background_color='white')
# 2. Load the segmented text into the word cloud.
c.generate(mytext)
# 3. Write the word cloud to an image file.
c.to_file("pywordcloud.png")

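Surprised by the result? That's because WordCloud's default font does not include Chinese glyphs, so we need to point its font_path parameter at a font that does.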

import wordcloud

# 1. Configure the WordCloud object: use a font with Chinese glyphs
#    (msyh.ttc is Microsoft YaHei) and a white background.
c = wordcloud.WordCloud(font_path="msyh.ttc", background_color='white')
# 2. Load the segmented text into the word cloud.
c.generate(mytext)
# 3. Write the word cloud to an image file.
c.to_file("pywordcloud.png")
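
msyh.ttc is the Microsoft YaHei font that ships with Windows; on other systems a different font file containing Chinese glyphs is needed. A minimal sketch that picks the first existing file from a candidate list follows. The helper name first_existing_font and the candidate paths are assumptions for illustration and should be replaced with fonts actually installed on your machine.

```python
import os


def first_existing_font(candidates):
    """Return the first path in candidates that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Candidate locations are assumptions for illustration only.
candidates = [
    r"C:\Windows\Fonts\msyh.ttc",  # Microsoft YaHei on a typical Windows install
    "fonts/my_chinese_font.ttf",   # hypothetical font bundled with the project
]
font_path = first_existing_font(candidates)
print(font_path)
```

The result can then be passed as font_path when constructing the WordCloud object. If it is None, none of the candidates exist and a suitable font has to be supplied before the Chinese text can render.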