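NLP with TEA: sentiment analysis of input text using SnowNLP (word segmentation → part-of-speech tagging → pinyin and traditional-to-simplified conversion → sentiment analysis → testing)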
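NLP word segmentation
from snownlp import SnowNLP
# Build a SnowNLP object from the sentence; s.words returns the segmented tokens.
sentence = u"今年春节档的电影,我尤其喜欢吴京主演的电影《流浪地球》"
s = SnowNLP(sentence)
print("Segmented words = {}".format(s.words))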
Segmented words = ['今年', '春节', '档', '的', '电影', ',', '我', '尤其', '喜欢', '吴', '京', '主演', '的', '电影', '《', '流浪', '地球', '》']
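The segmented list above still contains punctuation tokens such as ',' and '《'. If only content words are wanted, a small post-processing step can drop them. The following is a minimal sketch using only the standard library; the helper name `drop_punctuation` is hypothetical and not part of SnowNLP, and the token list is copied from the output above rather than produced by a live call.

```python
import unicodedata


def drop_punctuation(tokens):
    """Return the tokens that contain at least one non-punctuation character.

    Hypothetical helper (not a SnowNLP API). A token is dropped when every
    character in it belongs to a Unicode punctuation category (P*).
    """
    return [
        tok for tok in tokens
        if not all(unicodedata.category(ch).startswith("P") for ch in tok)
    ]


# Token list copied from the segmentation output above.
words = ['今年', '春节', '档', '的', '电影', ',', '我', '尤其', '喜欢',
         '吴', '京', '主演', '的', '电影', '《', '流浪', '地球', '》']

content_words = drop_punctuation(words)
print(content_words)
```

In a live session the same helper could be applied directly to `s.words`.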
NLP part-of-speech tagging
# s.tags yields (word, tag) pairs.
print("Tagging:")
for word, tag in s.tags:
    print("Word = {}, Tag = {}\n".format(word, tag))
Tagging:
Word = 今年, Tag = t
Word = 春节, Tag = t
Word = 档, Tag = Ng
Word = 的, Tag = u
Word = 电影, Tag = n
Word = ,, Tag = w
Word = 我, Tag = r
Word = 尤其, Tag = d
Word = 喜欢, Tag = v
Word = 吴, Tag = nr
Word = 京, Tag = nr
Word = 主演, Tag = v
Word = 的, Tag = u
Word = 电影, Tag = n
Word = 《, Tag = w
Word = 流浪, Tag = vn
Word = 地球, Tag = n
Word = 》, Tag = w
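In the output above, the name 吴京 is split into two tokens, 吴 and 京, each tagged `nr` (the person-name tag in this tag set). One possible workaround is to join adjacent `nr` tokens after tagging. The sketch below assumes that neighbouring `nr` tokens belong to the same name, which will be wrong when two different names appear back to back; `merge_person_names` is a hypothetical helper, not something SnowNLP provides.

```python
def merge_person_names(tagged, name_tag="nr"):
    """Join runs of adjacent tokens that carry the person-name tag.

    Hypothetical post-processing step: `tagged` is a sequence of
    (word, tag) pairs such as the ones produced by s.tags.
    """
    merged = []
    prev_was_name = False
    for word, tag in tagged:
        if tag == name_tag and prev_was_name:
            # Extend the previous name token instead of starting a new one.
            prev_word, _ = merged[-1]
            merged[-1] = (prev_word + word, name_tag)
        else:
            merged.append((word, tag))
        prev_was_name = (tag == name_tag)
    return merged


# A slice of the tagging output above.
tagged = [('喜欢', 'v'), ('吴', 'nr'), ('京', 'nr'), ('主演', 'v')]
merged = merge_person_names(tagged)
print(merged)
```

For this slice the two `nr` tokens collapse into a single ('吴京', 'nr') pair, while the surrounding verbs are left untouched.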
NLP sentiment analysis (TEA)
print("Sentiment score = {}".format(s.sentiments))
Sentiment score = 0.999991806695989
Common NLP features (pinyin output, supported methods)
print("Pinyin = {}".format(s.pinyin))
print(dir(s))
Pinyin = ['jin', 'nian', 'chun', 'jie', 'dang', 'de', 'dian', 'ying', ',', 'wo', 'you', 'qi', 'xi', 'huan', 'wu', 'jing', 'zhu', 'yan', 'de', 'dian', 'ying', '《', 'liu', 'lang', 'di', 'qiu', '》']
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'bm25', 'doc', 'han', 'idf', 'keywords', 'pinyin', 'sentences', 'sentiments', 'sim', 'summary', 'tags', 'tf', 'words']
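# Rebuild the SnowNLP object for the new sentence; s.han converts traditional characters to simplified.
sentence = u"春節檔的電影,我尤其喜歡吳京主演的電影"
s = SnowNLP(sentence)
print("Traditional to simplified = {}".format(s.han))
Traditional to simplified = 春节档的电影,我尤其喜欢吴京主演的电影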
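The `dir(s)` listing earlier in this section is dominated by dunder names, which makes the useful attributes (`words`, `tags`, `pinyin`, `han`, `sentiments` and so on) hard to spot. A small filter can show only the public names. This is a generic Python sketch; `public_names` and the `Demo` class are hypothetical and stand in for a SnowNLP object so the example runs without the library installed.

```python
def public_names(obj):
    """List attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Demo:
    """Stand-in object (hypothetical) with one public attribute and one method."""

    def __init__(self):
        self.words = []
        self._cache = {}

    def summary(self):
        return None


names = public_names(Demo())
print(names)
```

Calling `public_names(s)` on the SnowNLP object should reduce the long listing to the entries from `bm25` through `words`.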
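NLP test
# Each sentence needs its own SnowNLP object; otherwise s still refers to the previous text.
sentence = u"明天早上举行2019届全明星比赛,我会看直播,因为我特别喜欢詹姆斯、韦德、杜兰特、库里"
s = SnowNLP(sentence)
print("Sentiment score = {}".format(s.sentiments))
sentence = u"明天早上有比赛"
s = SnowNLP(sentence)
print("Sentiment score = {}".format(s.sentiments))
sentence = u"明天,上海又要下雨,我特别不喜欢下雨的天气!"
s = SnowNLP(sentence)
print("Sentiment score = {}".format(s.sentiments))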
Sentiment score = 0.9713889788637894
Sentiment score = 0.4228962549024792
Sentiment score = 0.031366312726148315
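The three scores above lie between 0 and 1, with values near 1 indicating positive text and values near 0 indicating negative text. To turn a score into a label, one simple option is a pair of thresholds. The sketch below assumes cut-offs of 0.4 and 0.6, which are arbitrary illustrative values rather than anything defined by SnowNLP; `label_sentiment` is a hypothetical helper.

```python
def label_sentiment(score, low=0.4, high=0.6):
    """Map a score in [0, 1] to 'negative', 'neutral' or 'positive'.

    The low/high cut-offs are assumptions chosen for illustration only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"


# Scores copied from the test output above.
scores = [0.9713889788637894, 0.4228962549024792, 0.031366312726148315]
labels = [label_sentiment(x) for x in scores]
print(labels)
```

With these cut-offs, the All-Star sentence is labelled positive, the plain statement about tomorrow's game neutral, and the complaint about rain negative. The thresholds should be tuned on labelled data from the target domain.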