Python + wordcloud + jieba: Learn to Generate Chinese Word Clouds in Ten Minutes

Source: 懂视网  Editor: 小采  Date: 2020-11-27 14:09:29


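Introduction

This article uses two Python libraries:

jieba: a Chinese word segmentation tool

wordcloud: a word cloud generator for Python

In the previous lesson we learned how to make an English word cloud. This article explains how to make a Chinese one; after reading it, you will be able to generate a word cloud from any Chinese text.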
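To see why a segmentation step is needed for Chinese, here is a minimal sketch using only the standard library. It assumes a word-splitting pattern similar to the one wordcloud uses by default (the exact pattern may differ between versions), and the token list is written by hand as a stand-in for what a segmenter such as jieba might return.

```python
import re

# Assumption: this pattern approximates wordcloud's default word-splitting regexp.
TOKEN_PATTERN = r"\w[\w']+"

def tokenize(text):
    """Split text into candidate words the way a regexp-based tokenizer would."""
    return re.findall(TOKEN_PATTERN, text)

raw = "我爱自然语言处理"
# Hypothetical segmentation result, written by hand for illustration.
segmented_tokens = ["我", "爱", "自然语言", "处理"]

# Chinese has no spaces, so the whole sentence is treated as one "word".
unsegmented = tokenize(raw)

# Joining the segmented tokens with spaces lets the tokenizer find real words.
segmented = tokenize(" ".join(segmented_tokens))

print(unsegmented)
print(segmented)
```

Without segmentation the whole sentence counts as a single word, which is why the script in this article runs the text through jieba before handing it to wordcloud.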


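Code overview

Parts of the code come from other people's blogs, but because of bugs and slow execution I have changed it substantially.

The first part sets most of the parameters the script needs, so you can use the code directly without much modification.

The second part contains the jieba settings; you can also set the isCN parameter to 0 to turn off Chinese word segmentation.

The third part contains the wordcloud settings, including displaying and saving the image.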
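The cleaning done in the second part can be sketched on its own, without jieba installed. In the sketch below, the function name `filter_tokens`, the sample tokens, and the tiny stop-word set are all illustrative assumptions; in the real script the tokens come from jieba and the stop words from a file.

```python
def filter_tokens(tokens, stopwords, min_length=2):
    """Keep tokens that are not stop words and have at least min_length characters."""
    kept = []
    for token in tokens:
        word = token.strip()
        if len(word) >= min_length and word not in stopwords:
            kept.append(word)
    return kept

# Hypothetical segmenter output and stop-word list, for illustration only.
tokens = ["路明非", "的", "一个", "故事", " ", "龙"]
stopwords = {"的", "一个"}

# Join with spaces so a word cloud tokenizer can separate the words again.
cleaned = " ".join(filter_tokens(tokens, stopwords))
print(cleaned)  # 路明非 故事
```

Storing the stop words in a set keeps each membership check fast, which matters when the text is long.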

## Use the code by reading its comments ##
To use this program, just read the comments; you can learn how it works within a few minutes.
# -*- coding: utf-8 -*-
from os import path
import numpy as np
from PIL import Image  # reads the background image (scipy.misc.imread is gone from current SciPy)
import matplotlib.pyplot as plt
import jieba
# jieba.load_userdict("txtuserdict.txt")
# Loads a custom user dictionary that supplements jieba's built-in dictionary
from wordcloud import WordCloud, ImageColorGenerator

# Directory of the current file.
# __file__ is undefined in an interactive session or some IDE consoles; in that case use
# d = path.dirname('.')
d = path.dirname(__file__)

stopwords = {}
isCN = 1  # Chinese word segmentation is enabled by default
back_coloring_path = "img/lz1.jpg"  # background (mask) image
text_path = 'txt/lz.txt'  # text to analyze
font_path = r'D:\Fonts\simkai.ttf'  # Chinese font used to draw the words
stopwords_path = 'stopwords/stopwords1893.txt'  # stop-word list
imgname1 = "WordCloudDefautColors.png"  # saved image 1 (shaped by the background image only)
imgname2 = "WordCloudColorsByImg.png"  # saved image 2 (colors taken from the background image)
my_words_list = ['路明非']  # new words to add to jieba's dictionary

# Load the background image as an array
back_coloring = np.array(Image.open(path.join(d, back_coloring_path)))
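# Configure the word cloud
wc = WordCloud(
    font_path=font_path,       # font (must contain Chinese glyphs)
    background_color="white",  # background color
    max_words=2000,            # maximum number of words shown
    mask=back_coloring,        # background image used as the mask
    max_font_size=100,         # largest font size
    random_state=42,
    # Default image size and the margin around words. When a mask is given,
    # the saved image takes the mask's size instead of width/height.
    width=1000, height=860, margin=2,
)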
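# Add custom words to jieba's dictionary
def add_word(words):
    for word in words:
        jieba.add_word(word)

add_word(my_words_list)

# Read the text to analyze (assumed to be UTF-8 encoded)
with open(path.join(d, text_path), encoding='utf-8') as f:
    text = f.read()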
def jiebaclearText(text):
    """Segment text with jieba, drop stop words and single characters, join with spaces."""
    # Read the stop-word list, one word per line (assumed to be UTF-8 encoded)
    with open(stopwords_path, encoding='utf-8') as f_stop:
        f_stop_seg_list = {line.strip() for line in f_stop}
    mywordlist = []
    for myword in jieba.cut(text, cut_all=False):
        myword = myword.strip()
        if myword not in f_stop_seg_list and len(myword) > 1:
            mywordlist.append(myword)
    return ' '.join(mywordlist)

if isCN:
    text = jiebaclearText(text)
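# Generate the word cloud. You can pass the full text to generate() (wordcloud does not
# segment Chinese well on its own, so enabling jieba segmentation is recommended), or
# compute word frequencies yourself and call generate_from_frequencies().
wc.generate(text)
# wc.generate_from_frequencies(txt_freq)
# txt_freq example: {'词a': 100, '词b': 90, '词c': 80}  (recent wordcloud versions expect a dict)

# Show the word cloud in its default colors
plt.figure()
plt.imshow(wc)
plt.axis("off")
plt.show()
# Save image 1
wc.to_file(path.join(d, imgname1))

# Derive color values from the background image
image_colors = ImageColorGenerator(back_coloring)
# Show the word cloud recolored with the background image's colors
plt.figure()
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis("off")
# Show the background image itself
plt.figure()
plt.imshow(back_coloring, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
# Save image 2 (wc now holds the recolored layout)
wc.to_file(path.join(d, imgname2))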
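The comments in the script mention generate_from_frequencies as an alternative to generate. A minimal sketch of preparing its input with the standard library follows; the helper name `word_frequencies` and the sample tokens are illustrative assumptions, and the mapping is built as a dict because recent versions of wordcloud expect a dict from word to frequency.

```python
from collections import Counter

def word_frequencies(tokens, top_n=None):
    """Count cleaned tokens and return a {word: count} dict, most frequent first."""
    counts = Counter(t.strip() for t in tokens if t.strip())
    return dict(counts.most_common(top_n))

# Hypothetical segmented tokens, for illustration only.
tokens = ["路明非", "故事", "路明非", "学院", "故事", "路明非"]
txt_freq = word_frequencies(tokens)
print(txt_freq)

# With the WordCloud object from the script, the call would then be:
# wc.generate_from_frequencies(txt_freq)
```

Counting the words yourself is useful when you want to cap the number of words, merge synonyms, or adjust weights before drawing the cloud.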