Word cloud generation and visualization in Python: building a word cloud with wordcloud + jieba + matplotlib

Word cloud##

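What is a word cloud? It is a flashy form of data visualization. I used to think it was complicated to make, but it turns out Python already has mature tools for it. All we need to do is prepare the keyword data, pick a font, and pick a template image; very little thought is required. Ready? Let's get hands-on.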

Modules##

This example is based on Python 3.6. The modules it uses are listed below; each can be installed directly with pip install:

wordcloud: does what its name says. It is the core module of this example and renders our weighted keywords into a word cloud.

matplotlib: plotting module. Here it draws the image generated by wordcloud and shows it in a window.

numpy: array module. Here it turns the opened image into a pixel matrix.

PIL (pip install pillow): image-processing module. It opens and loads the template image.

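jieba: an excellent Chinese word-segmentation module. Because I extract keywords from a txt file, I need jieba to segment the text and weight the keywords (the code uses its TextRank keyword extractor). If you already have ready-made keyword data, you do not need it.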
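For the case where the data is already tokenized, the weighted keywords can be built without jieba. The following is only a minimal sketch under assumptions: the `tokens` list is made-up sample data, `count_frequencies` is a hypothetical helper (not part of wordcloud or jieba), and dropping single-character tokens is an illustrative choice, not a rule. The resulting `{word: count}` mapping has the shape that `WordCloud.generate_from_frequencies` accepts.

```python
from collections import Counter


def count_frequencies(tokens, min_length=2, top_k=1000):
    """Count how often each token occurs and keep the top_k most common.

    Hypothetical helper for illustration only.
    Tokens shorter than min_length characters are dropped (an assumption:
    single-character tokens are often function words).
    """
    counts = Counter(t for t in tokens if len(t) >= min_length)
    return dict(counts.most_common(top_k))


# Made-up sample tokens, standing in for data you already have.
tokens = ["围城", "方鸿渐", "的", "围城", "孙小姐", "方鸿渐", "围城"]
frequencies = count_frequencies(tokens)
print(frequencies)
```

A dictionary built this way could be passed to the rendering function in place of the jieba output, since both map a word to a numeric weight.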

Code##

# -*- coding=utf8 -*-

import matplotlib.pyplot as plt
import jieba.analyse
import numpy
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

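def readTxt(file, encoding='utf8'):
    """
    Read a whole text file and return its content.

    :param file: path of the text file
    :param encoding: encoding used to decode the file
    :return: the file content as a string
    """
    with open(file, 'r', encoding=encoding) as f:
        txt = f.read()
    return txt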

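def textDict(content):
    """
    Extract up to 1000 keywords and their weights with jieba (TextRank).

    :param content: the text to analyse
    :return: dict mapping each keyword to its weight
    """
    result = jieba.analyse.textrank(content, topK=1000, withWeight=True)
    # Convert the (keyword, weight) pairs into a weight dictionary
    keywords = dict()
    for word, weight in result:
        keywords[word] = weight
    return keywords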

def renderWordCloud(keywords, sourceImg):
    # Open the template image
    image = Image.open(sourceImg)
    # Convert it to a pixel matrix
    graph = numpy.array(image)
    # wordcloud's default font does not support Chinese, so pick a Chinese font here
    fontPath = 'C:/Windows/Fonts/SIMLI.TTF'
    #fontPath = 'C:/Windows/Fonts/mplus-1mn-regular.ttf'
    wc = WordCloud(
        font_path=fontPath,
        background_color='white',
        max_words=1000,
        # Template image used as the word cloud mask
        mask=graph
    )
    # Generate the word cloud from the keyword weights
    wc.generate_from_frequencies(keywords)
    # Read the colors of the template image
    image_color = ImageColorGenerator(graph)
    # Draw the word cloud
    plt.imshow(wc)
    # Recolor it with the colors of the template image
    plt.imshow(wc.recolor(color_func=image_color))
    # Hide the axes
    plt.axis('off')
    # Show the figure in a window
    plt.show()

txt_file = 'C:/Users/KF/Downloads/《围城》钱钟书(完美版).TXT'
source_img = 'C:/Users/KF/Pictures/ul1241-2001.jpg'
#source_img = 'C:/Users/KF/Pictures/微信图片_20170710102042.jpg'
#source_img = 'C:/Users/KF/Pictures/微信图片_20170710102054.jpg'
#source_img = 'E:\DOC\Carl\wallpapers\d250038c4fde4ea7f36ebe010a7b58ca.jpg'

# This text file is read as UTF-16
content = readTxt(txt_file, encoding='utf16')
keywords = textDict(content)
renderWordCloud(keywords, source_img)
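The text file in this script is opened as UTF-16 rather than UTF-8, so a file in a different encoding will fail to load with the same settings. One possible way to handle this, sketched here under assumptions, is to try a few candidate encodings in order. `read_text_with_fallback` is a hypothetical helper and the candidate list is only a guess at what is common for Chinese text files. A wrong encoding can sometimes decode without raising an error, so the result is still worth checking by eye.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "utf-16", "gb18030")):
    """Return the file's text using the first encoding that decodes cleanly.

    Hypothetical helper; the candidate encodings and their order are
    assumptions. Permissive encodings such as gb18030 go last because
    they accept many byte sequences.
    """
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeError:
            # This candidate could not decode the file; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode " + path)


# Demonstration with a temporary UTF-16 file (made-up sample content).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-16") as f:
        f.write("围城 钱钟书")
    recovered = read_text_with_fallback(sample_path)

print(recovered)
```

In the demonstration, the UTF-8 attempt fails on the UTF-16 byte order mark, so the helper falls through to the second candidate.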

Result##

[Image: the word cloud generated with wordcloud + jieba + matplotlib]
