Help needed: encoding error when visualizing an LDA model with pyLDAvis

Visualizing an LDA model with pyLDAvis fails with "'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)". How can this be fixed?

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-7-ca6914ad3a98> in <module>
----> 1 visibel(2,20302)

<ipython-input-2-0255fd910b80> in visibel(topic_num, data_num)
     86     model_name = './lda_{}_{}.model'.format(topic_num, data_num)
     87     lda = models.ldamodel.LdaModel.load(model_name)
---> 88     vis_data = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
     89     pyLDAvis.show(vis_data)
     90 

G:\Python38\lib\site-packages\pyLDAvis\gensim.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
    117     """
    118     opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
--> 119     return vis_prepare(**opts)

G:\Python38\lib\site-packages\pyLDAvis\_prepare.py in prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R, lambda_step, mds, n_jobs, plot_opts, sort_topics)
    396    term_frequency = np.sum(term_topic_freq, axis=0)
    397 
--> 398    topic_info         = _topic_info(topic_term_dists, topic_proportion, term_frequency, term_topic_freq, vocab, lambda_step, R, n_jobs)
    399    token_table        = _token_table(topic_info, term_topic_freq, vocab, term_frequency)
    400    topic_coordinates = _topic_coordinates(mds, topic_term_dists, topic_proportion)

G:\Python38\lib\site-packages\pyLDAvis\_prepare.py in _topic_info(topic_term_dists, topic_proportion, term_frequency, term_topic_freq, vocab, lambda_step, R, n_jobs)
    252                            'Category': 'Topic%d' % new_topic_id})
    253 
--> 254    top_terms = pd.concat(Parallel(n_jobs=n_jobs)(delayed(_find_relevance_chunks)(log_ttd, log_lift, R, ls) \
    255                                                  for ls in _job_chunks(lambda_seq, n_jobs)))
    256    topic_dfs = map(topic_top_term_df, enumerate(top_terms.T.iterrows(), 1))

G:\Python38\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
    971 
    972         if not self._managed_backend:
--> 973             n_jobs = self._initialize_backend()
    974         else:
    975             n_jobs = self._effective_n_jobs()

G:\Python38\lib\site-packages\joblib\parallel.py in _initialize_backend(self)
    738         """Build a process or thread pool and return the number of workers"""
    739         try:
--> 740             n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
    741                                              **self._backend_args)
    742             if self.timeout is not None and not self._backend.supports_timeout:

G:\Python38\lib\site-packages\joblib\_parallel_backends.py in configure(self, n_jobs, parallel, prefer, require, idle_worker_timeout, **memmappingexecutor_args)
    492                 SequentialBackend(nesting_level=self.nesting_level))
    493 
--> 494         self._workers = get_memmapping_executor(
    495             n_jobs, timeout=idle_worker_timeout,
    496             env=self._prepare_worker_env(n_jobs=n_jobs),

G:\Python38\lib\site-packages\joblib\executor.py in get_memmapping_executor(n_jobs, **kwargs)
     18 
     19 def get_memmapping_executor(n_jobs, **kwargs):
---> 20     return MemmappingExecutor.get_memmapping_executor(n_jobs, **kwargs)
     21 
     22 

G:\Python38\lib\site-packages\joblib\executor.py in get_memmapping_executor(cls, n_jobs, timeout, initializer, initargs, env, temp_folder, context_id, **backend_args)
     40         _executor_args = executor_args
     41 
---> 42         manager = TemporaryResourcesManager(temp_folder)
     43 
     44         # reducers access the temporary folder in which to store temporary

G:\Python38\lib\site-packages\joblib\_memmapping_reducer.py in __init__(self, temp_folder_root, context_id)
    529             # exposes exposes too many low-level details.
    530             context_id = uuid4().hex
--> 531         self.set_current_context(context_id)
    532 
    533     def set_current_context(self, context_id):

G:\Python38\lib\site-packages\joblib\_memmapping_reducer.py in set_current_context(self, context_id)
    533     def set_current_context(self, context_id):
    534         self._current_context_id = context_id
--> 535         self.register_new_context(context_id)
    536 
    537     def register_new_context(self, context_id):

G:\Python38\lib\site-packages\joblib\_memmapping_reducer.py in register_new_context(self, context_id)
    558                 new_folder_name, self._temp_folder_root
    559             )
--> 560             self.register_folder_finalizer(new_folder_path, context_id)
    561             self._cached_temp_folders[context_id] = new_folder_path
    562 

G:\Python38\lib\site-packages\joblib\_memmapping_reducer.py in register_folder_finalizer(self, pool_subfolder, context_id)
    588         # semaphores and pipes
    589         pool_module_name = whichmodule(delete_folder, 'delete_folder')
--> 590         resource_tracker.register(pool_subfolder, "folder")
    591 
    592         def _cleanup():

G:\Python38\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py in register(self, name, rtype)
    189         '''Register a named resource, and increment its refcount.'''
    190         self.ensure_running()
--> 191         self._send('REGISTER', name, rtype)
    192 
    193     def unregister(self, name, rtype):

G:\Python38\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py in _send(self, cmd, name, rtype)
    202 
    203     def _send(self, cmd, name, rtype):
--> 204         msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
    205         if len(name) > 512:
    206             # posix guarantees that writes to a pipe of less than PIPE_BUF

UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)
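
A note on the last frame: the failing line formats a temporary folder path into a message and encodes it as ASCII. The sketch below reproduces only that step with the standard library. The path and the two-character user name are hypothetical, and `tracker_message` is a stand-in name for illustration, not a joblib function. With "REGISTER:" taking 9 characters and "C:\Users\" another 9, the first non-ASCII characters fall at positions 18-19, which matches the reported error.

```python
# Hypothetical Windows temp folder under a user profile whose name is two
# non-ASCII characters (placeholder name, not taken from the question).
temp_folder = "C:\\Users\\张三\\AppData\\Local\\Temp\\joblib_memmapping_folder"


def tracker_message(cmd, name, rtype):
    # Same formatting and encoding as the line shown in the last traceback frame.
    return "{0}:{1}:{2}\n".format(cmd, name, rtype).encode("ascii")


error = None
try:
    tracker_message("REGISTER", temp_folder, "folder")
except UnicodeEncodeError as exc:
    # The ASCII codec rejects the two non-ASCII characters in the path.
    error = exc
    print(exc)
```

An ASCII-only path passed to the same function encodes without error, which is why the problem only shows up for some user accounts.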

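清洄KAKA 2022-12-21 11:09
Passing n_jobs=1 to prepare solves this. By default prepare runs in parallel through joblib, and the parallel backend creates a temporary folder under the user's directory and registers that path as an ASCII-encoded message (the last frame of the traceback). If the user name contains Chinese characters, the encoding fails. This is a common problem with n_jobs.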
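
A minimal sketch of that fix, assuming the same `lda`, `corpus`, and `dictionary` objects as in the question. The helper name `prepare_single_process` and its `prepare` argument are hypothetical; the argument exists only so the helper can be tried without pyLDAvis installed. The traceback shows that `pyLDAvis.gensim.prepare` forwards extra keyword arguments to a function that accepts `n_jobs`.

```python
def prepare_single_process(lda, corpus, dictionary, prepare=None, **kwargs):
    """Call a pyLDAvis-style prepare function with n_jobs forced to 1."""
    if prepare is None:
        # Imported lazily so the helper can be defined without pyLDAvis installed.
        import pyLDAvis.gensim
        prepare = pyLDAvis.gensim.prepare
    # Force single-job execution, overriding any n_jobs the caller passed,
    # so joblib does not set up worker processes or a temporary folder.
    kwargs["n_jobs"] = 1
    return prepare(lda, corpus, dictionary, **kwargs)
```

In the asker's `visibel` function, the equivalent one-line change would be `vis_data = pyLDAvis.gensim.prepare(lda, corpus, dictionary, n_jobs=1)`. The trade-off is that the relevance computation runs on a single core, which may be slower for large vocabularies.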
