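I'm running LDA on Chinese text in R. On other people's computers the results are fine, but on my computer the output comes out garbled.
My system is Windows 7.
The code is as follows: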
library(readxl)
library(jiebaR)
library(topicmodels)
library(wordcloud2)
library(tm)

# Load the articles and the stop-word list
data <- read_excel("2018.xlsx")
data <- as.data.frame(data)
data <- na.omit(data)
stop <- read_excel("stopword.xls", col_names = FALSE)
stop <- as.data.frame(stop)

# Segment each article (column 正文, the body text) and drop stop words
cutter <- worker()
WORDS <- lapply(data$正文, function(w) {
  w <- gsub("[a-zA-Z]", "", w)  # strip Latin letters
  w <- gsub("\\d+", "", w)      # strip digits
  w <- gsub("\\s+", "", w)      # strip whitespace
  word <- segment(w, cutter)
  word <- word[!word %in% stop[, 1]]
})

# Word frequencies and word cloud
term.table <- table(unlist(WORDS))
num <- data.frame(term.table)
num <- num[order(num$Freq, decreasing = TRUE), ]
wordcloud2(num[1:1000, ])

# Document-term matrix and LDA
corpus <- Corpus(VectorSource(unlist(WORDS)))
document <- DocumentTermMatrix(corpus)
topic <- LDA(document, k = 3, iter = 500)
terms(topic, 5)
Both data and stop look normal when I inspect them with head(); only the results come out garbled on my computer.
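
One way to narrow this down is to compare what each machine reports about its R session. This is a minimal sketch that assumes the difference lies in the session's locale or native encoding. Nothing above confirms that, and the sample string is hypothetical, not taken from the real data. It uses only base R. The string is built from Unicode escapes, so the encoding of the script file itself does not affect the result.

```r
# Diagnostic sketch, base R only. Assumption: the garbling is related to the
# session locale / native encoding, which may differ between the two machines.

# Locale and native-encoding information for the current session.
locale_ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()
print(locale_ctype)
print(info)

# Hypothetical sample: two CJK characters written as Unicode escapes.
sample_word <- "\u4e2d\u6587"

# Declared encoding mark of the string, and its length in characters and bytes.
mark <- Encoding(sample_word)
n_chars <- nchar(sample_word, type = "chars")
n_bytes <- nchar(enc2utf8(sample_word), type = "bytes")
print(mark)
print(n_chars)
print(n_bytes)

# Raw UTF-8 bytes; the same check can be run on a real segmented word
# (for example one element of WORDS) to see whether its bytes are valid UTF-8.
utf8_bytes <- charToRaw(enc2utf8(sample_word))
print(utf8_bytes)
```

Running the same lines on a machine where the results look fine and on the one where they are garbled, then comparing the printed locale and the values of Encoding() for a few real words, should show whether the two sessions treat the text differently.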