For this step I found two explanations, but I couldn't understand either of them.
The first is from Taylor Arnold: https://statsmaths.github.io/cleanNLP/state-of-union.html, which says:
Now, prepare the dataset by putting the text into a column of the metadata table:
input <- sotu_meta
input$text <- sotu_text
Then, extract annotations from the dataset:
anno <- cnlp_annotate(input, verbose=FALSE)
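A minimal sketch of the same two-step pattern applied to your own texts, assuming cleanNLP 3.x, where `cnlp_annotate()` by default reads the document text from a column named `text` and the document ids from a column named `doc_id`. The metadata columns (`author`, `year`) and the example strings are hypothetical placeholders, and the annotation call is wrapped in a function so the block runs even without a backend installed.

```r
# Hypothetical metadata table playing the role of `sotu_meta`:
# one row per document, with an id column plus any extra information.
my_meta <- data.frame(
  doc_id = c("doc1", "doc2"),
  author = c("A", "B"),
  year   = c(2001L, 2002L),
  stringsAsFactors = FALSE
)

# Hypothetical texts playing the role of `sotu_text`:
# one string per document, in the same order as the rows above.
my_text <- c(
  "The first document. It has two sentences.",
  "The second document is shorter."
)

# Same two steps as in the tutorial: copy the metadata table,
# then add the texts as a new column named `text`.
input <- my_meta
input$text <- my_text

# The annotation step needs cleanNLP plus an initialised backend,
# so it is only defined here, not run.
annotate_input <- function(input) {
  cleanNLP::cnlp_annotate(input, verbose = FALSE)
}
```

Once a backend has been initialized (for example with `cnlp_init_spacy()`), calling `anno <- annotate_input(input)` should produce the annotations keyed by `doc_id`.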
The second is a basic guide written by a teacher: https://susie-kim.github.io/post/2018-01-09-guide-cnlp-part2/, which says:
1. Processing text files
Place all text files that you want to process under the working directory. For example, my working directory is currently set to C:/my/working/directory/. The .txt files that I will process are in a folder named corpus under this working directory: C:/my/working/directory/corpus. Before proceeding to the next part, load the cleanNLP and reticulate packages, and initialize spaCy by executing cnlp_init_spacy and specifying the language model.
library(cleanNLP); library(reticulate)
cnlp_init_spacy(model_name = "en_core_web_lg")
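For a whole folder such as the corpus directory described above, one way to get the "dataset" form used by the first tutorial is to read every .txt file into a data frame with one row per file. A minimal base-R sketch; `read_txt_folder()` is a hypothetical helper name, and the files are assumed to be UTF-8 plain text.

```r
# Hypothetical helper: read every .txt file in `dir` into a data frame
# with one row per file (`doc_id` = file name, `text` = full content).
read_txt_folder <- function(dir, encoding = "UTF-8") {
  # list.files() returns the matching paths in alphabetical order
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(files, function(f) {
    # readLines() returns one string per line; `encoding` only marks
    # the strings as UTF-8, it does not re-encode them.
    # paste(collapse = "\n") joins the lines back into one document.
    paste(readLines(f, encoding = encoding, warn = FALSE), collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
  data.frame(doc_id = basename(files), text = texts, stringsAsFactors = FALSE)
}

# Example usage with the folder layout described above
# (requires cleanNLP and an initialised backend):
# corpus_df <- read_txt_folder("corpus")
# anno <- cnlp_annotate(corpus_df)
```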
1.1. Annotate a single text
Let's say the text file I want to analyze is named text_01.txt and sits in the corpus folder directly under the working directory. Here is how to process this particular file:
#annotate a single file
single.text <- cnlp_annotate("corpus/text_01.txt", as_strings = FALSE)
It's as simple as that. Setting as_strings = FALSE lets the annotator know that the path provided is the name of a file, not actual text waiting to be annotated.
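The `as_strings = FALSE` argument belongs to the older cleanNLP 2.x interface used in this 2018 guide. As far as I know, cleanNLP 3.x dropped it, and `cnlp_annotate()` there expects the text itself (a character vector, or a data frame with a `text` column) rather than a file path. A minimal sketch that reads one file into a single string first; `read_one_text()` is a hypothetical helper name and the file is assumed to be UTF-8.

```r
# Hypothetical helper: read a whole text file into one string,
# joining its lines with newlines.
read_one_text <- function(path) {
  paste(readLines(path, encoding = "UTF-8", warn = FALSE), collapse = "\n")
}

# Usage with the file from the guide (path relative to the working
# directory; the annotation step needs cleanNLP 3.x and a backend):
# single_text <- read_one_text("corpus/text_01.txt")
# single.anno <- cnlp_annotate(single_text)
```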
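My question: when running cleanNLP annotation in RStudio, how do I compile my texts into a dataset, or import a single text directly? Please help me out, thank you all so much!!!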