游浪飞船 2021-04-06 22:12

Running cleanNLP annotation in RStudio: how do I compile my texts into a dataset, or import them directly?

I've found two explanations of this step, but I don't understand either of them:

One is from Taylor Arnold: https://statsmaths.github.io/cleanNLP/state-of-union.html, which says:

Now, prepare the dataset by putting the text into a column of the metadata table:

input <- sotu_meta
input$text <- sotu_text

Then, extract annotations from the dataset:

anno <- cnlp_annotate(input, verbose=FALSE)
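The two vignette lines boil down to one idea: `cnlp_annotate()` can take a data frame with one row per document and the raw text in a column named `text`. As I understand it, the other columns (the State of the Union metadata, in the vignette) are carried along as document metadata. Here is a minimal sketch of the same pattern with your own texts. It assumes they are already loaded into R as a character vector. `my_text`, the ids and the `year` column are hypothetical placeholders:

```r
# Hypothetical example texts; replace with your own character vector
my_text <- c(
  "The first document. It has two sentences.",
  "The second document is shorter."
)

# One row per document, raw text in a column named "text"
input <- data.frame(
  doc_id = c("doc1", "doc2"),  # hypothetical document ids
  year   = c(2020, 2021),      # optional metadata kept next to the text
  text   = my_text,
  stringsAsFactors = FALSE
)

# Inspect the structure of the data frame
str(input)

# With cleanNLP loaded and a backend initialised, annotate as in the vignette:
# anno <- cnlp_annotate(input, verbose = FALSE)
```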

The other is a basic guide written by a teacher: https://susie-kim.github.io/post/2018-01-09-guide-cnlp-part2/, which says:

1. Processing text files

Place all text files that you want to process under the working directory. For example, currently my working directory is set as: C:/my/working/directory/. The .txt files that I will process are in a folder named corpus under this working directory: C:/my/working/directory/corpus. Before proceeding to the next part, load the cleanNLP and reticulate packages, and initiate spaCy by executing cnlp_init_spacy and specifying the language model.

library(cleanNLP); library(reticulate)
cnlp_init_spacy(model_name = "en_core_web_lg")

1.1. Annotate a single text

Let’s say the name of the text file I want to analyze is: text_01.txt, and it’s in the corpus folder right under the working directory. Here is how to process this particular file:

#annotate a single file
single.text <- cnlp_annotate("corpus/text_01.txt", as_strings = FALSE)

It’s as simple as that. Setting as_strings = FALSE lets the annotator know that the path provided is the name of a file, not actual text that’s waiting to be annotated.
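The guide annotates one file at a time. To turn every file in the `corpus` folder into a single dataset, you can read the files yourself into a data frame with one row per file. You then pass that data frame to `cnlp_annotate()` in the same way as the vignette's `input`. This approach does not depend on the `as_strings` argument. Below is a minimal base-R sketch. It assumes plain-text files in UTF-8, and `read_corpus()` is a hypothetical helper name, not part of cleanNLP:

```r
# Hypothetical helper: read every .txt file in `dir` into a data frame
# with one row per file (columns doc_id and text).
read_corpus <- function(dir, encoding = "UTF-8") {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(paths, function(p) {
    # readLines() returns one element per line; join them into one string
    paste(readLines(p, encoding = encoding, warn = FALSE), collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
  data.frame(
    doc_id = tools::file_path_sans_ext(basename(paths)),  # e.g. "text_01"
    text   = texts,
    stringsAsFactors = FALSE
  )
}

# Usage with the guide's layout (corpus/ under the working directory):
# input <- read_corpus("corpus")
# anno  <- cnlp_annotate(input)
```

Keeping the file name as `doc_id` makes it easy to trace each annotated document back to its source file.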

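So my question is: when running cleanNLP annotation in RStudio, how do I compile my texts into a dataset, or import a single text directly? Please help, and thank you all so much!!!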

1 answer

  • 故事不长丨 2023-07-20 14:26

    Quoting a GPT answer:

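    1. Reading texts from a database:
      If your texts are already stored in a database, connect to it with a suitable package (e.g. RMySQL or RPostgreSQL) and pull them into a data frame. Then annotate that data frame. If "dataset" just means a data frame of texts, you do not need a database at all.

      The following code fetches the texts from a database and annotates them: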

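      library(DBI)
      library(cleanNLP)
      
      # Initialise an annotation backend first (spaCy, as in the guide above)
      cnlp_init_spacy(model_name = "en_core_web_lg")
      
      # Connect to the database and fetch the texts to annotate.
      # RMySQL is only an example; use the driver for your own database.
      # All connection values are placeholders (3306 is MySQL's default port).
      conn <- dbConnect(RMySQL::MySQL(), dbname = "your_database_name", host = "your_host",
                        port = 3306, user = "your_username", password = "your_password")
      
      # Adjust the query to your table and column names; the aliases produce
      # the doc_id and text columns that cnlp_annotate() looks for by default
      query <- "SELECT id AS doc_id, text FROM your_table_name"
      data <- dbGetQuery(conn, query)
      
      # The data is now in memory, so the connection can be closed
      dbDisconnect(conn)
      
      # Annotate
      annotations <- cnlp_annotate(data, verbose = FALSE)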
      
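    2. Importing a single text directly:
      To annotate a single text, read the file into R and pass its contents to cnlp_annotate(). Passing a file path together with as_strings = FALSE, as in the guide above, is the older cleanNLP 2.x interface. As far as I know, cleanNLP 3.x no longer has the as_strings argument.

      The following code reads and annotates a single text:

      library(cleanNLP)
      
      # Initialise an annotation backend first (spaCy, as in the guide above)
      cnlp_init_spacy(model_name = "en_core_web_lg")
      
      # Read the file and join its lines into one string
      # (replace the placeholder path with your own; setwd() is not needed)
      file_path <- "path_to_directory_containing_text_file/text_file.txt"
      text <- paste(readLines(file_path, encoding = "UTF-8", warn = FALSE), collapse = "\n")
      
      # Annotate the single text
      annotations <- cnlp_annotate(text)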
      