Python: I can't work out how to write this one. Could anyone help?

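Read text from an English document and represent each sentence as a bag-of-words feature vector. The requirements are:

1) Read all the English sentences from the file;

2) Count the words across all sentences;

3) Represent each sentence as a bag-of-words vector;

4) Save each sentence's vector to a new document.

The contents of the document set are shown below.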

"State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small",

"supervised training corpora that are available. In this paper, we introduce two new neural architectures: one based on bidirectional LSTMs and conditional random fields",

"and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words",

"character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora",

"Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers"
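For readers new to the term: a bag-of-words vector simply counts how often each vocabulary word occurs in a sentence, ignoring word order. Here is a minimal sketch on two made-up sentences. The sentences and the helper name `bag_of_words` are illustrative assumptions, not part of the assignment, and tokenization is plain whitespace splitting.

```python
from collections import Counter


def bag_of_words(sentences):
    """Return (vocab, vectors) for a list of whitespace-separated sentences."""
    tokenized = [s.split() for s in sentences]
    # Vocabulary: every distinct word, in order of first appearance.
    vocab = list(dict.fromkeys(w for toks in tokenized for w in toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # word -> number of occurrences in this sentence
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)    # ['the', 'cat', 'sat', 'saw', 'dog']
print(vectors)  # [[1, 1, 1, 0, 0], [2, 1, 0, 1, 1]]
```

Every vector has one entry per vocabulary word, so all sentences end up with the same length regardless of how many words they contain.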


CSDN专家-kaily 2021-09-26 14:13


import re

import numpy as np


def onehot_matrix(list1):
    """Return one bag-of-words count vector per sentence in list1."""
    # Strip commas, periods, and colons, then split each sentence into words.
    docs = [re.sub(r"[,.:]", "", sentence).split() for sentence in list1]

    # Vocabulary: every distinct word, in order of first appearance.
    vocab = list(dict.fromkeys(word for doc in docs for word in doc))

    # Build the dictionary: map each word to its column index in the matrix.
    word_to_col = {word: col for col, word in enumerate(vocab)}
    print(word_to_col)  # print the dictionary

    # M x V all-zero matrix: M is the number of sentences, V is the number
    # of distinct words, i.e. the dimension of each vector.
    M = len(docs)
    V = len(vocab)
    bow = np.zeros((M, V), dtype=int)

    for i, doc in enumerate(docs):  # fill in the bag-of-words counts
        for word in doc:
            bow[i][word_to_col[word]] += 1
    return bow.tolist()  # plain nested lists of Python ints


list1 = ["State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small",
         "supervised training corpora that are available. In this paper, we introduce two new neural architectures: one based on bidirectional LSTMs and conditional random fields",
         "and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words",
         "character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora",
         "Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers"]
print(onehot_matrix(list1))
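The code above works on a hard-coded list and only prints the result, so requirements 1) and 4) (reading from a file and saving to a new one) still need file handling. Below is a rough sketch of one way to add it with only the standard library. Everything in it is an assumption made for illustration: the file names `input.txt` and `vectors.txt`, the helper names, the rule that a sentence ends at `.`, `!` or `?` followed by whitespace, and the output layout (vocabulary on the first line, then one vector per line). A naive splitter like this will also break on abbreviations such as "e.g.".

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def read_sentences(path):
    """Read a plain-text file and split it into sentences."""
    text = Path(path).read_text(encoding="utf-8")
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    # Split after ., ! or ? when followed by whitespace (naive rule).
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Extract words, keeping inner hyphens and apostrophes (state-of-the-art)."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", sentence)


def vectorize(sentences):
    """Return (vocab, vectors): one count vector per sentence."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = list(dict.fromkeys(w for toks in tokenized for w in toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def save_vectors(path, vocab, vectors):
    """Write the vocabulary on line 1, then one space-separated vector per line."""
    lines = [" ".join(vocab)]
    lines += [" ".join(map(str, vec)) for vec in vectors]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Demo in a temporary directory so the sketch runs without any existing files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"    # hypothetical input file
    dst = Path(tmp) / "vectors.txt"  # hypothetical output file
    src.write_text(
        "Our models rely on words. Our models obtain\n"
        "state-of-the-art performance.",
        encoding="utf-8",
    )
    sentences = read_sentences(src)
    vocab, vectors = vectorize(sentences)
    save_vectors(dst, vocab, vectors)
    output = dst.read_text(encoding="utf-8")

print(output)
```

For the real assignment, point `read_sentences` at the actual document and `save_vectors` at whichever output path is wanted. If the file already holds one sentence per line, `read_sentences` could be replaced by reading the lines directly.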

The asker selected this answer as the best answer.

Question events

Closed by the system: October 4
Answer accepted: September 26
Question created: September 26
