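Read text from an English document and represent each sentence as a bag-of-words feature vector. The requirements are as follows:
1) Read all English sentences from the file;
2) Collect the words that occur across all sentences;
3) Represent each sentence as a bag-of-words vector;
4) Save each sentence's vector to a new file.
The contents of the document set are shown below.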
"State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small",
"supervised training corpora that are available. In this paper, we introduce two new neural architectures: one based on bidirectional LSTMs and conditional random fields",
"and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words",
"character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora",
"Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers"
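
The four steps above can be sketched as follows. This is a minimal sketch using only the Python standard library, not a reference solution. It rests on several assumptions that the exercise does not state: the input file holds plain text, a sentence ends at `.`, `!` or `?` followed by whitespace, words are lowercased and hyphenated terms such as `state-of-the-art` count as one word, the vocabulary is sorted alphabetically, and the output file lists the vocabulary on its first line followed by one count vector per line. All function names (`read_sentences`, `tokenize`, `build_vocabulary`, `to_vector`, `save_vectors`, `run`) are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path


def read_sentences(path):
    """Step 1: read the file and split it into sentences.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    text = Path(path).read_text(encoding="utf-8")
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and extract words, keeping internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def build_vocabulary(sentences):
    """Step 2: collect the distinct words of all sentences, sorted alphabetically."""
    return sorted({word for s in sentences for word in tokenize(s)})


def to_vector(sentence, vocabulary):
    """Step 3: count how often each vocabulary word occurs in the sentence."""
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in vocabulary]  # missing words count as 0


def save_vectors(vectors, vocabulary, path):
    """Step 4: write the vocabulary, then one space-separated vector per line."""
    lines = [" ".join(vocabulary)]
    lines += [" ".join(str(c) for c in vec) for vec in vectors]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def run(input_path, output_path):
    """Run all four steps and return the intermediate results."""
    sentences = read_sentences(input_path)
    vocabulary = build_vocabulary(sentences)
    vectors = [to_vector(s, vocabulary) for s in sentences]
    save_vectors(vectors, vocabulary, output_path)
    return sentences, vocabulary, vectors
```

Calling `run("input.txt", "vectors.txt")` (both file names are placeholders) would process the document set and write the vectors. Each vector has one entry per vocabulary word, so all sentences share the same dimensionality, and a binary variant would only need to replace each count with `min(count, 1)`.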