qq_38569853
2021-03-01 19:29

Using kashgari for BERT+BiLSTM named entity recognition: error when saving the model. Please help!


Here is my code:

import tensorflow as tf
import time
import jieba as jb
import random
import kashgari
import sys,io
from kashgari.corpus import ChineseDailyNerCorpus
from kashgari.tasks.labeling import BiLSTM_Model
from kashgari.embeddings import BertEmbedding

sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf-8') # Change default encoding to utf8

start = time.process_time()
train_x,train_y = ChineseDailyNerCorpus.load_data('train')
test_x, test_y = ChineseDailyNerCorpus.load_data('test')
valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')

embedding = BertEmbedding('chinese_L-12_H-768_A-12')
model = BiLSTM_Model(embedding,sequence_length=100)
model.fit(train_x,train_y,valid_x,valid_y,epochs=1)

model.save('model_learn2/bilstm_ner')

end = time.process_time()
step = end - start
print("Total time: %0.3f s, i.e. %0.3f min" % (step, step / 60))

Running it produces an error.

My TensorFlow version is 2.1.0 and my kashgari version is 2.0.1. The Chinese BERT model is BERT-Base, Chinese, downloaded from Google Cloud.


3 answers

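  • ProfSnail 1 month ago

    Hello. Fixing this error requires a small change in the library source. I have sent you a private message, please take a look.

    In D:\dev\anaconda\lib\site-packages\kashgari\tasks\abs_task_model.py, at line 82, change `open(filename) as f` to `open(filename, encoding='utf-8') as f`.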
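
To illustrate why an explicit `encoding` argument matters, here is a minimal sketch. It does not reproduce kashgari's code: the helper names `save_config` and `load_config`, the file name `demo_config.json`, and the sample data are all made up for the example. The underlying point is that `open()` without `encoding` falls back to the locale default, which on a Chinese-language Windows install is typically a GBK code page, so text containing characters outside that code page cannot be written.

```python
import json
import locale
import os
import tempfile


def save_config(path, config, encoding="utf-8"):
    """Write config as JSON with an explicit file encoding (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as f:
        # ensure_ascii=False keeps non-ASCII characters as-is in the file
        json.dump(config, f, ensure_ascii=False, indent=2)


def load_config(path, encoding="utf-8"):
    """Read the JSON back using the same explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        return json.load(f)


# The encoding open() uses when no encoding argument is given
print("locale default:", locale.getpreferredencoding(False))

# Made-up sample data; the emoji stands in for any character GBK cannot represent
config = {"labels": ["B-PER", "I-PER"], "vocab": ["中", "文", "😀"]}
path = os.path.join(tempfile.mkdtemp(), "demo_config.json")

# With UTF-8 the round trip works regardless of the system locale
save_config(path, config)
restored = load_config(path)
print("utf-8 round trip ok:", restored == config)

# Forcing GBK simulates the locale default on a Chinese-language Windows machine
try:
    save_config(path, config, encoding="gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False
print("gbk write ok:", gbk_ok)
```

If patching the installed package is undesirable, the same reasoning suggests checking `locale.getpreferredencoding(False)` first to confirm that the locale default is really the cause.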

  • weixin_41908433 知雀的天空 1 month ago
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    
    import tensorflow as tf
    import time
    import jieba as jb
    import random
    import kashgari
    import sys, io
    from kashgari.corpus import ChineseDailyNerCorpus
    from kashgari.tasks.labeling import BiLSTM_Model
    from kashgari.embeddings import BertEmbedding
    
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')  # Change default encoding to utf8
    
    start = time.process_time()
    train_x, train_y = ChineseDailyNerCorpus.load_data('train')
    test_x, test_y = ChineseDailyNerCorpus.load_data('test')
    valid_x, valid_y = ChineseDailyNerCorpus.load_data('valid')
    
    embedding = BertEmbedding('chinese_L-12_H-768_A-12')
    model = BiLSTM_Model(embedding, sequence_length=100)
    model.fit(train_x, train_y, valid_x, valid_y, epochs=1)
    
    model.save('model_learn2/bilstm_ner')
    
    end = time.process_time()
    step = end - start
    print("Total time: %0.3f s, i.e. %0.3f min" % (step, step / 60))
    

    Try adding the encoding declaration at the top of the file.

  • bill20100829 歇歇 1 month ago

    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gbk') # Change default encoding to gbk

    or

    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='gb18030') # Change default encoding to gb18030

    Give these a try.
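
As a rough sketch of what the wrapper's `encoding` argument changes, the snippet below wraps an in-memory `io.BytesIO` (a stand-in for `sys.stdout.buffer`, so nothing is sent to the real console) and compares the bytes produced for the same text. The helper name `encode_via_wrapper` is made up for the example. Note that this setting only affects how printed text is turned into bytes; it does not change the encoding `open()` uses for files.

```python
import io


def encode_via_wrapper(text, encoding):
    """Return the bytes a TextIOWrapper emits for text (hypothetical helper)."""
    buffer = io.BytesIO()  # stand-in for sys.stdout.buffer
    wrapper = io.TextIOWrapper(buffer, encoding=encoding)
    wrapper.write(text)
    wrapper.flush()  # push the encoded bytes into the underlying buffer
    data = buffer.getvalue()
    wrapper.detach()  # release the buffer so the wrapper does not close it
    return data


text = "总共耗时"  # the Chinese prefix used in the original print statement

utf8_bytes = encode_via_wrapper(text, "utf-8")
gbk_bytes = encode_via_wrapper(text, "gbk")
gb18030_bytes = encode_via_wrapper(text, "gb18030")

# Same four characters, different byte sequences depending on the encoding
print("utf-8 length:", len(utf8_bytes))
print("gbk length:", len(gbk_bytes))
print("gbk equals gb18030 here:", gbk_bytes == gb18030_bytes)
```

The console has to decode the bytes with the same encoding the wrapper used to produce them, otherwise the output shows up garbled.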
