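Problem description and background
I'm new to NLP. I'm using Spyder to run the MRPC dataset from the GLUE tasks in the BERT project, but it fails with: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 196: surrogates not allowed
The arguments passed to the model are as follows: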
--task_name="MRPC"
--do_train="true"
--do_eval="true"
--data_dir="…//GLUE//glue_data//MRPC//"
--vocab_file="…//GLUE//BERT_BASE_DIR//uncased_L-12_H-768_A-12//vocab.txt"
--bert_config_file="…//GLUE//BERT_BASE_DIR//uncased_L-12_H-768_A-12//bert_config.json"
--init_checkpoint="…//GLUE//BERT_BASE_DIR//uncased_L-12_H-768_A-12//bert_model.ckpt"
--max_seq_length="128"
--train_batch_size="1"
--learning_rate="2e-5"
--num_train_epochs="1.0"
--output_dir="…//GLUE//output"
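Relevant code (please do not paste screenshots)
I run run_classifier.py directly, unmodified. The code can be downloaded from Google's bert project on GitHub.
Output and error messages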
ERROR:tornado.general:Uncaught exception in ZMQStream callback
Traceback (most recent call last):
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 431, in _run_callback
callback(*args, **kwargs)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\ipykernel\iostream.py", line 126, in _handle_event
event_f()
File "C:\Users\40701\.conda\envs\test\lib\site-packages\ipykernel\iostream.py", line 498, in _flush
parent=self.parent_header, ident=self.topic)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 742, in send
to_send = self.serialize(msg, ident)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 630, in serialize
content = self.pack(content)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 83, in <lambda>
ensure_ascii=False, allow_nan=False,
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\utils\jsonapi.py", line 25, in dumps
return json.dumps(o, **kwargs).encode("utf8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 196: surrogates not allowed
ERROR:tornado.general:Uncaught exception in zmqstream callback
Traceback (most recent call last):
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 448, in _handle_events
self._handle_recv()
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 477, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 431, in _run_callback
callback(*args, **kwargs)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\ipykernel\iostream.py", line 126, in _handle_event
event_f()
File "C:\Users\40701\.conda\envs\test\lib\site-packages\ipykernel\iostream.py", line 498, in _flush
parent=self.parent_header, ident=self.topic)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 742, in send
to_send = self.serialize(msg, ident)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 630, in serialize
content = self.pack(content)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 83, in <lambda>
ensure_ascii=False, allow_nan=False,
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\utils\jsonapi.py", line 25, in dumps
return json.dumps(o, **kwargs).encode("utf8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 196: surrogates not allowed
Exception in callback BaseAsyncIOLoop._handle_events(2244, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(2244, 1)>
Traceback (most recent call last):
File "C:\Users\40701\.conda\envs\test\lib\asyncio\events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\tornado\platform\asyncio.py", line 189, in _handle_events
handler_func(fileobj, events)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 448, in _handle_events
self._handle_recv()
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 477, in _handle_recv
self._run_callback(callback, msg)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\eventloop\zmqstream.py", line 431, in _run_callback
callback(*args, **kwargs)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\ipykernel\iostream.py", line 126, in _handle_event
event_f()
File "C:\Users\40701\.conda\envs\test\lib\site-packages\ipykernel\iostream.py", line 498, in _flush
parent=self.parent_header, ident=self.topic)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 742, in send
to_send = self.serialize(msg, ident)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 630, in serialize
content = self.pack(content)
File "C:\Users\40701\.conda\envs\test\lib\site-packages\jupyter_client\session.py", line 83, in <lambda>
ensure_ascii=False, allow_nan=False,
File "C:\Users\40701\.conda\envs\test\lib\site-packages\zmq\utils\jsonapi.py", line 25, in dumps
return json.dumps(o, **kwargs).encode("utf8")
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 196: surrogates not allowed
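A note on reading this error, with a minimal sketch. As far as I understand, a character such as '\udcd5' is a lone surrogate: Python's `surrogateescape` error handler maps an undecodable byte 0xNN to U+DC00 + 0xNN, so '\udcd5' would correspond to the raw byte 0xD5. The traceback shows the failure inside `json.dumps(...).encode("utf8")` while the kernel is sending console output, so "position 196" is a character offset in that message string, not a line number in a file. The sample bytes below are an assumption chosen for illustration, not data from the actual run.

```python
# Assumed sample: two bytes that are not valid UTF-8 (not taken from the real run).
raw = b"\xd5\xd2"

# surrogateescape smuggles each undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")


def try_encode(s):
    """Return the UnicodeEncodeError raised by a strict UTF-8 encode, or None."""
    try:
        s.encode("utf-8")
        return None
    except UnicodeEncodeError as exc:
        return exc


err = try_encode(text)

# Encoding with the same handler restores the original bytes.
recovered = text.encode("utf-8", errors="surrogateescape")

print(repr(text))        # repr escapes the surrogates, so printing is safe
print(err.reason)
print(recovered == raw)
```

If this is what is happening here, the offending bytes could come from anything printed to the console (a file path, a log line, or a system message in a non-UTF-8 encoding), not necessarily from the dataset itself.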
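My approach and what I have tried
I asked someone, who said one of the files might contain Chinese characters, but with my limited coding skills I can't work out which file it is.
I then opened the files in the MRPC folder with Notepad and went to line 196, but found no Chinese characters there.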
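Checking by hand in Notepad is easy to get wrong, so here is a rough sketch of how the files could be scanned automatically instead. The helper name `find_invalid_utf8` and the `suffixes` parameter are hypothetical, made up for this example; it only reports bytes that are not valid UTF-8 and assumes the files of interest are plain text.

```python
import os


def find_invalid_utf8(root, suffixes=(".tsv", ".txt", ".json")):
    """Return (path, line_number, byte_offset) for the first invalid
    UTF-8 byte in each matching file under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue  # skip binary files such as checkpoints
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as exc:
                # Count newlines before the bad byte to get a 1-based line number.
                line_no = data.count(b"\n", 0, exc.start) + 1
                hits.append((path, line_no, exc.start))
    return hits
```

It would be called with the same folder that is passed as `--data_dir`, for example `find_invalid_utf8(data_dir)`. An empty result would suggest the data files themselves are clean and the bad bytes come from somewhere else, such as a path or a message printed to the console.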
Desired result
I'd like to get run_classifier.py to run successfully so that I can continue with my studies.