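Hello, we are using your company's paddle uie for entity recognition on text, and the results are quite good. However, we have run into some problems and would like to ask for your help. I hope you can spare some time to take a look:
- Environment
- Deployed with docker: registry.baidubce.com/paddlepaddle/paddle:2.6.1-gpu-cuda11.7-cudnn8.4-trt8.4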
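- Model
- The model was fine-tuned from uie-base and works quite well.
- Problem
- We deployed an interface with fastapi that calls the model's prediction function. As soon as concurrency gets a bit higher, all kinds of problems appear.
First: a CUDA memory problem.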
2024-07-23T10:45:57.159119801Z Traceback (most recent call last):
2024-07-23T10:45:57.159148406Z File "/workspace/v2_router.py", line 57, in ner_long_text
2024-07-23T10:45:57.159157383Z return {'data': ner_split_long_text(request.text.strip(), schema)}
2024-07-23T10:45:57.159181888Z File "/workspace/app/information_extraction/ner/ner.py", line 144, in ner_split_long_text
2024-07-23T10:45:57.159189898Z single_result = entity_extraction(sentence, schema, start_index)
2024-07-23T10:45:57.159196862Z File "/workspace/app/information_extraction/ner/ner.py", line 221, in entity_extraction
2024-07-23T10:45:57.159204189Z ner_results = common_ner_uie(txt)
2024-07-23T10:45:57.159217646Z File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/taskflow.py", line 817, in __call__
2024-07-23T10:45:57.159225197Z results = self.task_instance(inputs, **kwargs)
2024-07-23T10:45:57.159232127Z File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/task.py", line 527, in __call__
2024-07-23T10:45:57.159239164Z outputs = self._run_model(inputs, **kwargs)
2024-07-23T10:45:57.159245994Z File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/information_extraction.py", line 1068, in _run_model
2024-07-23T10:45:57.159253112Z results = self._multi_stage_predict(_inputs)
2024-07-23T10:45:57.159259909Z File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/information_extraction.py", line 1166, in _multi_stage_predict
2024-07-23T10:45:57.159267042Z result_list = self._single_stage_predict(examples)
2024-07-23T10:45:57.159273813Z File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/information_extraction.py", line 975, in _single_stage_predict
2024-07-23T10:45:57.159289387Z self.input_handles[0].copy_from_cpu(input_ids.numpy())
2024-07-23T10:45:57.159308544Z File "/usr/local/lib/python3.10/dist-packages/paddle/inference/wrapper.py", line 52, in tensor_copy_from_cpu
2024-07-23T10:45:57.159319715Z self._copy_from_cpu_bind(data)
2024-07-23T10:45:57.159332762Z OSError: (External) CUDA error(700), an illegal memory access was encountered.
2024-07-23T10:45:57.159352327Z [Hint: 'cudaErrorIllegalAddress'. The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistentstate and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. ] (at ../paddle/phi/backends/gpu/cuda/cuda_info.cc:265)
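Second: strange output. For example, in the correct case the person names, organizations, and similar information should be extracted from the text. But once the request volume gets high, results like the one below appear (the values are Chinese function words, not entities). It feels as if the extracted position information got mixed up with that of other threads.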
{"person":"的,和","org":"了,的"}
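To make the suspected thread interference concrete, here is a minimal sketch of serializing prediction calls with a lock. It assumes, without confirmation, that the Taskflow instance (`common_ner_uie` in the traceback) is not safe to call from several threads at once. `SerializedPredictor` and `FakeUIE` are hypothetical names introduced only for illustration; `FakeUIE` stands in for the real model so the sketch runs without paddle.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SerializedPredictor:
    """Wraps a predictor so that only one thread calls it at a time."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._lock = threading.Lock()

    def __call__(self, text):
        # All callers queue up on the lock; the wrapped predictor never
        # sees two overlapping calls.
        with self._lock:
            return self._predict_fn(text)


class FakeUIE:
    """Hypothetical stand-in for the real Taskflow instance.

    It records how many calls overlap, so the effect of the lock is visible.
    """

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def __call__(self, text):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # simulate inference time
        self.active -= 1
        return {"person": text.upper()}


if __name__ == "__main__":
    model = FakeUIE()
    predictor = SerializedPredictor(model)
    # Simulate concurrent requests, similar to a server handling
    # several requests in worker threads.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(predictor, ["a", "b", "c"]))
    print(results)
    print("max overlapping calls:", model.max_active)
```

This trades throughput for safety: requests are handled one at a time on the GPU. Whether it removes the errors above depends on the assumption about thread safety being correct.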
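- Questions
- Is multithreaded prediction currently unsupported for this kind of model?
- How can the CUDA memory problem be solved? Every time it occurs, the model has to be restarted.
- Our model was trained more than a year ago, and the project used for prediction has upgraded paddlepaddle since then. Could that be the cause of many of these unexplained problems, and do we need to retrain the model with a newer version of paddlepaddle?
- Is there a recommended deployment approach for fine-tuned models?
Thank you very much.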