I'm using a BlueField-3 card as the NVMe-oF target. The topology is:
[X86/Initiator]----- NVMe-oF(RDMA)-------[BlueField/Target]--- NVME-SSD
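With attr_offload disabled, all tests pass without any problems.
With attr_offload enabled, the log shows an "XRQ NVMF backend ctrl timeout error" and the offload ctx is removed ("Removing offload ctx 0 from configfs").
Can anyone tell me why the XRQ NVMF backend ctrl reports a timeout error, and where the problem might be?
I followed this guide: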
https://enterprise-support.nvidia.com/s/article/howto-configure-nvme-over-fabrics--nvme-of--target-offload
###################################################################
Relevant logs on the target side:
07:15:15 kernel: [ 919.565209] nvmet_rdma: connect request (4): status 0 id 00000000bb6dc5e8
07:15:15 kernel: [ 919.593153] nvmet_rdma: added mlx5_0.
07:15:15 kernel: [ 919.608166] nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 8191 max_sge= 30 sq_size = 204 cm_id= 00000000bb6dc5e8
07:15:15 kernel: [ 919.630662] nvmet_rdma: established (9): status 0 id 00000000bb6dc5e8
07:15:15 kernel: [ 919.643624] nvmet: ctrl 1 start keep-alive timer for 5 secs
07:15:15 kernel: [ 919.654806] nvmet: creating nvm controller 1 for subsystem testsubsystem for NQN nqn.2014-08.org.nvmexpress:uuid:ef0cec00-a846-11ea-8000-ac1f6b3ea450.
07:15:15 kernel: [ 919.692117] nvmet_rdma: connect request (4): status 0 id 0000000020477746
07:15:15 kernel: [ 919.705730] nvmet_rdma: added mlx5_0.
07:15:15 kernel: [ 919.717137] nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 8191 max_sge= 30 sq_size = 102 cm_id= 0000000020477746
07:15:15 kernel: [ 919.738992] nvmet_rdma: established (9): status 0 id 0000000020477746
07:15:15 kernel: [ 919.746198] nvmet_rdma: connect request (4): status 0 id 00000000d0b3c7a0
07:15:15 kernel: [ 919.765477] nvmet_rdma: added mlx5_0.
… …
… …
07:15:18 kernel: [ 922.468332] nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 8191 max_sge= 30 sq_size = 102 cm_id= 0000000031c5eb81
07:15:18 kernel: [ 922.490193] nvmet_rdma: established (9): status 0 id 0000000031c5eb81
07:15:18 kernel: [ 922.503156] nvmet_rdma: connect request (4): status 0 id 00000000675c1077
07:15:18 kernel: [ 922.516756] nvmet_rdma: added mlx5_0.
07:15:18 kernel: [ 922.528174] nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 8191 max_sge= 30 sq_size = 102 cm_id= 00000000675c1077
07:15:18 kernel: [ 922.550005] nvmet_rdma: established (9): status 0 id 00000000675c1077
07:15:18 kernel: [ 922.562943] nvmet_rdma: using dynamic staging buffer 0000000053e1f05e
07:15:18 kernel: [ 922.622009] nvmet: Adding offload ctx 0 to configfs
07:15:18 kernel: [ 922.634362] nvmet: adding queue 1 to ctrl 1.
07:15:18 kernel: [ 922.674259] nvmet: adding queue 2 to ctrl 1.
… …
… …
07:15:20 kernel: [ 924.922805] nvmet: adding queue 47 to ctrl 1.
07:15:20 kernel: [ 924.972539] nvmet: adding queue 48 to ctrl 1.
07:15:23 kernel: [ 927.351996] nvmet: ctrl 1 update keep-alive timer for 5 secs
07:15:25 kernel: [ 929.909981] nvmet: ctrl 1 update keep-alive timer for 5 secs
07:15:28 kernel: [ 932.467996] nvmet: ctrl 1 update keep-alive timer for 5 secs
07:15:30 kernel: [ 935.026018] nvmet: ctrl 1 update keep-alive timer for 5 secs
07:15:33 kernel: [ 937.584047] nvmet: ctrl 1 update keep-alive timer for 5 secs
07:15:35 kernel: [ 940.142050] nvmet: ctrl 1 update keep-alive timer for 5 secs
07:15:37 kernel: [ 941.661828] nvme 0000:11:00.0: received IB Backend ctrl event: XRQ NVMF backend ctrl timeout error (22) be_ctrl 00000000f9eb18d8 id 0
07:15:37 kernel: [ 941.685916] nvmet: Removing offload ctx 0 from configfs
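To see how the backend timeout lines up with the keep-alive cadence, a small helper like the one below can pull the kernel uptime stamps out of the log above. It is a hypothetical sketch, not part of the NVIDIA guide or any driver tooling. It assumes the standard dmesg "[ seconds.micro ]" stamp and matches on the message text exactly as it appears in this log.

```python
import re

# Matches the bracketed kernel uptime stamp, e.g. "[ 941.661828]", and the message after it.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

# Excerpt of the target-side log above (keep-alive updates plus the backend error).
LOG = [
    "07:15:23 kernel: [ 927.351996] nvmet: ctrl 1 update keep-alive timer for 5 secs",
    "07:15:25 kernel: [ 929.909981] nvmet: ctrl 1 update keep-alive timer for 5 secs",
    "07:15:28 kernel: [ 932.467996] nvmet: ctrl 1 update keep-alive timer for 5 secs",
    "07:15:30 kernel: [ 935.026018] nvmet: ctrl 1 update keep-alive timer for 5 secs",
    "07:15:33 kernel: [ 937.584047] nvmet: ctrl 1 update keep-alive timer for 5 secs",
    "07:15:35 kernel: [ 940.142050] nvmet: ctrl 1 update keep-alive timer for 5 secs",
    "07:15:37 kernel: [ 941.661828] nvme 0000:11:00.0: received IB Backend ctrl event: "
    "XRQ NVMF backend ctrl timeout error (22) be_ctrl 00000000f9eb18d8 id 0",
]


def parse(lines):
    """Yield (uptime_seconds, message) for every line that carries a kernel stamp."""
    for line in lines:
        m = STAMP.search(line)
        if m:
            yield float(m.group(1)), m.group(2)


def keepalive_report(lines):
    """Return (gaps between keep-alive updates, delay from the last update
    to the first backend ctrl timeout event), all in seconds."""
    updates, timeout = [], None
    for t, msg in parse(lines):
        if "update keep-alive timer" in msg:
            updates.append(t)
        elif "backend ctrl timeout" in msg and timeout is None:
            timeout = t
    gaps = [round(b - a, 3) for a, b in zip(updates, updates[1:])]
    delay = round(timeout - updates[-1], 3) if updates and timeout is not None else None
    return gaps, delay


if __name__ == "__main__":
    gaps, delay = keepalive_report(LOG)
    print("keep-alive gaps (s):", gaps)
    print("last keep-alive -> backend timeout (s):", delay)
```

On this excerpt the keep-alive timer is refreshed about every 2.56 s, and the backend event arrives about 1.52 s after the last refresh, well inside the 5 s keep-alive window. That hints the failure comes from the offloaded backend path rather than an expired host keep-alive, though the log alone does not prove it.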