。!72 asked on 2024-07-12 19:29 · 31 views

YOLOv5-Lite ported to a Raspberry Pi throws an error:

**Error: ValueError: operands could not be broadcast together with shapes (3,2,40,13) (4800,2)**

I trained my own model with YOLOv5-Lite: num1-8.onnx (it recognizes the digits 1-8).
I deployed the model on a Raspberry Pi and run inference with ONNX Runtime.
I followed the article 基于树莓派4B的YOLOv5-Lite目标检测的移植与部署(含训练教程) step by step; the OS image and the software packages are all fine.

Running it produces the following error:

Traceback (most recent call last):
  File "/home/assic/Desktop/num-detc/test_video.py", line 146, in <module>
    det_boxes,scores,ids = infer_img(img0,net,model_h,model_w,nl,na,stride,anchor_grid,thred_nms=0.4,thred_cond=0.5)
  File "/home/assic/Desktop/num-detc/test_video.py", line 100, in infer_img
    outs = cal_outputs(outs,nl,na,model_w,model_h,anchor_grid,stride)
  File "/home/assic/Desktop/num-detc/test_video.py", line 58, in cal_outputs
    grid[i], (na, 1))) * int(stride[i])
ValueError: operands could not be broadcast together with shapes (3,2,40,13) (4800,2) 
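
A quick way to see what the exported model actually returns (a minimal onnxruntime sketch; the post-processing code below assumes a single flattened output of shape (1, 6300, 13) for a 320x320 input with 8 classes):

import numpy as np
import onnxruntime as ort

# Minimal check: print the declared and the actual output shapes of the model.
net = ort.InferenceSession("num1-8.onnx")
print([(i.name, i.shape) for i in net.get_inputs()])
print([(o.name, o.shape) for o in net.get_outputs()])

# Push a dummy 320x320 frame through and print the runtime shapes.
blob = np.zeros((1, 3, 320, 320), dtype=np.float32)
print([o.shape for o in net.run(None, {net.get_inputs()[0].name: blob})])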

My code is as follows:


import cv2
import numpy as np
import onnxruntime as ort
import random  # used by plot_one_box when no color is given
import time

def plot_one_box(x, img, color=None, label=None, line_thickness=None):
    """
    description: Plots one bounding box on image img;
                 this function comes from the YOLOv5 project.
    param:
        x:      a box like [x1,y1,x2,y2]
        img:    an OpenCV image object
        color:  color to draw the rectangle, such as (0,255,0)
        label:  str
        line_thickness: int
    return:
        no return
    """
    tl = (
        line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1
    )  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(
            img,
            label,
            (c1[0], c1[1] - 2),
            0,
            tl / 3,
            [225, 255, 255],
            thickness=tf,
            lineType=cv2.LINE_AA,
        )

def _make_grid(nx, ny):
    # Note: the meshgrid arguments are swapped relative to upstream YOLOv5;
    # this is harmless here because the model input is square (320x320).
    xv, yv = np.meshgrid(np.arange(ny), np.arange(nx))
    return np.stack((xv, yv), 2).reshape((-1, 2)).astype(np.float32)

def cal_outputs(outs, nl, na, model_w, model_h, anchor_grid, stride):
    # Decode the flattened (num_boxes, 5+nc) model output layer by layer,
    # applying the grid offset and anchor size for each detection scale.
    row_ind = 0
    grid = [np.zeros(1)] * nl
    for i in range(nl):
        # (h is taken from model_w and w from model_h; harmless here
        # because the input is square.)
        h, w = int(model_w / stride[i]), int(model_h / stride[i])
        length = int(na * h * w)
        if grid[i].shape[2:4] != (h, w):
            grid[i] = _make_grid(w, h)

        # Box centers: (2*t - 0.5 + grid) * stride
        outs[row_ind:row_ind + length, 0:2] = (outs[row_ind:row_ind + length, 0:2] * 2. - 0.5 + np.tile(
            grid[i], (na, 1))) * int(stride[i])
        # Box sizes: (2*t)**2 * anchor
        outs[row_ind:row_ind + length, 2:4] = (outs[row_ind:row_ind + length, 2:4] * 2) ** 2 * np.repeat(
            anchor_grid[i], h * w, axis=0)
        row_ind += length
    return outs



def post_process_opencv(outputs,model_h,model_w,img_h,img_w,thred_nms,thred_cond):
    conf = outputs[:,4].tolist()
    c_x = outputs[:,0]/model_w*img_w
    c_y = outputs[:,1]/model_h*img_h
    w  = outputs[:,2]/model_w*img_w
    h  = outputs[:,3]/model_h*img_h
    p_cls = outputs[:,5:]
    if len(p_cls.shape)==1:
        p_cls = np.expand_dims(p_cls,1)
    cls_id = np.argmax(p_cls,axis=1)

    p_x1 = np.expand_dims(c_x-w/2,-1)
    p_y1 = np.expand_dims(c_y-h/2,-1)
    p_x2 = np.expand_dims(c_x+w/2,-1)
    p_y2 = np.expand_dims(c_y+h/2,-1)
    areas = np.concatenate((p_x1,p_y1,p_x2,p_y2),axis=-1)
    
    areas = areas.tolist()
    ids = cv2.dnn.NMSBoxes(areas,conf,thred_cond,thred_nms)
    if len(ids)>0:
        return  np.array(areas)[ids],np.array(conf)[ids],cls_id[ids]
    else:
        return [],[],[]
def infer_img(img0,net,model_h,model_w,nl,na,stride,anchor_grid,thred_nms=0.4,thred_cond=0.5):
    # Image preprocessing
    img = cv2.resize(img0, [model_w,model_h], interpolation=cv2.INTER_AREA)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32) / 255.0
    blob = np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0)

    # Model inference
    outs = net.run(None, {net.get_inputs()[0].name: blob})[0].squeeze(axis=0)

    # Decode output coordinates
    outs = cal_outputs(outs,nl,na,model_w,model_h,anchor_grid,stride)

    # Compute detection boxes
    img_h,img_w,_ = np.shape(img0)
    boxes,confs,ids = post_process_opencv(outs,model_h,model_w,img_h,img_w,thred_nms,thred_cond)

    return  boxes,confs,ids




if __name__ == "__main__":

    # Load the model
    model_pb_path = "num1-8.onnx"
    so = ort.SessionOptions()
    net = ort.InferenceSession(model_pb_path, so)
    
    # Label dictionary
    dic_labels= {0:'1',
            1:'2',
            2:'3',
            3:'4',
            4:'5',
            5:'6',
            6:'7',
            7:'8'}
    
    # Model parameters
    model_h = 320
    model_w = 320
    nl = 3
    na = 3
    stride=[8.,16.,32.]
    anchors = [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]]
    anchor_grid = np.asarray(anchors, dtype=np.float32).reshape(nl, -1, 2)
    
    video = 0
    cap = cv2.VideoCapture(video)
    flag_det = False
    while True:
        success, img0 = cap.read()
        if success:
            
            if flag_det:
                t1 = time.time()
                det_boxes,scores,ids = infer_img(img0,net,model_h,model_w,nl,na,stride,anchor_grid,thred_nms=0.4,thred_cond=0.5)
                t2 = time.time()
            
                
                for box,score,id in zip(det_boxes,scores,ids):
                    label = '%s:%.2f'%(dic_labels[id],score)
            
                    plot_one_box(box.astype(np.int16), img0, color=(255,0,0), label=label, line_thickness=None)
                    
                str_FPS = "FPS: %.2f"%(1./(t2-t1))
                
                cv2.putText(img0,str_FPS,(50,50),cv2.FONT_HERSHEY_COMPLEX,1,(0,255,0),3)
                
            
            cv2.imshow("video",img0)
        key=cv2.waitKey(1) & 0xFF    
        if key == ord('q'):
        
            break
        elif key & 0xFF == ord('s'):
            flag_det = not flag_det
            print(flag_det)
            
    cap.release() 
    
    
    
    
    # (Commented-out single-image debug code omitted: it ran the same
    # preprocess -> net.run -> cal_outputs -> post_process_opencv pipeline
    # on a still image '3.jpg' and printed the raw results.)

The original article's code is identical to mine except for the model path and the label dictionary, so the full listing is omitted. The two parts that differ are:

    # Load the model
    model_pb_path = "best_lite_led_e.onnx"

    # Label dictionary
    dic_labels= {0:'led',
            1:'buzzer',
            2:'teeth'}

Reproducing the original article's code works without any problem; it runs successfully!
But as soon as I swap in my own model and change the labels, I get the error above.

I am wondering whether the problem is caused by the original article using 3 labels while I use 8.

The original article also mentions the following:

[image from the original article; not reproduced here]

which I also don't really understand.

6 replies (newest first); the top reply, from 阿里嘎多学长 (2024-07-12 19:29):

    The following content was generated jointly by ChatGPT and 阿里嘎多学长; if it helps, please accept the answer:


    Hello! It looks like you have run into trouble deploying your YOLOv5-Lite model on the Raspberry Pi. Let's analyze the error first.

    Error analysis

    The error message ValueError: operands could not be broadcast together with shapes (3,2,40,13) (4800,2) says that an array operation was given two operands whose shapes cannot be broadcast together. In your cal_outputs function the problem is this line:

    outs[row_ind:row_ind + length, 0:2] = (outs[row_ind:row_ind + length, 0:2] * 2. - 0.5 + np.tile(grid[i], (na, 1))) * int(stride[i])
    

    This line adds np.tile(grid[i], (na, 1)), which has shape (4800, 2) for the stride-8 layer, to a slice of outs, but the slice has shape (3, 2, 40, 13), so broadcasting fails. That slice shape is revealing: outs[0:4800, 0:2] can only come out as (3, 2, 40, 13) if outs itself has a shape like (3, 40, 40, 13), i.e. a raw detection-head tensor, rather than the flattened (6300, 13) array this script expects. This points to the ONNX export rather than the label count: num1-8.onnx was probably exported with different settings than the tutorial's model, so its output is not concatenated and flattened.
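
    A minimal NumPy sketch (assuming, hypothetically, that the exported model returns a raw head tensor of shape (3, 40, 40, 13) instead of the flattened output) reproduces exactly this message:

        import numpy as np

        # Hypothetical raw detection-head output: 3 anchors, a 40x40 grid and
        # 8 classes (5 + 8 = 13 values per box), NOT flattened to (6300, 13).
        outs = np.zeros((3, 40, 40, 13), dtype=np.float32)
        tiled_grid = np.zeros((4800, 2), dtype=np.float32)  # np.tile(grid, (3, 1))

        print(outs[0:4800, 0:2].shape)  # (3, 2, 40, 13): the slice just clips
        outs[0:4800, 0:2] = (outs[0:4800, 0:2] * 2. - 0.5 + tiled_grid) * 8
        # ValueError: operands could not be broadcast together with
        # shapes (3,2,40,13) (4800,2)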

    Suggested fixes

    1. Check the array shapes: first make sure the slice of outs you operate on is compatible with grid[i]. After _make_grid, grid[i] is a 2-D array of shape (h*w, 2), where h and w are the feature-map height and width, so outs itself must be the flattened 2-D output (see the shape-guard sketch after this list).

    2. Check the np.tile usage: np.tile repeats an array; make sure it expands grid[i] to match the corresponding slice of outs.

    3. Label count: going from 3 to 8 labels should not by itself cause this error, but do make sure the tail of each output row (the class probabilities) matches your number of labels.
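
    For example, a hypothetical guard (check_output_shape is my own name, not part of the tutorial code) would fail fast with a clearer message than the broadcast error:

        import numpy as np

        def check_output_shape(outs, model_w=320, model_h=320, na=3, nc=8,
                               strides=(8, 16, 32)):
            # Hypothetical helper: verify outs is the flattened (N, 5+nc)
            # tensor that cal_outputs and post_process_opencv expect.
            rows = sum(na * (model_h // s) * (model_w // s) for s in strides)
            if outs.ndim != 2 or outs.shape != (rows, 5 + nc):
                raise ValueError(
                    "unexpected model output shape %s, expected (%d, %d); "
                    "re-export the ONNX model with the same export script "
                    "and settings as the tutorial" % (outs.shape, rows, 5 + nc))
            return outs  # e.g. (6300, 13) for a 320x320 input and 8 classes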

    K-means clustering

    K-means clustering can be used to determine good anchor box sizes. In object detection, anchor boxes are predefined boxes used to predict object positions and sizes. Running k-means over the bounding boxes of your training set finds the natural clusters of box sizes in the data, which yields better-fitting anchors for each detection scale.

    Applying k-means clustering

    1. Collect the bounding boxes from your training set.
    2. Run the k-means algorithm with box width and height as the features, setting the number of clusters to the number of anchors you want.
    3. Take each cluster center as the width and height of one anchor box (a minimal sketch follows this list).
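
    A minimal plain-Euclidean k-means sketch (note: YOLOv5's own autoanchor utility uses k-means followed by a genetic refinement step, so treat this purely as an illustration; the random wh array below stands in for real label data):

        import numpy as np

        def kmeans_anchors(wh, k=9, iters=100, seed=0):
            # Plain k-means on (width, height) pairs.
            rng = np.random.default_rng(seed)
            centers = wh[rng.choice(len(wh), k, replace=False)].copy()
            for _ in range(iters):
                # Assign each box to its nearest center (Euclidean distance).
                d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
                assign = d.argmin(axis=1)
                # Move each center to the mean of its assigned boxes.
                for j in range(k):
                    if np.any(assign == j):
                        centers[j] = wh[assign == j].mean(axis=0)
            return centers[np.argsort(centers.prod(axis=1))]  # small -> large

        # Random boxes standing in for a real training set:
        wh = np.random.default_rng(1).uniform(10, 300, (500, 2)).astype(np.float32)
        print(kmeans_anchors(wh).round(1))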


    I hope this information helps you solve the problem. If you need further help, feel free to ask!

