guet_gjl 2021-04-02 20:08

Error when debugging GradNorm code (gradient normalization for task-weight design in deep multi-task learning)

I downloaded some deep learning code from the internet that applies gradient normalization to multi-task loss weights: https://github.com/brianlan/pytorch-grad-norm

The author appears to have based it on the paper GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks.

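While debugging, I ran into a problem at a point where a gradient is computed automatically, on the last line of the code below: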

    for t in range(n_iterations):

        # get a single batch
        for (it, batch) in enumerate(data_loader):
            # get the X and the targets values
            X = batch[0]
            ts = batch[1]
            if torch.cuda.is_available():
                X = X.cuda()
                ts = ts.cuda()

            # evaluate each task loss L_i(t)
            task_loss = model(X, ts) # this will do a forward pass in the model and will also evaluate the loss
            # compute the weighted loss w_i(t) * L_i(t)
            weighted_task_loss = torch.mul(model.weights, task_loss)
            # initialize the initial loss L(0) if t=0
            if t == 0:
                # set L(0)
                if torch.cuda.is_available():
                    initial_task_loss = task_loss.data.cpu()
                else:
                    initial_task_loss = task_loss.data
                initial_task_loss = initial_task_loss.numpy()

            # get the total loss
            loss = torch.sum(weighted_task_loss)
            # clear the gradients
            optimizer.zero_grad()
            # do the backward pass to compute the gradients for the whole set of weights
            # This is equivalent to compute each \nabla_W L_i(t)
            loss.backward(retain_graph=True)

            # set the gradients of w_i(t) to zero because these gradients have to be updated using the GradNorm loss
            #print('Before turning to 0: {}'.format(model.weights.grad))
            model.weights.grad.data = model.weights.grad.data * 0.0
            #print('Turning to 0: {}'.format(model.weights.grad))


            # switch for each weighting algorithm:
            # --> grad norm
            if args.mode == 'grad_norm':
                
                # get layer of shared weights
                W = model.get_last_shared_layer()
                print(W.parameters())
                # get the gradient norms for each of the tasks
                # G^{(i)}_w(t) 
                norms = []
                for i in range(len(task_loss)):
                    # get the gradient of this task loss with respect to the shared parameters
                    gygw = torch.autograd.grad(task_loss[i], W.parameters(), retain_graph=True)
                    # compute the norm
                    norms.append(torch.norm(torch.mul(model.weights[i], gygw[0])))
                norms = torch.stack(norms)
                #print('G_w(t): {}'.format(norms))


                # compute the inverse training rate r_i(t) 
                # \curl{L}_i 
                if torch.cuda.is_available():
                    loss_ratio = task_loss.data.cpu().numpy() / initial_task_loss
                else:
                    loss_ratio = task_loss.data.numpy() / initial_task_loss
                # r_i(t)
                inverse_train_rate = loss_ratio / np.mean(loss_ratio)
                #print('r_i(t): {}'.format(inverse_train_rate))


                # compute the mean norm \tilde{G}_w(t) 
                if torch.cuda.is_available():
                    mean_norm = np.mean(norms.data.cpu().numpy())
                else:
                    mean_norm = np.mean(norms.data.numpy())
                #print('tilde G_w(t): {}'.format(mean_norm))


                # compute the GradNorm loss 
                # this term has to remain constant
                constant_term = torch.tensor(mean_norm * (inverse_train_rate ** args.alpha), requires_grad=False)
                if torch.cuda.is_available():
                    constant_term = constant_term.cuda()
                #print('Constant term: {}'.format(constant_term))
                #this is the GradNorm loss itself
                grad_norm_loss = torch.tensor(torch.sum(torch.abs(norms - constant_term)))
                #print('GradNorm loss {}'.format(grad_norm_loss))

                # compute the gradient for the weights
                model.weights.grad = torch.autograd.grad(grad_norm_loss, model.weights)[0]

The first error was "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn". After checking, I guessed that the variable grad_norm_loss had requires_grad=False, so I changed it to True. That produced a second error: "RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior." So I also added allow_unused=True, and changed the last line to this:

    model.weights.grad = torch.autograd.grad(
        Variable(torch.tensor(torch.sum(torch.abs(norms - constant_term))), requires_grad=True),
        model.weights)[0]

But the gradient comes out as None, which means model.weights can never be updated. I don't know what is going on. Could someone please help me take a look?
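A minimal sketch of the symptom described above, using made-up stand-in tensors rather than the repository's model (weights, grads and constant_term below are hypothetical values). It assumes the cause is that torch.tensor(...) copies the value of its argument into a fresh tensor with no history, which is what `.detach().clone()` does explicitly here. Under that assumption, the copy can be marked as requiring grad, but nothing connects it back to the weights, so `allow_unused=True` yields None.

```python
import torch

# Hypothetical stand-ins for the question's variables (not the repository's model).
weights = torch.ones(2, requires_grad=True)       # plays the role of model.weights
grads = torch.tensor([[3.0, 4.0], [6.0, 8.0]])    # pretend per-task gradients of the shared layer
constant_term = torch.tensor([6.0, 6.0])          # held constant, as in the question

# norms is computed from weights, so it carries a grad_fn.
norms = torch.stack([torch.norm(weights[i] * grads[i]) for i in range(2)])

# Case 1: keep the result of torch.sum in the graph. The gradient is defined.
loss_in_graph = torch.sum(torch.abs(norms - constant_term))
grad_ok = torch.autograd.grad(loss_in_graph, weights, retain_graph=True)[0]

# Case 2: copy the value into a new tensor. Same number, but no history,
# so autograd cannot reach weights from it.
loss_detached = loss_in_graph.detach().clone().requires_grad_(True)
grad_none = torch.autograd.grad(loss_detached, weights, allow_unused=True)[0]

print(loss_in_graph.grad_fn is not None)  # the in-graph loss has a grad_fn
print(loss_detached.grad_fn)              # the copy has none
print(grad_ok)
print(grad_none)
```

If this is indeed what happens in the training script, the likely fix is to drop the torch.tensor(...) and Variable(...) wrappers around torch.sum(torch.abs(norms - constant_term)) so the GradNorm loss stays connected to model.weights.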

4 answers

  • CSDN专家-孙老师 2021-04-04 01:55

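    This may be related to leaf nodes (the variables you define at the start). When gradients are computed, only leaf nodes retain their gradients; to save memory, the grad of every intermediate node is cleared once backward() has finished. I suggest checking whether the variables you originally defined have been changed.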

    Selected by the asker as the best answer.
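The accepted answer's point about leaf nodes can be illustrated with a short sketch. The tensors w, h, kept and copy are made up for illustration and do not come from the repository. It shows that a leaf tensor keeps its .grad after backward(), that an intermediate tensor does not unless retain_grad() is called, and that copying an intermediate result creates a new leaf with no connection to the original graph.

```python
import warnings

import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf: created directly by the user
h = w * 3.0                                       # intermediate (non-leaf) result
kept = w * 3.0
kept.retain_grad()                                # opt in to keeping this non-leaf's grad

(h.sum() + kept.sum()).backward()

# Reading .grad on a non-leaf tensor returns None and emits a warning,
# so the warning is silenced here to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    h_grad = h.grad

# A detached copy of an intermediate result is a brand-new leaf with no grad_fn.
copy = h.detach().clone()

print(w.is_leaf, h.is_leaf, copy.is_leaf)
print(w.grad)        # gradient accumulated on the leaf
print(h_grad)        # not retained for the intermediate tensor
print(kept.grad)     # retained because retain_grad() was called
print(copy.grad_fn)  # the copy has lost its history
```

Checking is_leaf and grad_fn on grad_norm_loss in the training loop is one way to follow the answer's advice: if it reports is_leaf as True with grad_fn None, it has been cut off from model.weights.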