Blucoris 2022-08-03 09:05 Acceptance rate: 75%
Views: 99
Closed

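PyTorch: trying to improve accuracy, but can't get backpropagation right

Problem description and background

While improving a ResNet-18 model, we hit an obstacle: accuracy fluctuates around 80%. To push it higher, we wanted to run the backward pass twice within one loop iteration, but that raised the error shown below. How can backpropagation be run twice in a single iteration?

Relevant code (please do not paste screenshots)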
# -*- coding: utf-8 -*-
"""
Created on Fri Jul 22 10:26:33 2022
@author: Blucoris Liang
"""

import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models

EPOCH=30
BATCH_SIZE=40
LR=0.0007

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                              std=[0.229, 0.224, 0.225])   

train_dataset = datasets.ImageFolder(
        'C:\\Users\\19544\\.spyder-py3\\leapGestRecog\\00',
        transforms.Compose([
                transforms.RandomResizedCrop(224),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                normalize,
                ]))

train_loader = Data.DataLoader(
        train_dataset,
        batch_size=BATCH_SIZE,
        shuffle=True)

test_loader = Data.DataLoader(
        datasets.ImageFolder(
                'C:\\Users\\19544\\.spyder-py3\\leapGestRecog\\03', 
                transforms.Compose([
                        transforms.Resize(256),
                        transforms.CenterCrop(224),
                        transforms.ToTensor(),
                        normalize,
                        ])),
        batch_size=BATCH_SIZE, shuffle=False,)

# Dataset size
train_data_size = len(train_dataset)
print('训练集的长度为:{}'.format(train_data_size))  # prints "Training set size: N"



model = models.resnet18(pretrained=True)

# Replace the classifier head with a 10-class layer before moving the model,
# so the new layer ends up on the same device as the rest of the network.
model.fc = torch.nn.Linear(in_features=512, out_features=10, bias=True)

################################
if torch.cuda.is_available():  #
    model = model.cuda()           #
################################

fc_params = list(map(id, model.fc.parameters())) # collect the ids of the fc layer's parameters into a list

base_params = filter(lambda p: id(p) not in fc_params, model.parameters()) # keep only the parameters whose id is not among the fc layer's

optimizer = torch.optim.SGD([ {'params': base_params}, {'params': model.fc.parameters(), 'lr': LR * 100}], lr=LR)

loss_func=nn.CrossEntropyLoss()

################################
if torch.cuda.is_available():  #
    loss_func = loss_func.cuda()   #
################################


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)
            
def accuracy(output, target, topk=(1,)):
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)  # reshape also handles non-contiguous slices when k > 1
            res.append(correct_k.mul_(100.0 / batch_size))
        return res           
            
train_losses = AverageMeter('TrainLoss', ':.4e')
train_top1 = AverageMeter('TrainAccuracy', ':6.2f')
test_losses = AverageMeter('TestLoss', ':.4e')
test_top1 = AverageMeter('TestAccuracy', ':6.2f')

for epoch in range(EPOCH):
    
    model.train()
    for i,(images,target) in enumerate(train_loader):
        ################################
        if torch.cuda.is_available():  #
            images = images.cuda()         #
            target = target.cuda()   #
        ################################
        output=model(images)
        ################################
        if torch.cuda.is_available():  #
            output = output.cuda()   #
        ################################
        loss= loss_func(output,target)
        
        acc1, = accuracy(output, target, topk=(1,))
        train_losses.update(loss.item(), images.size(0))
        train_top1.update(acc1[0], images.size(0))
        # Backward pass, first time
        optimizer.zero_grad()
        loss.backward(retain_graph = True)
        optimizer.step()
        torch.autograd.set_detect_anomaly(True)
        # Backward pass, second time
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        print('Epoch[{}/{}],TrainLoss:{}, TrainAccuracy:{}'.format(epoch,EPOCH,train_losses.val, train_top1.val))
           
    model.eval()
    with torch.no_grad():
        for i,(images,target) in enumerate(test_loader):
            ################################
            if torch.cuda.is_available():  #
                images = images.cuda()    #
                target = target.cuda()   #
            ################################
            output=model(images)
            loss= loss_func(output,target)
            
            acc1, = accuracy(output, target, topk=(1,))
            test_losses.update(loss.item(), images.size(0))
            test_top1.update(acc1[0], images.size(0))
            
    print('TestLoss:{}, TestAccuracy:{}'.format(test_losses.avg, test_top1.avg))




Output and error message
runfile('C:/Users/19544/.spyder-py3/成功对手势识别用resnet进行了第一次训练.py', wdir='C:/Users/19544/.spyder-py3')
训练集的长度为:2000
D:\ANACONDA\envs\MyEnv\lib\site-packages\torch\autograd\__init__.py:173: UserWarning: Error detected in AddmmBackward0. No forward pass information available. Enable detect anomaly during forward pass for more information. (Triggered internally at  C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\autograd\python_anomaly_mode.cpp:85.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):

  File "C:\Users\19544\.spyder-py3\成功对手势识别用resnet进行了第一次训练.py", line 146, in <module>
    loss.backward()

  File "D:\ANACONDA\envs\MyEnv\lib\site-packages\torch\_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

  File "D:\ANACONDA\envs\MyEnv\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 10]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
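
A minimal sketch of what triggers this error, assuming a tiny two-layer stand-in model on the CPU instead of the ResNet-18 above (the model, data, and variable names here are illustrative and not part of the original script). `optimizer.step()` updates the weights in place, so the tensors that the retained graph saved for the backward pass are no longer at the version autograd expects, and the second `backward()` on the same `loss` is rejected:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network: two linear layers, so that the
# second layer's weight is needed to backpropagate into the first layer.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(8, 4)           # fake batch of 8 samples
target = torch.randint(0, 3, (8,))   # fake labels for 3 classes

loss = loss_func(model(images), target)

optimizer.zero_grad()
loss.backward(retain_graph=True)     # first backward: fine
optimizer.step()                     # modifies the weights in place

optimizer.zero_grad()
try:
    loss.backward()                  # second backward on the stale graph
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

`retain_graph=True` only keeps the graph's buffers alive; it does not protect them from the in-place update performed by the optimizer, which is why adding it alone does not help.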

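My approach and what I have tried

I tried adding retain_graph = True earlier, but that does not seem to work either.

Desired result

Successfully run backpropagation twice within one loop iteration, or learn about other good ways to improve accuracy.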

3 answers

  • herosunly (CSDN quality creator, Python) 2022-08-03 10:37

    Change the backpropagation part of the code as follows:

    loss.backward(retain_graph=True)
    optimizer.step()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
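
A note on the snippet above: it still calls `backward()` twice on the same `loss` with an `optimizer.step()` in between, which in recent PyTorch versions generally raises the same in-place modification error. A commonly used alternative, sketched here under the assumption that the goal is simply two parameter updates per batch, is to rerun the forward pass before the second `backward()`. The tiny linear model and random batch are stand-ins for illustration, not the ResNet-18 from the question:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and batch; in the question these would be
# the ResNet-18 and one (images, target) pair from train_loader.
model = nn.Linear(4, 3)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(8, 4)
target = torch.randint(0, 3, (8,))

losses = []
for _ in range(2):                       # two updates on the same batch
    optimizer.zero_grad()
    output = model(images)               # fresh forward pass builds a fresh graph
    loss = loss_func(output, target)
    loss.backward()                      # no retain_graph needed
    optimizer.step()                     # safe: the graph is not reused afterwards
    losses.append(loss.item())

print(losses)
```

Because each update uses its own forward pass, the second gradient is computed with the already updated weights. This costs one extra forward pass per batch and is roughly comparable to training on each batch twice, so it is not guaranteed to raise test accuracy.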
    
    Accepted by the asker as the best answer.

Question timeline

  • Closed by the system on August 11
  • Answer accepted on August 3
  • Question created on August 3
