qq_16200583 2022-04-01 23:38

Why do MATLAB and PyTorch give different results for the same deep-learning model?

My data is four-dimensional, 8000 samples in total, and the task is deep-learning classification. I trained and tested with PyTorch first.

The training and test sets are split 4:1.

The model is two blocks of convolution, BN, ReLU, and max-pooling, followed by a softmax classifier.

Because of the BN layers, once model.eval() is added the validation accuracy fluctuates wildly, sometimes falling from 70% straight to 20%. I checked the normalization, the batch size, the learning rate, and so on, and found nothing wrong. I then used the same split, the same parameters, and the same model in MATLAB, where the validation results were stable: the model converged with little fluctuation.

As far as I can tell the PyTorch code is fine, and the good MATLAB results suggest the data is fine as well.
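When eval-mode accuracy collapses while train-mode accuracy stays high, the BatchNorm running statistics are the usual suspect. A minimal diagnostic sketch (the helper name and the momentum value are illustrative choices, not code from the script below):

import torch.nn as nn

def probe_batchnorm(model):
    """Two common BatchNorm checks for unstable eval-mode accuracy."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # Check 1: slow down the running-stat EMA (PyTorch's default
            # momentum is 0.1); smoother estimates often tame eval swings.
            m.momentum = 0.01
            # Check 2 (diagnostic only): uncomment to make eval() use batch
            # statistics, just like train(). If accuracy recovers, the
            # running statistics were the problem.
            # m.track_running_stats = False
            # m.running_mean = None
            # m.running_var = None

The full training script: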

import h5py
import torch
import torch.nn as nn
from torch.utils import data
import torch.nn.functional as F
from sklearn.model_selection import train_test_split
import numpy as np
from torch import optim
import matplotlib.pyplot as plt
from tqdm import tqdm

# The .mat file is in MATLAB v7.3 (HDF5) format, so h5py can read it.
xy = h5py.File('autodl-nas/data60-80.mat','r')
x  = torch.tensor(np.array(xy['dataX_new']))    # (8000, 94, 80, 60), float64
# MATLAB labels are 1-based; shift them to 0-based for CrossEntropyLoss.
y  = torch.squeeze(torch.tensor(np.array(xy['dataY_new']) - np.ones((8000,1)),
                                dtype=torch.long))    # (8000,)

def train_test_dataset(x,y):
    # Stratified 80/20 split (the 4:1 division described above).
    xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size=0.2,shuffle=True,stratify=y)
    trainset = data.TensorDataset(xtrain,ytrain)
    testset  = data.TensorDataset(xtest,ytest)
    return trainset,testset

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(94,32,3,padding=1)
        self.conv2 = nn.Conv2d(32,64,3,padding=1)
        self.bn1   = nn.BatchNorm2d(32)
        self.bn2   = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(5,2,padding=2)   # halves each spatial dim
        self.fc1   = nn.Linear(64*15*20,128)
        self.fc2   = nn.Linear(128,32)
        self.fc3   = nn.Linear(32,8)

    def forward(self,x):                   # x: (N, 94, 80, 60)
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)                # (N, 32, 40, 30)
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.maxpool(x)                # (N, 64, 20, 15)
        x = x.view(-1,64*15*20)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                    # logits; CrossEntropyLoss applies softmax
        return x

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
lr = 8e-4

criterion = nn.CrossEntropyLoss()
criterion.to(device)

def fit(epoch, model, optimizer, trainloader, testloader):
    correct = 0
    total   = 0
    running_loss = 0.0
    model.train()
    for x,y in tqdm(trainloader):
        x,y = x.to(device),y.to(device)
        y_pred = model(x)
        loss   = criterion(y_pred,y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            y_pred = torch.argmax(y_pred,dim=1)
            correct += (y_pred == y).sum().item()
            total += y.size(0)
            running_loss += loss.item()
    # loss.item() is already the batch-mean loss, so average over batches,
    # not over the number of samples.
    epoch_loss = running_loss / len(trainloader)
    epoch_acc  = correct/total

    test_correct = 0
    test_total   = 0
    test_running_loss = 0.0
    model.eval()
    with torch.no_grad():
        for x,y in testloader:
            x,y = x.to(device),y.to(device)
            y_pred = model(x)
            loss = criterion(y_pred, y)
            y_pred = torch.argmax(y_pred,dim=1)
            test_correct += (y_pred == y).sum().item()
            test_total += y.size(0)
            test_running_loss += loss.item()

    epoch_test_loss = test_running_loss/len(testloader)
    epoch_test_acc  = test_correct/test_total

    print("epoch:{} train_loss:{} train_accuracy:{} test_loss:{} test_accuracy:{}".format(
        epoch,round(epoch_loss,3),round(epoch_acc,3),round(epoch_test_loss,3),round(epoch_test_acc,3)))

    return epoch_loss, epoch_acc, epoch_test_loss, epoch_test_acc

if __name__ =='__main__':
    epochs = 80
    batchsize = 128
    train_loss = []
    train_acc  = []
    test_loss  = []
    test_acc   = []
    epoch_list = []
    # Random stratified split
    trainset, testset = train_test_dataset(x,y)
    model = Model()
    model.double()    # the h5py arrays are float64; cast the model to match
                      # (casting the data to float32 instead would be faster on GPU)
    model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    trainloader = data.DataLoader(trainset,batch_size=batchsize,shuffle=True)
    testloader  = data.DataLoader(testset,batch_size=batchsize,shuffle=False)
    for epoch in range(epochs):
        epoch_loss, epoch_acc, epoch_test_loss, epoch_test_acc = fit(epoch, model, optimizer, trainloader, testloader)
        train_loss.append(epoch_loss)
        train_acc.append(epoch_acc)
        test_loss.append(epoch_test_loss)
        test_acc.append(epoch_test_acc)
        epoch_list.append(epoch)
    # Plot the loss and accuracy curves
    plt.figure(figsize=(10,10))
    plt.subplot(2,1,1)
    plt.plot(epoch_list,train_loss)
    plt.plot(epoch_list,test_loss)
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['train','test'])
    plt.grid(True)
    plt.subplot(2,1,2)
    plt.plot(epoch_list,train_acc)
    plt.plot(epoch_list,test_acc)
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(['train','test'])
    plt.grid(True)
    plt.show()


2 answers

  • 爱晚乏客游 2022-04-02 09:46

    Even if the two frameworks didn't agree exactly, the gap shouldn't be anywhere near as large as you describe; otherwise the deep-learning world would have adopted MATLAB instead of Python long ago. Without the code posted, nobody can tell whether "the code is fine" is actually true.
    I ran your code on MNIST, so the code itself isn't really the problem.
    The key question is: where is the data normalization you mentioned? Also increase the epoch count a bit and check whether the model simply hasn't converged.
    The two figures below show the loss and accuracy curves on MNIST with normalization (top, img/255.0) and without it (bottom); you can see that the curves without normalization fluctuate much more.

    [figure: MNIST loss/accuracy curves with normalization (img/255.0)]

    [figure: MNIST loss/accuracy curves without normalization]
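    For the data in the question, per-channel standardization is one way to add that missing normalization. A minimal sketch (the train-only caveat and the choice of statistics are suggestions, not code from either post):

    import numpy as np

    x = np.array(xy['dataX_new'], dtype=np.float32)      # (8000, 94, 80, 60)
    # In practice compute the statistics on the training split only, so no
    # test-set information leaks into the normalization. Note: with float32
    # data, keep the model in float32 as well (skip the .double() cast).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)         # per-channel mean
    std  = x.std(axis=(0, 2, 3), keepdims=True) + 1e-8   # avoid division by zero
    x = (x - mean) / std                                 # zero mean, unit variance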

