Mnist两层神经网络梯度一直为零

最近开始学习机器学习，在编写简单的二层神经网络的过程中，发现损失函数一直居高不下，然后看了一下损失函数的梯度，发现梯度一直都是零。
我在途中试着改变损失函数，将cross entropy error函数换成mean_squared_error函数；又试着改变激活函数:sigmoid函数，Relu函数，softmax函数都试着用过了；然后又试着改变训练次数，mini_batch的大小，学习率，可惜都无济于事，我希望能够找到问题所在。以下是我的完整代码。希望有大佬能指点一下迷津。

from keras.datasets import mnist
import keras
import numpy as np
from PIL import Image
import matplotlib.pylab as plt


# 显示图形
def img_show(img):
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()


# 2层神经网络的类
class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        self.params = {}
        self.params['w1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['w2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def predict(self, x):  # x=输入
        w1, w2 = self.params['w1'], self.params['w2']#权重
        b1, b2 = self.params['b1'], self.params['b2']#偏移
        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)#第一层输出
        a2 = np.dot(z1, w2) + b2
        z2 = softmax(a2)#第二层输出
        return z2

    def loss(self, x, t):  # x=输入，t=监督数据
        y = self.predict(x)
        return cross_entropy_error(y, t)

    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    def numerical_gradient(self, x, t):
        loss_w = lambda w: self.loss(x, t)#损失函数
        grads = {
            'w1': numerical_gradient(loss_w, self.params['w1']),
            'b1': numerical_gradient(loss_w, self.params['b1']),
            'w2': numerical_gradient(loss_w, self.params['w2']),
            'b2': numerical_gradient(loss_w, self.params['b2'])
        }
        return grads


# 梯度函数
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.shape[0]):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)#求梯度
        x[idx] = tmp_val#还原x
    return grad


# 误差函数cross entropy error
def cross_entropy_error(y, t):
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))


# softmax函数
def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # 防止溢出
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y


# sigmoid函数
def sigmoid(a):
    out = a.copy()
    sel = ((a > 100) & (a < -100))
    out = 1 / (1 + np.exp(-a))#sigmoid计算
    out[sel] = 1 / (1 + np.exp(-100))#防止指数爆炸
    return out


# 验证集转one-hot
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)

# 数据改为60000*784浮点格式
x_train = x_train.reshape(x_train.shape[0], 784).astype('float')

train_loss_list = []

# 参数初始化
iters_num = 100  # 循环次数
train_size = x_train.shape[0]  # 总数据量
batch_size = 32  # 每次取出的数据量
learning_rate = 0.1  # 学习率
network = TwoLayerNet(input_size=784, hidden_size=100, output_size=10)  # 创建对象

for i in range(iters_num):
    # 获取mini_batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    y_batch = y_train[batch_mask]

    # 计算梯度
    grad = network.numerical_gradient(x_batch, y_batch)
    # 更新参数
    for key in ('w1', 'b1', 'w2', 'b2'):
        network.params[key] = network.params[key] - learning_rate * grad[key]
    # 损失量
    loss = network.loss(x_batch, y_batch)
    train_loss_list.append(loss)

# 损失量图像
x = np.arange(0, iters_num / 10, 0.1)
y = np.array(train_loss_list)
plt.plot(x, y)
plt.show()

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除
收藏举报

2条回答默认最新

herosunly Python领域优质创作者 2022-08-04 13:45

关注

导包的代码我省略了，里面主要修改的是数据需要归一化、权重系统weight_init_std=0.01修改为了1、学习率降低(修改为0.05)。如果想做的更好一些，可以增加dropout

# 显示图形
def img_show(img):
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()
 
 
# 2层神经网络的类
class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=1):
        self.params = {}
        self.params['w1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['w2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)
 
    def predict(self, x):  # x=输入
        w1, w2 = self.params['w1'], self.params['w2']#权重
        b1, b2 = self.params['b1'], self.params['b2']#偏移
        a1 = np.dot(x, w1) + b1
        z1 = sigmoid(a1)#第一层输出
        a2 = np.dot(z1, w2) + b2
        z2 = softmax(a2)#第二层输出
        return z2
 
    def loss(self, x, t):  # x=输入，t=监督数据
        y = self.predict(x)
        return cross_entropy_error(y, t)
 
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
 
    def numerical_gradient(self, x, t):
        loss_w = lambda w: self.loss(x, t)#损失函数
        grads = {
            'w1': numerical_gradient(loss_w, self.params['w1']),
            'b1': numerical_gradient(loss_w, self.params['b1']),
            'w2': numerical_gradient(loss_w, self.params['w2']),
            'b2': numerical_gradient(loss_w, self.params['b2'])
        }
        return grads
 
 
# 梯度函数
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.shape[0]):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)#求梯度
        x[idx] = tmp_val#还原x
    return grad
 
 
# 误差函数cross entropy error
def cross_entropy_error(y, t):
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))
 
 
# softmax函数
def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # 防止溢出
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
 
 
# sigmoid函数
def sigmoid(a):
    out = a.copy()
    sel = ((a > 100) & (a < -100))
    out = 1 / (1 + np.exp(-a))#sigmoid计算
    out[sel] = 1 / (1 + np.exp(-100))#防止指数爆炸
    return out
 
 
# 验证集转one-hot
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)
 
# 数据改为60000*784浮点格式
x_train = x_train.reshape(x_train.shape[0], 784).astype('float')
x_train = x_train / 255.0
 
train_loss_list = []
 
# 参数初始化
iters_num = 100  # 循环次数
train_size = x_train.shape[0]  # 总数据量
batch_size = 32  # 每次取出的数据量
learning_rate = 0.05  # 学习率
network = TwoLayerNet(input_size=784, hidden_size=100, output_size=10)  # 创建对象
 
for i in range(iters_num):
    # 获取mini_batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    y_batch = y_train[batch_mask]
 
    # 计算梯度
    grad = network.numerical_gradient(x_batch, y_batch)
    # 更新参数
    for key in ('w1', 'b1', 'w2', 'b2'):
        network.params[key] = network.params[key] - learning_rate * grad[key]
    # 损失量
    loss = network.loss(x_batch, y_batch)
    train_loss_list.append(loss)
 
# 损失量图像
x = np.arange(0, iters_num / 10, 0.1)
y = np.array(train_loss_list)
plt.plot(x, y)
plt.show()

本回答被题主选为最佳回答 , 对您是否有帮助呢?

查看更多回答(1条)

报告相同问题？

关注问题

Mnist两层神经网络梯度一直为零 python 神经网络
2022-08-03 19:10

回答 2 已采纳导包的代码我省略了，里面主要修改的是数据需要归一化、权重系统weight_init_std=0.01修改为了1、学习率降低(修改为0.05)。如果想做的更好一些，可以增加dropout # 显示图形
使用mnist数据搭建的神经网络训练时GPU占用率剧烈波动人工智能机器学习神经网络
2023-04-08 12:36

回答 1 已采纳 mnist数据集太小了，所以gpu用不满，这是很正常的
用tensorflow写一个简单的神经网络识别mnist出现问题（python） python tensorflow 神经网络
2017-08-25 02:45

回答 1 已采纳 b1 = tf.Variable(tf.zeros([784,100]))改成b1 = tf.Variable(tf.zeros([100,])) 应该可以解决这个报错
MNIST前馈神经网络实现1
2022-08-04 11:49

神经网络在计算机科学、人工智能和机器学习领域扮演着至关重要的角色。 **神经网络的结构** 神经网络主要由三个部分组成：输入层、隐藏层和输出层。输入层接收原始数据，隐藏层进行复杂的计算，而输出层则产生最终...
在神经网络搭建里后缀一个(x)是什么 keras python 神经网络
2022-09-05 14:20

回答 1 已采纳 a=b()(x)这看起来很怪吗如果你知道函数b的返回值是一个函数，像这样def b(): def c(): ... return c还怪吗b()，其实就是cb()(x)其实就是c(x
pytorch的MNIST代码中loss输出的疑问人工智能机器学习深度学习神经网络
2020-06-29 00:25

回答 1 已采纳 https://www.jianshu.com/p/3db91d6bdf83
用Python实现MNIST字体识别语法错误 python tensorflow 人工智能深度学习神经网络
2020-05-06 13:20

回答 2 已采纳 print (sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})) 加上括号看看
Mnist分别用两层神经网络和三层神经网络实现(二)
2022-11-25 17:20

我就是人工智能的博客 Mnist分别用两层神经网络和三层神经网络实现(二)
tensorflow CNN训练mnist数据集后识别自己写的数字效果不好 cnn tensorflow 神经网络
2018-04-15 16:32

回答 5 已采纳 MNIST数据集与你自己采集的图像，实际上是两个不同的数据集，你在MNIST上训练，然后在你的数据集上测试，测试性能不好是十分正常的。这实际上涉及在两个相似但是不同的域之间的迁移学习的问题。有三个办法
jupyter 中tf 搭建神经网络 如何确定自己的GPU参与了计算？ python 分类神经网络
2022-03-02 23:40

回答 1 已采纳有个简单的办法，提前打开任务管理器翻到GPU那一栏，在训练或者预测数据的时候观察各项指标特别是复制有没有起伏
如何自己做一个类似Fashion-MNIST的数据集 python 深度学习神经网络
2019-09-03 16:43

回答 1 已采纳 https://blog.csdn.net/sdoddyjm68/article/details/78430209
深度学习之使用BP神经网络识别MNIST数据集
2024-04-08 02:33

老肝犯的博客搭建这次的BP神经网络我的隐藏层有三层，分别是128，64，32个神经元，因为我们的图片是28*28=784得，我们需要把其展开成一维，所以第一层网络是784*128得，这样输入层中每一行代表一个样本（或者说一张图片得所有...
手写数字识别，神经网络交叉商结果正确，正确率总是不变深度学习神经网络
2018-09-18 12:53

回答 3 已采纳正确率是多少？如果保持在0.1左右，说明没有学习到（因为有10个分类，随机的权重识别出来正确概率就是0.1） Adam换成SGD，学习率设置小一点看看。调试下，输出下损失函数的损失率。
基于BP神经网络对MNIST数据集检测识别（Pytorch，Tensorflow版本）
2023-06-07 14:29

ZHW_AI课题组的博客 1756769702@qq.com张思怡，女，西安工程大学电子信息学院，2022级研究生，张宏伟人工智能课题组研究方向：机器视觉与人工智能电子邮件：981664791@qq.com首先从名称中可以看出，Bp神经网络可以分为两个部分，bp和...
基于BP神经网络对MNIST数据集检测识别（numpy版本）
2023-06-07 15:18

ZHW_AI课题组的博客 981664791@qq.com搭建一个两层（两个权重矩阵，一个隐藏层）的神经网络，其中输入节点和输出节点的个数是确定的，分别为 784 和 10。而隐藏层节点的个数还未确定，并没有明确要求隐藏层的节点个数，所以在这里取50个...
没有解决我的问题, 去提问

问题事件

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
系统已结题 8月12日
关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
已采纳回答 8月4日
关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
赞助了问题酬金6元 8月3日
关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
创建了问题 8月3日

悬赏问题

¥15 metadata提取的PDF元数据，如何转换为一个Excel
¥15 关于arduino编程toCharArray()函数的使用
¥100 vc++混合CEF采用CLR方式编译报错
¥15 coze 的插件输入飞书多维表格 app_token 后一直显示错误，如何解决？
¥15 vite+vue3+plyr播放本地public文件夹下视频无法加载
¥15 c#逐行读取txt文本，但是每一行里面数据之间空格数量不同
¥50 如何openEuler 22.03上安装配置drbd
¥20 ING91680C BLE5.3 芯片怎么实现串口收发数据
¥15 无线连接树莓派，无法执行update，如何解决？（相关搜索：软件下载）
¥15 Windows11, backspace, enter, space键失灵

Mnist两层神经网络梯度一直为零

2条回答 默认 最新

问题事件

悬赏问题

2条回答默认最新