2021-07-27 15:22

For a neural network model, the second-order partial derivatives are always 0

f(x,y)=x²+y
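For reference, the analytic derivatives the tapes should recover from this target are ∂f/∂x = 2x and ∂f/∂y = 1, so ∂²f/∂x² = 2 everywhere and ∂²f/∂y² = 0.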

```python
model = keras.Sequential([
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(1),
])
```

```python
# Nested tapes: tape3 gives first derivatives, tape4 differentiates them again
with tf.GradientTape(persistent=True) as tape4:
    tape4.watch(X)
    tape4.watch(Y)
    with tf.GradientTape(persistent=True) as tape3:
        tape3.watch(X)
        tape3.watch(Y)
        Z = tf.concat([X, Y], 1)
        ff = model(Z)
    dx = tape3.gradient(ff, X)
    dy = tape3.gradient(ff, Y)
dxdx = tape4.gradient(dx, X)
print(tf.concat([dx, dy], 1))
print(dxdx)
```

dx and dy both come out basically right, but dxdx is all zeros, when it should be roughly 2. I can't find the cause. Did I write something wrong somewhere? Or is it that a neural network only fits x and y linearly, so all the second derivatives are 0... surely that can't be it...
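One way to sanity-check the "only fits linearly" suspicion without TensorFlow: a stack of Dense+ReLU layers is piecewise linear in its input, so even for a tiny hand-built ReLU network a finite-difference second derivative comes out zero away from the kinks, while the first derivative is a perfectly ordinary slope. A minimal numpy sketch (the weights below are made up for illustration, not taken from the model above):

```python
import numpy as np

# A tiny 1-input, 1-output ReLU network with fixed, made-up weights.
# Any such network is piecewise linear in its input x.
W1 = np.array([1.0, -2.0, 0.5, 3.0])
b1 = np.array([0.3, 1.0, -0.2, -0.5])
W2 = np.array([0.7, -1.1, 2.0, 0.4])
b2 = 0.1

def f(x):
    h = np.maximum(W1 * x + b1, 0.0)  # hidden ReLU layer
    return float(W2 @ h + b2)          # linear output layer

# Central finite differences at a point away from any ReLU kink
# (the kinks sit at x = -b1/W1, i.e. -0.3, 0.5, 0.4, 0.1667)
x0, h = 0.37, 1e-3
first = (f(x0 + h) - f(x0 - h)) / (2 * h)            # slope of the local linear piece
second = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2  # 0 on a linear piece

print(first, second)  # ≈ 4.1 and ≈ 0.0
```

So zero second derivatives are exactly what a ReLU network produces pointwise, regardless of how well the loss has converged; this is a property of the activation, not necessarily a bug in the tape code.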

```python
import tensorflow as tf
import numpy as np
from tensorflow import keras
import random

np.set_printoptions(threshold=np.inf)
np.set_printoptions(suppress=True)

# 101 random (x, y) samples in [0, 10)
x = np.arange(0, 101, dtype=float)
y = np.arange(0, 101, dtype=float)
for i in range(101):
    x[i] = np.round(random.random() * 10, 2)
    y[i] = np.round(random.random() * 10, 2)
x = x.reshape(101, 1)
y = y.reshape(101, 1)

# Targets: f(x, y) = x^2 + y
ans = (x * x + y).reshape(101, 1)

X = tf.convert_to_tensor(x, dtype=float)
Y = tf.convert_to_tensor(y, dtype=float)

model = keras.Sequential([
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(1),
])

optimizer = keras.optimizers.Adam()

for i in range(2000):
    with tf.GradientTape(persistent=True) as tape2:
        tape2.watch(X)
        tape2.watch(Y)
        with tf.GradientTape(persistent=True) as tape:
            tape.watch(X)
            tape.watch(Y)
            Z = tf.concat([X, Y], 1)
            f = model(Z)
            loss = tf.reduce_mean(tf.square(ans - f))
        dx = tape.gradient(f, X)
    dxdx = tape2.gradient(dx, X)
    # print(dxdx)

    # Train on the MSE against the analytic targets
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    if i % 10 == 0:
        print(i, loss)

# After training: first and second derivatives at the sample points
with tf.GradientTape(persistent=True) as tape4:
    tape4.watch(X)
    tape4.watch(Y)
    with tf.GradientTape(persistent=True) as tape3:
        tape3.watch(X)
        tape3.watch(Y)
        Z = tf.concat([X, Y], 1)
        ff = model(Z)
    dx = tape3.gradient(ff, X)
    dy = tape3.gradient(ff, Y)
dxdx = tape4.gradient(dx, X)
print(tf.concat([dx, dy], 1))
print(dxdx)
```

1 answer

• 爱晚乏客游 2021-07-28 09:24

What about your model's inputs and outputs? What is the input, and what is the output?
