Hey, I found something: the CRF layer's `learn_mode` defaults to `'join'`. If you change it to `'marginal'`, the loss becomes sparse categorical cross-entropy when `sparse_target=True`, and plain categorical cross-entropy otherwise.
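For reference, a minimal sketch of that configuration (assuming the `keras_contrib` CRF; the unit count and `sparse_target` value here are just placeholders):

```python
from keras_contrib.layers import CRF

# 'join' (the default) trains the CRF on the joint sequence likelihood;
# 'marginal' trains on per-timestep marginals, so the loss reduces to
# (sparse) categorical cross-entropy depending on sparse_target.
crf = CRF(units=10, learn_mode='marginal', sparse_target=True)
```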
Negative CRF loss if `mask_zero=False`
When training a simple model with a CRF, the loss becomes negative after some time if `mask_zero=False`. I noticed this while working on a larger BiLSTM+CRF for NER: the bigger model converges to ~90% accuracy without the CRF layer and to ~95% with it, but its loss starts slightly positive and keeps decreasing until it becomes negative. This behaviour is quite unexpected since, as far as I can tell, the model is optimised by minimising the predictions' negative log-likelihood.
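For context, here is why a negative value is suspicious: the negative log-likelihood of a correctly normalised CRF cannot drop below zero, because the partition function sums over all label paths, including the target one. Writing s(y, x) for the score of a label path (with the energy convention used later in this thread, s = -E) and Z(x) for the partition function:

$$
\mathrm{NLL}(y \mid x) \;=\; -\log\frac{e^{s(y,x)}}{Z(x)} \;=\; \log Z(x) - s(y,x) \;\ge\; 0,
\qquad Z(x) \;=\; \sum_{y'} e^{s(y',x)} \;\ge\; e^{s(y,x)}.
$$

So a loss that goes negative means the normalisation term is being computed incorrectly.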
To study this further, I've put together the following toy model, which yields a training loss of -0.0777 if `mask_zero=False` and 0.0217 otherwise. The reader might argue that the number of epochs is exaggerated, and indeed it is; however, the purpose of this code is simply to reproduce an issue that was observed on a model trained with far more data in only 5 epochs. Furthermore, the loss stays positive if `mask_zero=True` even for larger values of `EPOCHS`. I've tried investigating this myself by reading the code, without much success... maybe the layer's main author could point me in the right direction.
PS: Please note that my word indices in the embedding layer start at 1, hence `mask_zero` should not change anything...
```python
import numpy
from keras.models import Sequential
from keras.layers import Embedding
from keras_contrib.layers import CRF
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(1)


def build_dict(items):
    # Map each distinct item to an integer index starting at 1
    # (0 is reserved for padding).
    table = dict()
    for item in items:
        if item not in table:
            table[item] = len(table) + 1
    return table


def prepare_sequence(sequences, table):
    prepared = list()
    for seq in sequences:
        prep_seq = list()
        for item in seq:
            prep_seq.append(table.get(item, -1))
        prepared.append(prep_seq)
    return numpy.asarray(prepared)


data = [
    ('I went to Chicago from New York yesterday'.split(),
     'O O O B_LOC O B_LOC I_LOC O'.split())
]

words, labels = data[0]
words_table = build_dict(words)
labels_table = build_dict(labels)

train_x = prepare_sequence([words], words_table)
train_y = prepare_sequence([labels], labels_table)
train_y = numpy.expand_dims(train_y, -1)

EPOCHS = 700
EMBED_DIM = 10

print(train_x.shape)

model = Sequential()
model.add(Embedding(len(words_table) + 1, EMBED_DIM, mask_zero=False))  # Random embedding
crf = CRF(len(labels_table) + 1, sparse_target=True)
model.add(crf)
model.summary()
model.compile('adam', loss=crf.loss_function, metrics=[crf.accuracy])

history = model.fit(train_x, train_y, epochs=EPOCHS,
                    validation_data=(train_x, train_y), verbose=0)

# Outputs -0.0777437686920166 if mask_zero=False and 0.02171158790588379 otherwise.
print(history.history['loss'][-1])
```
I just found a bug in the CRF code, related to the loss computation, which causes a negative loss when `mask_zero=False` and a loss that stays at a fairly large positive value when `mask_zero=True`. The CRF loss (negative log-likelihood, NLL) is composed of two parts: one is logZ and the other is the energy E (input/emission energy plus chain/transition energy). Of these two, the code computing logZ had a bug related to padding/masking. logZ is computed by recursion, where each step computes an intermediate term `logS_k = logsumexp(logS_{k-1} - E_k)`; the final `logS_L` becomes logZ (L is the sequence length). The code applies the mask when computing E_k, but still updates `logS_k` even for padded inputs, and this is what causes the negative loss or the large positive loss. The code computing `logS_k` is in the `step()` method in crf.py. In this method, the `if return_logZ` clause needs to be modified as follows:
```python
if return_logZ:
    energy = chain_energy + K.expand_dims(input_energy_t - prev_target_val, 2)
    new_target_val = K.logsumexp(-energy, 1)
    # added from here: freeze logS at padded timesteps
    if len(states) > 3:
        if K.backend() == 'theano':
            m = states[3][:, t:(t + 2)]
        else:
            m = K.slice(states[3], [0, t], [-1, 2])
        is_valid = K.expand_dims(m[:, 0])
        new_target_val = is_valid * new_target_val + (1 - is_valid) * prev_target_val
    # added until here
    return new_target_val, [new_target_val, i + 1]
```
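To see why freezing the accumulator is the right behaviour, here is a standalone NumPy sketch of the forward (logZ) recursion; the function and variable names are made up for illustration, and it uses score notation (higher is better) rather than the energy notation above:

```python
import numpy as np

def log_partition(emissions, transitions, mask):
    """Forward algorithm over one sequence.

    emissions: (T, K) per-step tag scores, transitions: (K, K) scores
    from previous tag (rows) to current tag (columns), mask: (T,) of 0/1.
    """
    logS = emissions[0]  # logS_1, shape (K,)
    for t in range(1, len(emissions)):
        # logsumexp over the previous tag, for every current tag
        scores = logS[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)
        new_logS = np.log(np.exp(scores - m).sum(axis=0)) + m
        # the fix: carry logS over unchanged at padded timesteps
        logS = mask[t] * new_logS + (1 - mask[t]) * logS
    return np.log(np.exp(logS - logS.max()).sum()) + logS.max()

# Padding the sequence should leave logZ untouched:
rng = np.random.default_rng(1)
em, tr = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
full = log_partition(em[:3], tr, np.ones(3))
padded = log_partition(em, tr, np.array([1., 1., 1., 0.]))
assert np.isclose(full, padded)
```

Without the mask blend on the last line of the loop, the padded steps keep folding extra terms into logZ, which skews the loss exactly as described above.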
I've checked that this solves both the negative loss when `mask_zero=False` and the large positive loss when `mask_zero=True` on the Embedding layer. However, it seems to have no effect on how well the model learns.
🎉 Congratulations on solving this issue that has been around for so long! Unfortunately, keras-contrib has been discontinued, but you might try opening a new PR with your fix.
Found this too.
I also encountered this situation, but I am using a CNN to do sequence labeling, so I cannot set mask_zero=True in Keras (Conv layers don't support masking). With an RNN the CRF loss is about 5-6, but with a CNN the loss becomes quite small and then turns negative.
I am facing the exact same issue. Any updates?
What if you add a masking layer?
If you add a masking layer, the problem disappears, if I remember correctly.
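For anyone trying this, here is a minimal sketch of what "add a masking layer" could look like when padded feature sequences are fed to the network directly (the shapes and layer sizes are placeholders, not taken from any model in this thread):

```python
from keras.models import Sequential
from keras.layers import Masking, Dense, TimeDistributed
from keras_contrib.layers import CRF

MAXLEN, FEAT_DIM, N_TAGS = 50, 16, 5  # placeholder dimensions

model = Sequential()
# Masking flags timesteps whose features all equal mask_value, so the
# padded steps are excluded from the CRF loss downstream.
model.add(Masking(mask_value=0., input_shape=(MAXLEN, FEAT_DIM)))
model.add(TimeDistributed(Dense(N_TAGS)))
crf = CRF(N_TAGS, sparse_target=True)
model.add(crf)
model.compile('adam', loss=crf.loss_function, metrics=[crf.accuracy])
```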
I think there is a log() in the CRF's loss_function, and log(X) is negative when 0 < X < 1, which would make the loss value negative.