weixin_42050507 2022-06-18 10:10

Attention visualization

When visualizing a model's attention as a heatmap, like the ones shown in papers, how do I map the attention weights to color intensity?
I would appreciate it if someone could share code.


1 answer

  • shiter (quality creator in the AI field) 2022-06-18 23:19

    Paper: DODRIO: Exploring Transformer Models with Interactive Visualization

    Paper link: http://arxiv-download.xixiaoyao.cn/pdf/2103.14625.pdf

    Project page / GitHub: https://poloclub.github.io/dodrio/
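
    Besides the interactive DODRIO tool above, the script below is a self-contained matplotlib example: alongside BLEU/ROUGE evaluation utilities, its plot_attention() function draws the attention matrix with ax.pcolor and the Blues colormap, so larger weights appear as darker cells, which is exactly the color-intensity effect you are asking about.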


    #!/usr/bin/python
    # -*- coding:utf-8 -*-
    
    """
    Evaluation methods for summarization tasks, including BLEU and ROUGE scores.
    Visualization of the attention weight matrix: plot_attention() method.
    """
    
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    
    import os
    import sys
    
    import matplotlib
    matplotlib.use('Agg') # Must be before importing matplotlib.pyplot or pylab!
    import matplotlib.pyplot as plt # drawing heat map of attention weights
    plt.rcParams['font.sans-serif'] = ['SimSun']  # use a CJK-capable font so Chinese tick labels render correctly
    
    import time
    
    def evaluate(X, Y, method="rouge_n", n=2):
      score = 0.0
      if method == "rouge_n":
        score = eval_rouge_n(X, Y, n)
      elif method == "rouge_l":
        score = eval_rouge_l(X, Y)
      elif method == "bleu":
        score = eval_bleu(X, Y, n)
      else:
        print("method not found")
        score = 0.0
      return score
    
    def eval_bleu(y_candidate, y_reference, n=2):
      '''
      Args:
        y_candidate: list of words, machine generated prediction
        y_reference: list of lists, [[], [], ...], human generated reference sentences
      Return:
        bleu score: double, clipped (modified) n-gram precision over all references
      '''
      if type(y_reference[0]) != list:
        print('y_reference should be a list of lists')
        return
      m = len(y_reference)
      ngram_cand = generate_ngrams(y_candidate, n)
      total_cand_count = len(ngram_cand)
      ngram_ref_list = []  # list of n-grams for each reference sentence
      for i in range(m):
        ngram_ref_list.append(generate_ngrams(y_reference[i], n))
      
      total_clip_count = 0
      for ngram in set(ngram_cand):
        # for each unique n-gram in the candidate, calculate the clipped count
        cand_count = count_element(ngram_cand, ngram)
        max_ref_count = 0  # maximum count of this n-gram over the reference sentences
        for i in range(m):
          # n-gram count in reference sentence i
          num = count_element(ngram_ref_list[i], ngram)
          max_ref_count = max(max_ref_count, num)
        total_clip_count += min(cand_count, max_ref_count)
      
      bleu_score = total_clip_count / total_cand_count
      return bleu_score
    
    def count_element(lst, element):
      # count occurrences of element in lst (avoids shadowing the built-in name `list`)
      return lst.count(element)
    
    def eval_rouge_n(y_candidate, y_reference, n = 2):
      '''
      Args:
        y_candidate: list of words, machine generated prediction
        y_reference: list of lists, [[], [], ...], human generated reference sentences
      Return:
        rouge_n score: double, maximum of the pairwise rouge-n scores
      '''
      if type(y_reference[0]) != list:
        print('y_reference should be a list of lists')
        return
      
      m = len(y_reference)
      rouge_score = []
      ngram_cand = generate_ngrams(y_candidate, n)
      for i in range(m):
        ngram_ref = generate_ngrams(y_reference[i], n)
        num_match = count_match(ngram_cand, ngram_ref)
        rouge_score.append(num_match/len(ngram_ref))
      return max(rouge_score)
    
    def generate_ngrams(input_list, n):
      '''
      zip(x, x[1:], x[2:], ..., x[n-1:]); zip stops at the shortest slice.
      Returns a list of n-gram tuples so the result can be reused and len() applied
      (in Python 3, zip is a one-shot iterator).
      '''
      return list(zip(*[input_list[i:] for i in range(n)]))
    
    def count_match(listA, listB):
      # number of n-grams in listA that also occur in listB (avoids shadowing the built-in name `tuple`)
      match_list = [ngram for ngram in listA if ngram in listB]
      return len(match_list)
    
    def eval_rouge_l(y_candidate, y_reference):
      '''
      Args:
        y_candidate: list of words, machine generated prediction
        y_reference: list of lists, [[], [], ...], human generated reference sentences
      Return:
        rouge_l score: double, F-measure based on the longest common subsequence (LCS)
      '''
      if type(y_reference[0]) != list:
        print('y_reference should be a list of lists')
        return
      K = len(y_reference)
      lcs_count = 0.0
      total_cand = len(y_candidate)  # total number of candidate words
      total_ref = 0.0  # total number of reference words
      
      for k in range(K):
        cur_lcs = LCS(y_candidate, y_reference[k])
        lcs_count += len(cur_lcs)
        total_ref += len(y_reference[k])
      
      recall = lcs_count / total_ref
      precision = lcs_count / total_cand
      beta = 8.0  # a large beta weights recall more heavily than precision
      f1 = (1 + beta * beta) * precision * recall / (recall + beta * beta * precision)
      return f1
    
    def LCS(X, Y):
      '''Get the elements of the longest common subsequence of X and Y
      '''
      length, flag = calc_LCS(X, Y)
      common_seq_rev = []  # LCS collected in reverse order
      # trace back from the end of X and Y
      start_token = "START"
      X_new = [start_token] + list(X)
      Y_new = [start_token] + list(Y)
      i = len(X_new) - 1
      j = len(Y_new) - 1
      while i >= 0 and j >= 0:
        if flag[i][j] == 1:    # match: move diagonally
          common_seq_rev.append(X_new[i])
          i -= 1
          j -= 1
        elif flag[i][j] == 2:  # came from (i-1, j)
          i -= 1
        else:                  # flag[i][j] == 3 (or 0): came from (i, j-1)
          j -= 1
      common_seq = common_seq_rev[::-1]  # restore the original order
      return common_seq
    
    def calc_LCS(X, Y):
      '''
        Calculate the Longest Common Subsequence (LCS) of X and Y.
        Returns the length[][] matrix and flag[][] matrix, where
        length[i][j]: LCS length of the first i tokens of X and the first j tokens of Y;
        flag[i][j]: traceback direction, 1: diagonal (match), 2: came from (i-1, j), 3: came from (i, j-1)
      '''
      start_token = "START"
      X_new = [start_token] + list(X)  # prepend the start token to X
      Y_new = [start_token] + list(Y)
      
      m = len(X_new)
      n = len(Y_new)
      # length and flag matrices have size (len(X) + 1) * (len(Y) + 1)
      length = [[0 for j in range(n)] for i in range(m)]
      flag = [[0 for j in range(n)] for i in range(m)]
      
      for i in range(1, m):
        for j in range(1, n):
          if X_new[i] == Y_new[j]:  # token match
            length[i][j] = length[i-1][j-1] + 1
            flag[i][j] = 1  # diagonal
          else:
            if length[i-1][j] > length[i][j-1]:
              length[i][j] = length[i-1][j]
              flag[i][j] = 2  # came from (i-1, j)
            else:
              length[i][j] = length[i][j-1]
              flag[i][j] = 3  # came from (i, j-1)
      return length, flag
    
    def plot_attention(data, X_label=None, Y_label=None):
      '''
        Plot the attention weights as a heatmap (darker color = larger weight)
        Args:
          data: attn_matrix with shape [ty, tx], cut off before 'PAD'
          X_label: list of size tx, encoder tokens
          Y_label: list of size ty, decoder tokens
      '''
      fig, ax = plt.subplots(figsize=(20, 8))  # set figure size
      heatmap = ax.pcolor(data, cmap=plt.cm.Blues, alpha=0.9)  # color intensity encodes the attention weight
      
      # Set axis labels
      if X_label is not None and Y_label is not None:
        # decode only if the labels are byte strings; Python 3 str needs no decoding
        X_label = [x.decode('utf-8') if isinstance(x, bytes) else x for x in X_label]
        Y_label = [y.decode('utf-8') if isinstance(y, bytes) else y for y in Y_label]
        
        xticks = range(0, len(X_label))
        ax.set_xticks(xticks, minor=False)  # major ticks
        ax.set_xticklabels(X_label, minor=False, rotation=45)
        
        yticks = range(0, len(Y_label))
        ax.set_yticks(yticks, minor=False)
        ax.set_yticklabels(Y_label, minor=False)
        
        ax.grid(True)
      
      # Save the figure
      plt.title(u'Attention Heatmap')
      timestamp = int(time.time())
      if not os.path.exists('img'):
        os.makedirs('img')  # make sure the output directory exists
      file_name = 'img/attention_heatmap_' + str(timestamp) + ".jpg"
      print("Saving figure %s" % file_name)
      fig.savefig(file_name)  # save the figure to file
      plt.close(fig)  # close the figure
    
    def test():
      #strA = "ABCBDAB"
      #strB = "BDCABA" 
      #m = LCS(strA, strB)
    
      #listA = ['但是','我', '爱' ,'吃', '肉夹馍']
      #listB = ['我', '不是', '很', '爱', '肉夹馍']
      #m = LCS(listA, listB)
      
      y_candidate = ['我', '爱', '吃', '北京', '烤鸭']
      y_reference = [['我', '爱', '吃', '北京', '小吃', '烤鸭'], ['他', '爱', '吃', '北京', '烤鹅'],['但是', '我', '很','爱', '吃', '西湖', '醋鱼']]
      p1 = eval_rouge_l(y_candidate, y_reference)
      print ("ROUGE-L score %f" % p1)
      
      p2 = eval_rouge_n(y_candidate, y_reference, 2)
      print ("ROUGE-N score %f" % p2)
      
      p3 = eval_bleu(y_candidate, y_reference, 2)
      print ("BLEU score %f" % p3)
    
    if __name__ == "__main__":
      test()
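
    For a quick check of plot_attention(), here is a minimal usage sketch; the 5x7 attention matrix and the token lists are made up for illustration, and numpy is assumed to be available (it is not imported by the script above):

    import numpy as np  # assumed dependency, only used to build a toy attention matrix
    
    # toy attention weights: 5 decoder tokens attending over 7 encoder tokens,
    # normalized so that each decoder step (row) sums to 1
    attn = np.random.rand(5, 7)
    attn = attn / attn.sum(axis=1, keepdims=True)
    
    encoder_tokens = ['我', '爱', '吃', '北京', '小吃', '烤鸭', '。']  # x axis, tx = 7
    decoder_tokens = ['I', 'love', 'Beijing', 'roast', 'duck']        # y axis, ty = 5
    
    plot_attention(attn, X_label=encoder_tokens, Y_label=decoder_tokens)
    # writes img/attention_heatmap_<timestamp>.jpg; darker blue cells mark larger attention weights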
    
    
    
    