Why does a Transformer apply a pos_drop (dropout layer) right after the embedding and positional encoding?
import torch.nn as nn

self.pos_drop = nn.Dropout(p=drop_rate)  # defined in __init__

x = self.embedding(x)   # token/patch embedding
x = x + self.pos_embed  # add positional encoding (out-of-place is the more idiomatic form)
x = self.pos_drop(x)    # dropout on the summed representation
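For context, here is a minimal self-contained sketch of where this pattern typically appears, assuming a ViT-style learned positional embedding. The class name ToyEncoder and the concrete sizes (vocab_size, num_tokens, embed_dim, drop_rate) are illustrative assumptions, not taken from the original snippet:

import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Sketch: embedding + learned positional encoding + dropout (assumed ViT-style)."""
    def __init__(self, vocab_size=1000, num_tokens=197, embed_dim=768, drop_rate=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # one learned positional vector per token position, shared across the batch
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))
        # dropout applied to the sum of token and positional embeddings
        self.pos_drop = nn.Dropout(p=drop_rate)

    def forward(self, x):
        x = self.embedding(x)   # (B, N) -> (B, N, D)
        x = x + self.pos_embed  # broadcast add over the batch dimension
        x = self.pos_drop(x)    # randomly zeroes elements during training only
        return x

# usage: token indices in, regularized embeddings out
model = ToyEncoder()
tokens = torch.randint(0, 1000, (2, 197))  # batch of 2 sequences
out = model(tokens)                        # shape: (2, 197, 768)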