I came across a paper that optimizes a large model with a new method, reporting higher mAP and Rank-1 on the public Market-1501 dataset, so I wanted to try running it on a competition dataset. As I understand it, Market-1501 has roughly 10k images at about 2 KB each, while the competition dataset has about 20k images at around 30 KB each — so it is not only larger but also higher resolution.
The paper's original training configuration:
MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: "../../.cache/torch/hub/checkpoints" # root of pretrain path
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'on'
  IF_WITH_CENTER: 'no'
  NAME: 'part_attention_vit'
  NO_MARGIN: True
  DEVICE_ID: ('0')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  REA:
    ENABLED: False
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]
  LGT: # Local Grayscale Transformation
    DO_LGT: False
    PROB: 0.5

DATASETS:
  TRAIN: ('Market1501',)
  TEST: ("DukeMTMC",)
  ROOT_DIR: ('../../data') # root of datasets

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 60
  BASE_LR: 0.001 # 0.0004 for msmt
  IMS_PER_BATCH: 64
  WARMUP_METHOD: 'linear'
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 5
  LOG_PERIOD: 60
  EVAL_PERIOD: 1
  WEIGHT_DECAY: 1e-4
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2
  SEED: 1234

TEST:
  EVAL: True
  IMS_PER_BATCH: 128
  RE_RANKING: False
  WEIGHT: ''
  NECK_FEAT: 'before'
  FEAT_NORM: True

LOG_ROOT: '../../data/exp/' # root of log file
TB_LOG_ROOT: './tb_log/'
LOG_NAME: 'PAT/market/vit_base'
I ran the competition dataset directly with this configuration; after 30 epochs the gradients exploded and training failed.
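A common guard against exploding gradients in ViT-style training — not part of the PAT config above, so this is only a hedged sketch — is to clip the global gradient norm between `loss.backward()` and `optimizer.step()`. The helper below reimplements the norm-clipping math in plain Python so the mechanics are visible (the function name and the list-of-lists gradient stand-in are illustrative, not from the PAT codebase):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale gradients in place so their global L2 norm is at most max_norm.

    Mirrors the logic of torch.nn.utils.clip_grad_norm_; `grads` is a
    list of flat lists of floats standing in for parameter gradients.
    Returns the norm measured *before* clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # epsilon avoids div-by-zero
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale
    return total_norm

# An "exploding" gradient with global norm 50, clipped down to norm 5:
grads = [[30.0, 40.0]]
norm_before = clip_grad_norm(grads, max_norm=5.0)
print(norm_before)  # → 50.0; grads is now roughly [[3.0, 4.0]]
```

In an actual PyTorch training loop the equivalent one-liner is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)` placed right after the backward pass; the `max_norm` value itself is a tunable assumption.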
I then made a round of adjustments: lowered the learning rate further and added warmup, but the gradient explosion still occurred. The config is below:
MODEL:
  PRETRAIN_CHOICE: 'imagenet'
  PRETRAIN_PATH: "C:/Users/30145/Downloads/Part-Aware-Transformer-main" # root of pretrain path
  METRIC_LOSS_TYPE: 'triplet'
  IF_LABELSMOOTH: 'on'
  IF_WITH_CENTER: 'no'
  NAME: 'part_attention_vit'
  NO_MARGIN: True
  DEVICE_ID: ('0')
  TRANSFORMER_TYPE: 'vit_base_patch16_224_TransReID'
  STRIDE_SIZE: [16, 16]

INPUT:
  SIZE_TRAIN: [256, 128]
  SIZE_TEST: [256, 128]
  REA:
    ENABLED: True
  PIXEL_MEAN: [0.5, 0.5, 0.5]
  PIXEL_STD: [0.5, 0.5, 0.5]
  LGT: # Local Grayscale Transformation
    DO_LGT: False
    PROB: 0.5

DATASETS:
  TRAIN: ('Market1501',)
  TEST: ("Market1501",)
  ROOT_DIR: ('C:/Users/30145/Downloads/Part-Aware-Transformer-main/data') # root of datasets

DATALOADER:
  SAMPLER: 'softmax_triplet'
  NUM_INSTANCE: 4
  NUM_WORKERS: 8

SOLVER:
  OPTIMIZER_NAME: 'SGD'
  MAX_EPOCHS: 120
  BASE_LR: 0.0004 # 0.0004 for msmt
  IMS_PER_BATCH: 64
  WARMUP_METHOD: 'linear'
  WARMUP_EPOCHS: 10
  WARMUP_FACTOR: 0.01
  STEPS: [50, 90]
  GAMMA: 0.1
  LARGE_FC_LR: False
  CHECKPOINT_PERIOD: 5
  LOG_PERIOD: 60
  EVAL_PERIOD: 2
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 1e-4
  BIAS_LR_FACTOR: 2
  SEED: 1234

TEST:
  EVAL: True
  IMS_PER_BATCH: 128
  RE_RANKING: False
  WEIGHT: ''
  NECK_FEAT: 'before'
  FEAT_NORM: True
I'd like to ask everyone: what else can I tune? Should I lower the learning rate even further?
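To sanity-check the second config's schedule, here is a small sketch of the learning-rate curve implied by its `BASE_LR` / `WARMUP_*` / `STEPS` / `GAMMA` fields. It assumes a per-epoch linear warmup ramp like most TransReID-style trainers use; PAT's exact per-iteration behavior may differ, so treat this as an approximation:

```python
def lr_at_epoch(epoch, base_lr=4e-4, warmup_epochs=10,
                warmup_factor=0.01, steps=(50, 90), gamma=0.1):
    """LR under linear warmup + step decay (defaults from the second config).

    Warmup: ramp linearly from base_lr * warmup_factor up to base_lr.
    After warmup: multiply by gamma at each milestone epoch in `steps`.
    """
    if epoch < warmup_epochs:
        alpha = epoch / warmup_epochs
        return base_lr * (warmup_factor * (1.0 - alpha) + alpha)
    decay = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** decay

for e in (0, 5, 10, 50, 90):
    print(e, lr_at_epoch(e))
# The curve starts at 4e-6 at epoch 0, reaches 4e-4 at epoch 10,
# then drops to 4e-5 at epoch 50 and 4e-6 at epoch 90.
```

Printing this curve makes it easy to confirm the warmup actually finishes well before the epochs where training diverged, i.e. the explosion happens while the LR is already at its full 4e-4 value.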