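I've already fine-tuned ChatGLM on medical data. How can I implement a perplexity evaluation to compare the models? Would CBLUE or GLUE also work? Or, put another way, how do I add the following code to my code and run it?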
```python
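import math

import torch


def evaluate(model, val_dataloader, config):
    """Return the average validation loss and the corresponding perplexity."""
    model.eval()
    total_val_loss = 0.0
    with torch.no_grad():
        for step, batch in enumerate(val_dataloader):
            # .to() returns a new tensor, so the result must be assigned
            input_ids = batch[0].to(config.device)
            labels = batch[1].to(config.device)
            # ChatGLM takes no token_type_ids and builds its own attention mask
            # (a `> 0` padding mask assumes pad id 0), so only ids and labels are passed
            outputs = model(input_ids=input_ids, labels=labels)
            loss = outputs[0]  # first element is the loss for tuple or ModelOutput returns
            # under DataParallel the loss has one entry per GPU; mean() averages them
            total_val_loss += loss.mean().item()
    avg_loss = total_val_loss / len(val_dataloader)
    perplexity = math.exp(avg_loss)  # perplexity = exp(mean cross-entropy)
    return avg_loss, torch.tensor(perplexity)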
```
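Averaging per-batch mean losses, as `evaluate` does, weights every batch equally even when batches hold different numbers of label tokens. Corpus perplexity is usually defined per token, as exp(total negative log-likelihood / total tokens). The sketch below computes it from `(mean_loss, n_tokens)` pairs and compares several checkpoints on the same validation set. `corpus_perplexity` and `compare_perplexity` are hypothetical helper names. The sketch also assumes that the model's loss is a natural-log cross-entropy averaged over non-ignored labels (`-100`), with labels shifted by one as in Hugging Face causal LMs. In that case `n_tokens` would be `(labels[..., 1:] != -100).sum().item()` for each batch.

```python
import math
from typing import Dict, Iterable, List, Tuple


def corpus_perplexity(batch_stats: Iterable[Tuple[float, int]]) -> float:
    """Token-weighted perplexity from (mean_loss, n_tokens) pairs, one per batch.

    mean_loss is the batch's average cross-entropy (natural log) over its
    n_tokens non-ignored label tokens.
    """
    total_nll = 0.0
    total_tokens = 0
    for mean_loss, n_tokens in batch_stats:
        total_nll += mean_loss * n_tokens  # undo the per-batch averaging
        total_tokens += n_tokens
    if total_tokens == 0:
        raise ValueError("no label tokens to evaluate")
    return math.exp(total_nll / total_tokens)


def compare_perplexity(
    stats_by_model: Dict[str, List[Tuple[float, int]]]
) -> Dict[str, float]:
    """Map each model name (e.g. base vs. fine-tuned) to its corpus perplexity."""
    return {name: corpus_perplexity(stats) for name, stats in stats_by_model.items()}


if __name__ == "__main__":
    # Made-up numbers, only to show the arithmetic:
    # token-weighted: exp((2.0*10 + 4.0*30) / 40) = exp(3.5)
    # naive batch mean: exp((2.0 + 4.0) / 2) = exp(3.0)
    stats = [(2.0, 10), (4.0, 30)]
    print(corpus_perplexity(stats), math.exp(3.0))
```

Lower perplexity on the same held-out medical text means the model assigns it higher likelihood. Run both the base and the fine-tuned checkpoint over identical data and tokenization before comparing numbers.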