The following code comes from:
https://github.com/isaaccorley/torchrs#remote-sensing-image-captioning-dataset-rsicd
What does it do overall, and can it ultimately generate captions?
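import torchvision.transforms as T
from torchrs.datasets import RSICD

# Convert each loaded image to a tensor
transform = T.Compose([T.ToTensor()])

# Build the dataset object for one split, reading files under `root`
dataset = RSICD(
    root="path/to/dataset/",
    split="train",  # or 'val', 'test'
    transform=transform
)

# Fetch the first sample (a dict, described below)
x = dataset[0]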
"""
x: dict(
    x:        (3, 224, 224)
    captions: List[str]
)
"""
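
For context: as written, the snippet only builds a dataset object and reads one sample (an image tensor plus its reference captions); nothing in it produces new captions. Generating captions would need a separately trained model, and a common first step for that is turning the caption strings into integer token ids. Below is a minimal sketch of that step in plain Python. It is an illustration under assumptions, not part of torchrs: the `sample` dict is a made-up stand-in shaped like the docstring above, and `tokenize`, `build_vocab`, and `encode` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List

# Made-up stand-in for one dataset sample (assumption: same keys as the docstring).
# The real sample would hold a (3, 224, 224) image tensor under "x".
sample = {
    "x": None,
    "captions": [
        "many planes are parked near a terminal",
        "several planes are parked at an airport",
    ],
}

PAD, UNK, BOS, EOS = "<pad>", "<unk>", "<bos>", "<eos>"


def tokenize(caption: str) -> List[str]:
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return caption.lower().split()


def build_vocab(captions: List[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for c in captions for tok in tokenize(c))
    vocab = {tok: i for i, tok in enumerate([PAD, UNK, BOS, EOS])}
    for tok, n in sorted(counts.items()):
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(caption: str, vocab: Dict[str, int], max_len: int = 12) -> List[int]:
    """Wrap a caption in BOS/EOS, map tokens to ids, and pad to max_len."""
    toks = [BOS] + tokenize(caption)[: max_len - 2] + [EOS]
    ids = [vocab.get(t, vocab[UNK]) for t in toks]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(sample["captions"])
encoded = [encode(c, vocab) for c in sample["captions"]]
```

Each entry of `encoded` is a fixed-length list of ids that a captioning model could be trained on alongside the image tensor; the model itself is outside the scope of the snippet in question.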