FCN semantic segmentation model, Cityscapes dataset, semantic segmentation, image processing

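I am using a trained FCN model to run predictions on the Cityscapes dataset. The model is adapted from the source code at https://github.com/pytorch/vision/tree/main/torchvision/models/segmentation, and the dataset was changed from PASCAL VOC2012 to Cityscapes for semantic segmentation. Here is my code:
fcn_model.py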


from collections import OrderedDict

from typing import Dict

import torch
from torch import nn, Tensor
from torch.nn import functional as F
from .backbone import resnet50, resnet101


class IntermediateLayerGetter(nn.ModuleDict):
    """
    Module wrapper that returns intermediate layers from a model

    It has a strong assumption that the modules have been registered
    into the model in the same order as they are used.
    This means that one should **not** reuse the same nn.Module
    twice in the forward if you want this to work.

    Additionally, it is only able to query submodules that are directly
    assigned to the model. So if `model` is passed, `model.feature1` can
    be returned, but not `model.feature1.layer2`.

    Args:
        model (nn.Module): model on which we will extract the features
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
    """
    _version = 2
    __annotations__ = {
        "return_layers": Dict[str, str],
    }

    def __init__(self, model: nn.Module, return_layers: Dict[str, str]) -> None:
        if not set(return_layers).issubset([name for name, _ in model.named_children()]):
            raise ValueError("return_layers are not present in model")
        orig_return_layers = return_layers
        return_layers = {str(k): str(v) for k, v in return_layers.items()}

        # Rebuild the backbone, dropping all modules that are not used
        layers = OrderedDict()
        for name, module in model.named_children():
            layers[name] = module
            if name in return_layers:
                del return_layers[name]
            if not return_layers:
                break

        super(IntermediateLayerGetter, self).__init__(layers)
        self.return_layers = orig_return_layers

    def forward(self, x: Tensor) -> Dict[str, Tensor]:
        out = OrderedDict()
        for name, module in self.items():
            x = module(x)
            if name in self.return_layers:
                out_name = self.return_layers[name]
                out[out_name] = x
        return out


class FCN(nn.Module):
    """
    Implements a Fully-Convolutional Network for semantic segmentation.

    Args:
        backbone (nn.Module): the network used to compute the features for the model.
            The backbone should return an OrderedDict[Tensor], with the key being
            "out" for the last feature map used, and "aux" if an auxiliary classifier
            is used.
        classifier (nn.Module): module that takes the "out" element returned from
            the backbone and returns a dense prediction.
        aux_classifier (nn.Module, optional): auxiliary classifier used during training
    """
    __constants__ = ['aux_classifier']

    def __init__(self, backbone, classifier, aux_classifier=None):
        super(FCN, self).__init__()
        self.backbone = backbone
        self.classifier_new = classifier
        self.aux_classifier_new = aux_classifier

    def forward(self, x: Tensor) -> Dict[str, Tensor]:
        input_shape = x.shape[-2:]
        # contract: features is a dict of tensors
        features = self.backbone(x)

        result = OrderedDict()
        x = features["out"]
        x = self.classifier_new(x)
        # The original paper uses ConvTranspose2d, but its weights are frozen, so it amounts to bilinear interpolation
        x = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False)
        result["out"] = x

        if self.aux_classifier_new is not None:
            x = features["aux"]
            x = self.aux_classifier_new(x)
            # The original paper uses ConvTranspose2d, but its weights are frozen, so it amounts to bilinear interpolation
            x = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False)
            result["aux"] = x

        return result


# class FCNHead(nn.Sequential):
#     def __init__(self, in_channels, channels):
#         inter_channels = in_channels // 4
#         layers = [
#             nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False),
#             nn.BatchNorm2d(inter_channels),
#             nn.ReLU(),
#             nn.Dropout(0.1),
#             nn.Conv2d(inter_channels, channels, 1)
#         ]

#         super(FCNHead, self).__init__(*layers)


# def fcn_resnet50(aux, num_classes=21, pretrain_backbone=False):
#     # 'resnet50_imagenet': 'https://download.pytorch.org/models/resnet50-0676ba61.pth'
#     # 'fcn_resnet50_coco': 'https://download.pytorch.org/models/fcn_resnet50_coco-1167a1af.pth'
#     backbone = resnet50(replace_stride_with_dilation=[False, True, True])

#     if pretrain_backbone:
#         # Load the pretrained resnet50 backbone weights
#         backbone.load_state_dict(torch.load("resnet50.pth", map_location='cpu'))

#     out_inplanes = 2048
#     aux_inplanes = 1024

#     return_layers = {'layer4': 'out'}
#     if aux:
#         return_layers['layer3'] = 'aux'
#     backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

#     aux_classifier = None
#     # why using aux: https://github.com/pytorch/vision/issues/4292
#     if aux:
#         aux_classifier = FCNHead(aux_inplanes, num_classes)

#     classifier = FCNHead(out_inplanes, num_classes)

#     model = FCN(backbone, classifier, aux_classifier)

#     return model


# def fcn_resnet101(aux, num_classes=21, pretrain_backbone=False):
#     # 'resnet101_imagenet': 'https://download.pytorch.org/models/resnet101-63fe2227.pth'
#     # 'fcn_resnet101_coco': 'https://download.pytorch.org/models/fcn_resnet101_coco-7ecb50ca.pth'
#     backbone = resnet101(replace_stride_with_dilation=[False, True, True])

#     if pretrain_backbone:
#         # Load the pretrained resnet101 backbone weights
#         backbone.load_state_dict(torch.load("resnet101.pth", map_location='cpu'))

#     out_inplanes = 2048
#     aux_inplanes = 1024

#     return_layers = {'layer4': 'out'}
#     if aux:
#         return_layers['layer3'] = 'aux'
#     backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

#     aux_classifier = None
#     # why using aux: https://github.com/pytorch/vision/issues/4292
#     if aux:
#         aux_classifier = FCNHead(aux_inplanes, num_classes)

#     classifier = FCNHead(out_inplanes, num_classes)

#     model = FCN(backbone, classifier, aux_classifier)

#     return model


class FCNHead(nn.Sequential):
    def __init__(self, in_channels, out_channels, num_classes):
        inter_channels = in_channels // 4
        layers = [
            nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Conv2d(inter_channels, num_classes, 1)
        ]

        super(FCNHead, self).__init__(*layers)
        
        
def fcn_resnet50(aux, num_classes=21, pretrain_backbone=False):
    # 'resnet50_imagenet': 'https://download.pytorch.org/models/resnet50-0676ba61.pth'
    # 'fcn_resnet50_coco': 'https://download.pytorch.org/models/fcn_resnet50_coco-1167a1af.pth'
    backbone = resnet50(replace_stride_with_dilation=[False, True, True])

    if pretrain_backbone:
        # Load the pretrained resnet50 backbone weights
        backbone.load_state_dict(torch.load("resnet50.pth", map_location='cpu'))

    out_inplanes = 2048
    aux_inplanes = 1024

    return_layers = {'layer4': 'out'}
    if aux:
        return_layers['layer3'] = 'aux'
    backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

    aux_classifier = None
    # why using aux: https://github.com/pytorch/vision/issues/4292
    if aux:
        aux_classifier = FCNHead(aux_inplanes, num_classes)


    classifier = FCNHead(out_inplanes=2048, num_classes=num_classes)  # added the num_classes argument
 
    model = FCN(backbone, classifier, aux_classifier)


    return model


def fcn_resnet101(aux, num_classes=21, pretrain_backbone=False):
    # 'resnet101_imagenet': 'https://download.pytorch.org/models/resnet101-63fe2227.pth'
    # 'fcn_resnet101_coco': 'https://download.pytorch.org/models/fcn_resnet101_coco-7ecb50ca.pth'
    backbone = resnet101(replace_stride_with_dilation=[False, True, True])

    if pretrain_backbone:
        # Load the pretrained resnet101 backbone weights
        backbone.load_state_dict(torch.load("resnet101.pth", map_location='cpu'))

    out_inplanes = 2048
    aux_inplanes = 1024

    return_layers = {'layer4': 'out'}
    if aux:
        return_layers['layer3'] = 'aux'
    backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

    aux_classifier = None
    # why using aux: https://github.com/pytorch/vision/issues/4292
    if aux:
        aux_classifier = FCNHead(aux_inplanes, num_classes)


    classifier = FCNHead(out_inplanes, num_classes=num_classes)  # added the num_classes argument
 
    model = FCN(backbone, classifier, aux_classifier)


    return model

predict.py is as follows:

import os
import time
import json

import torch
from torchvision import transforms
import numpy as np
from PIL import Image

from src import fcn_resnet50


def time_synchronized():
    torch.cuda.synchronize() if torch.cuda.is_available() else None
    return time.time()


def main():
    aux = False  # the aux_classifier is not needed at inference time
    classes = 19
    weights_path = "/root/autodl-tmp/test/fcn_4_24/fcn/save_weights/best1-model_eval1_city_69_270.pth"
    img_path = "./test1.png"
    palette_path = "./palette.json"
    assert os.path.exists(weights_path), f"weights {weights_path} not found."
    assert os.path.exists(img_path), f"image {img_path} not found."
    assert os.path.exists(palette_path), f"palette {palette_path} not found."
    with open(palette_path, "rb") as f:
        pallette_dict = json.load(f)
        pallette = []
        for v in pallette_dict.values():
            pallette += v

    # get devices
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    # create model
    model = fcn_resnet50(aux=aux, num_classes=21)


    # delete weights about aux_classifier
    weights_dict = torch.load(weights_path, map_location='cpu')['model']
    for k in list(weights_dict.keys()):
        if "aux" in k:
            del weights_dict[k]

    # load weights
    model.load_state_dict(weights_dict)
    model.to(device)

    # load image
    original_img = Image.open(img_path)

    # from pil image to tensor and normalize
#     data_transform = transforms.Compose([transforms.Resize(512),
#                                          transforms.ToTensor(),
#                                          transforms.Normalize(mean=(0.485, 0.456, 0.406),
#                                                               std=(0.229, 0.224, 0.225))])

    # modified here
    data_transform = transforms.Compose([
        transforms.Resize((512, 1024)),  # resize the image to 512x1024
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406),
                             std=(0.229, 0.224, 0.225))])
    
    
    
    img = data_transform(original_img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    model.eval()  # switch to evaluation mode
    with torch.no_grad():
        # init model
        img_height, img_width = img.shape[-2:]
        init_img = torch.zeros((1, 3, img_height, img_width), device=device)
        model(init_img)

        t_start = time_synchronized()
        output = model(img.to(device))
        t_end = time_synchronized()
        print("inference time: {}".format(t_end - t_start))
        print("fps: {}".format(1//(t_end - t_start)))

        prediction = output['out'].argmax(1).squeeze(0)
        prediction = prediction.to("cpu").numpy().astype(np.uint8)
        mask = Image.fromarray(prediction)
        mask.putpalette(pallette)
        mask.save("test_result.png")


if __name__ == '__main__':
    main()

The error is shown below. I split the Cityscapes dataset into 19 classes; please help me fix the code.

(py38) root@autodl-container-3132448cc7-14f214ad:~/autodl-tmp/test/fcn_4_24/fcn# python predict-Copy1.py
using cuda:0 device.
Traceback (most recent call last):
  File "predict-Copy1.py", line 94, in <module>
    main()
  File "predict-Copy1.py", line 38, in main
    model = fcn_resnet50(aux=aux, num_classes=21)
  File "/root/autodl-tmp/test/fcn_4_24/fcn/src/fcn_model.py", line 216, in fcn_resnet50
    classifier = FCNHead(out_inplanes=2048, num_classes=num_classes)  # added the num_classes argument
TypeError: __init__() got an unexpected keyword argument 'out_inplanes'

Answer by 专家-赤兔 (top creator, programming frameworks), 2024-04-27 10:45

Quoted from the free WeChat mini program 皆我百晓生

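It looks like you are loading a trained FCN model and adapting it to the 19 classes of the Cityscapes dataset. The error in your code occurs because out_inplanes is passed as a keyword argument when FCNHead is created, but the FCNHead constructor has no parameter with that name: its parameters are in_channels, out_channels and num_classes.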
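The failure can be reproduced without PyTorch. The sketch below uses a plain stand-in class that only copies the constructor signature from the question (it is not the real FCNHead), and try_call is a hypothetical helper added for illustration. It suggests that both call styles used in fcn_resnet50 are rejected by that signature:

```python
class FCNHead:
    """Stand-in that only mirrors the constructor signature from the question."""

    def __init__(self, in_channels, out_channels, num_classes):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_classes = num_classes


def try_call(*args, **kwargs):
    """Try to build the head; return the TypeError message, or None on success."""
    try:
        FCNHead(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# Same keyword arguments as the failing classifier line in fcn_resnet50.
keyword_error = try_call(out_inplanes=2048, num_classes=19)

# Same positional style as the aux_classifier line: two values for three parameters.
positional_error = try_call(1024, 19)

print(keyword_error)
print(positional_error)
```

The second message matters when aux=True: a call such as FCNHead(aux_inplanes, num_classes) would also fail against the three-parameter signature, because num_classes is never filled.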

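To fix this, remove the unused out_channels parameter from the FCNHead constructor so that it takes only in_channels and num_classes, and pass the input channel count as the first positional argument instead of using the out_inplanes keyword. Note that num_classes alone is not enough to build the head: in_channels is still required (2048 for layer4, 1024 for layer3). Here are the modified FCNHead class and the corresponding fcn_resnet50 function: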

class FCNHead(nn.Sequential):
    def __init__(self, in_channels, num_classes):
        inter_channels = in_channels // 4
        layers = [
            nn.Conv2d(in_channels, inter_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Conv2d(inter_channels, num_classes, 1)
        ]

        super(FCNHead, self).__init__(*layers)


def fcn_resnet50(aux, num_classes=21, pretrain_backbone=False):
    # 'resnet50_imagenet': 'https://download.pytorch.org/models/resnet50-0676ba61.pth'
    # 'fcn_resnet50_coco': 'https://download.pytorch.org/models/fcn_resnet50_coco-1167a1af.pth'
    backbone = resnet50(replace_stride_with_dilation=[False, True, True])

    if pretrain_backbone:
        # Load the pretrained resnet50 backbone weights
        backbone.load_state_dict(torch.load("resnet50.pth", map_location='cpu'))

    out_inplanes = 2048
    aux_inplanes = 1024

    return_layers = {'layer4': 'out'}
    if aux:
        return_layers['layer3'] = 'aux'
    backbone = IntermediateLayerGetter(backbone, return_layers=return_layers)

    aux_classifier = None
    # why using aux: https://github.com/pytorch/vision/issues/4292
    if aux:
        aux_classifier = FCNHead(aux_inplanes, num_classes)


    classifier = FCNHead(out_inplanes, num_classes)  # pass the input channels positionally, not as out_inplanes=

    model = FCN(backbone, classifier, aux_classifier)


    return model
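
To see what this head produces for the 512x1024 input used in predict.py, the tensor shapes can be traced by hand. This is a rough sketch, assuming the local resnet50 backbone follows the torchvision layout, where replace_stride_with_dilation=[False, True, True] gives an overall stride of 8; trace_shapes is a hypothetical helper, not part of the original code:

```python
def trace_shapes(height, width, num_classes, in_channels=2048, output_stride=8):
    """Return (stage, shape) pairs for one image going through backbone and head.

    The spatial sizes are exact only when height and width are multiples
    of output_stride (assumed to be 8 for the dilated backbone).
    """
    feat_h, feat_w = height // output_stride, width // output_stride
    inter_channels = in_channels // 4  # same reduction as in FCNHead
    return [
        ("input", (1, 3, height, width)),
        ("backbone out", (1, in_channels, feat_h, feat_w)),
        ("head conv 3x3", (1, inter_channels, feat_h, feat_w)),
        ("head conv 1x1", (1, num_classes, feat_h, feat_w)),
        ("bilinear upsample", (1, num_classes, height, width)),
    ]


shapes = trace_shapes(512, 1024, num_classes=19)
for stage, shape in shapes:
    print(f"{stage:>18}: {shape}")
```

The channel dimension of the last two stages equals num_classes, which is why the value passed to fcn_resnet50 has to agree with the number of classes the weights were trained on.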

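Then, in predict.py, make sure the rest of the script is still compatible with the new model. In particular, the script sets classes = 19 but builds the model with num_classes=21; if the checkpoint was trained with 19 classes, pass num_classes=classes so that the classifier shapes match the saved weights. After that you should be able to load the model and run prediction. If you run into any other problems with the new code, feel free to let me know.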
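
One way to check compatibility before calling load_state_dict is to read the class count from the checkpoint itself. The sketch below assumes the weights were saved from the FCN class in the question, so the last convolution of the head would be stored under the key classifier_new.4.weight with the class count as its first dimension. The helper names are hypothetical, and the stand-in checkpoint uses simple objects with a shape attribute instead of real tensors so that it runs without PyTorch:

```python
from types import SimpleNamespace


def strip_aux_keys(state_dict):
    """Return a copy of state_dict without auxiliary-classifier entries."""
    return {k: v for k, v in state_dict.items() if "aux" not in k}


def infer_num_classes(state_dict, key="classifier_new.4.weight"):
    """Read the class count from the first dimension of the last head conv weight."""
    return state_dict[key].shape[0]


# Stand-in checkpoint (illustrative shapes only, not real weights).
fake_checkpoint = {
    "backbone.conv1.weight": SimpleNamespace(shape=(64, 3, 7, 7)),
    "classifier_new.4.weight": SimpleNamespace(shape=(19, 512, 1, 1)),
    "classifier_new.4.bias": SimpleNamespace(shape=(19,)),
    "aux_classifier_new.4.weight": SimpleNamespace(shape=(19, 256, 1, 1)),
}

weights = strip_aux_keys(fake_checkpoint)
num_classes = infer_num_classes(weights)

print(sorted(weights))
print("num_classes in checkpoint:", num_classes)
```

In predict.py the same helpers could be applied to the dictionary returned by torch.load(weights_path, map_location='cpu')['model'], and the inferred value passed to fcn_resnet50 instead of a hard-coded 21.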
