m0_52062055 2022-11-24 17:21 Acceptance rate: 30%
Closed

Bug when writing a dataset to an HDF5 file

I hit a bug while writing the Northeastern University (NEU) dataset to HDF5 format.
from config import gray_config as config
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from gaoimage.io import HDF5DatasetWriter
from imutils import paths
import numpy as np
import progressbar
# import json
import cv2
import os

imagePaths = list(paths.list_images(config.IMAGE_PATH))
imageLabels = [p.split(os.path.sep)[-2] for p in imagePaths]
le = LabelEncoder()
imageLabels = le.fit_transform(imageLabels)

# split the original paths to res and test, 240(1440) for res, 60(360) for test
(resPaths, testPaths, resLabels, testLabels) = train_test_split(
    imagePaths, imageLabels, test_size=0.2, random_state=42)

# split the res paths to train and validation, 180(1080) for train, 60(360) for validation
(trainPaths, valPaths, trainLabels, valLabels) = train_test_split(
    resPaths, resLabels, test_size=0.25, random_state=42)

# construct a list pairing the training, validation, and testing
# image paths along with their corresponding labels and output HDF5 files
datasets = [
    ("train", trainPaths, trainLabels, config.TRAIN_HDF5),
    ("val", valPaths, valLabels, config.VAL_HDF5),
    ("test", testPaths, testLabels, config.TEST_HDF5)
]

# initialize the image preprocessor and the list of RGB channel averages
# (R, G, B) = ([], [], [])

# loop over the dataset tuples
# (the loop variable is named splitPaths so it does not shadow imutils.paths)
for (dType, splitPaths, labels, outputPath) in datasets:
    # create HDF5 writer
    print("[INFO] building {}...".format(outputPath))
    writer = HDF5DatasetWriter((len(splitPaths), 200, 200, 1), outputPath)

    # initialize the progress bar
    widgets = ["Building Dataset: ", progressbar.Percentage(), " ",
               progressbar.Bar(), " ", progressbar.ETA()]
    pbar = progressbar.ProgressBar(maxval=len(splitPaths),
                                   widgets=widgets).start()

    # loop over the image paths
    for (i, (path, label)) in enumerate(zip(splitPaths, labels)):
        # load the image as grayscale; cv2.imread returns None if it cannot read the file
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if image is None:
            raise IOError("could not read image: {}".format(path))
        # add a channel axis: (H, W) -> (H, W, 1)
        image = np.expand_dims(image, axis=2)

        # if we are building the training dataset, then compute the
        # mean of each channel in the image, then update the respective lists
        # if dType == "train":
        #     (b, g, r) = cv2.mean(image)[:3]
        #     R.append(r)
        #     G.append(g)
        #     B.append(b)

        # add the image and label to the HDF5 dataset
        writer.add([image], [label])
        pbar.update(i)

    # close the HDF5 writer
    pbar.finish()
    writer.close()

# contents of the gray_config module imported above as `config`
from os import path

IMAGE_PATH = "../zhai/dataset1/NEU-CLS/images"


TRAIN_HDF5 = "../zhai/dataset1/NEU-CLS/hdf5/train.hdf5"
VAL_HDF5 = "../zhai/dataset1/NEU-CLS/hdf5/val.hdf5"
TEST_HDF5 = "../zhai/dataset1/NEU-CLS/hdf5/test.hdf5"

OUTPUT_PATH = "gray_output"

figPath = path.sep.join([OUTPUT_PATH, "ms_test1.png"])
jsonPath = path.sep.join([OUTPUT_PATH, "ms_test1.json"])
DATASET_MEAN = "gray_output/NEU_DET_1_mean.json"

Error message: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
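
The n_samples=0 in that message means train_test_split was handed an empty list, so imagePaths came back empty before any splitting happened. Below is a minimal sketch of a fail-fast check, assuming only the standard library. collect_image_paths and its extension tuple are hypothetical stand-ins for illustration, not the imutils API, and the extension list is an assumption to adjust for the dataset at hand.

```python
import os


def collect_image_paths(root, extensions=(".jpg", ".jpeg", ".png", ".bmp")):
    """Return sorted image paths under `root`, failing early if none exist."""
    # Resolve a relative path against the current working directory so the
    # error messages show exactly where the script looked.
    root = os.path.abspath(root)
    if not os.path.isdir(root):
        raise FileNotFoundError("image directory does not exist: " + root)

    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match; str.endswith accepts a tuple of suffixes.
            if name.lower().endswith(extensions):
                found.append(os.path.join(dirpath, name))

    if not found:
        raise ValueError(
            "no images with extensions {} under {}".format(extensions, root))
    return sorted(found)
```

Calling it on config.IMAGE_PATH before train_test_split would surface a wrong relative path or an unexpected file extension as an explicit error rather than the sklearn message above.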

4 answers

  • Jackyin0720 2022-11-24 17:50

    With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
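    In other words, train_test_split received zero samples, so there is nothing to split.
    Analysis: the original dataset used training images in PNG format, but the images in the actual dataset are JPG.
    Suggested fix:
    Convert the training images from JPG to PNG.
    Reference code example: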

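    import os
    import cv2


    def transform(input_path, output_path):
        # walk the input folder and re-save every .jpg/.jpeg image as .png
        for root, dirs, files in os.walk(input_path):
            for name in files:
                stem, ext = os.path.splitext(name)
                if ext.lower() not in ('.jpg', '.jpeg'):
                    continue  # skip files that are not JPEG images
                file = os.path.join(root, name)
                print('transform ' + name)
                im = cv2.imread(file)
                if im is None:
                    print('skipped (unreadable): ' + file)
                    continue
                # an empty output_path means "write next to the source file"
                target_dir = output_path if output_path else root
                cv2.imwrite(os.path.join(target_dir, stem + '.png'), im)


    if __name__ == '__main__':
        input_path = input("Enter the source folder: ")

        output_path = input("Enter the output folder (press Enter to write next to the originals): ")
        if not os.path.exists(input_path):
            print("Folder does not exist!")
        else:
            print("Start to transform!")
            transform(input_path, output_path)
            print("Transform end!")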
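
Before converting anything, it may be worth confirming which extensions are actually on disk. As far as I know, imutils.paths.list_images matches several common image extensions (not only .png), so an empty result can also point to a wrong folder. Here is a small sketch using only the standard library; count_by_extension is a hypothetical helper name.

```python
import os
from collections import Counter


def count_by_extension(root):
    """Count the files under `root`, grouped by lower-cased extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            # Files without an extension are grouped under a placeholder key.
            counts[ext or "<none>"] += 1
    return counts
```

If the counter comes back empty for the folder in IMAGE_PATH, the path itself (for example a relative path resolved from a different working directory) is a more likely culprit than the file format.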
    

Question events

  • Closed by the system on December 2
  • Question created on November 24
