图文检索模型内存报错！用的ResNet50训练模型

处理完所有向量等待的时间太痛苦了！！因为有五万张照片
结果最后报错了我应该怎么优化一下呀
这个保存向量是我刚刚加的如果有错希望可以帮我一起改正
麻烦了！


import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# 训练权重
num_classes = 1000

# 构建ResNet50模型
base_model = ResNet50(weights=None, include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# 加载预训练的ResNet-50模型
base_model = ResNet50(weights=None, include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# 读取word_test.csv文件
file_path = "C:/Users/wyf/Desktop/泰迪杯/B题-全部数据/B题-数据/附件2/word_test.csv"
word_test = pd.read_csv(file_path, encoding='utf-8')

# 图像路径
image_path = "C:/Users/wyf/Desktop/泰迪杯/B题-全部数据/B题-数据/附件2/ImageData"

# 图像特征提取
model = ResNet50(weights='imagenet', include_top=False)

def image_feature_extraction(image_path):
    img = image.load_img(image_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    features = model.predict(img_array)
    return features.flatten()

# 文本特征提取
tfidf = TfidfVectorizer()
word_embeddings = tfidf.fit_transform(word_test['caption'])

# 保存图像特征向量
image_features = np.array([image_feature_extraction(os.path.join(image_path, img)) for img in os.listdir(image_path)])
np.save('image_features.npy', image_features)

# 保存文本特征向量
pd.DataFrame(word_embeddings.toarray()).to_csv('word_embeddings.csv', index=False)

# 多模态特征融合
image_features = np.array([image_feature_extraction(os.path.join(image_path, img)) for img in os.listdir(image_path)])
combined_features = np.hstack((image_features, word_embeddings.toarray()))

# 图像检索
result1 = []

for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = []

    for i, feature in enumerate(combined_features):
        similarity = cosine_similarity([feature], text_embedding)
        similarities.append((i, similarity))

    similarities.sort(key=lambda x: x[1], reverse=True)
    top_five_images = [os.listdir(image_path)[i] for i, _ in similarities[:5]]
    result1.append(top_five_images)

result1_df = pd.DataFrame(result1, columns=['image1', 'image2', 'image3', 'image4', 'image5'])
result1_df.to_csv('result1.csv', index=False)

下面是我的报错

...

1/1 [==============================] - 0s 60ms/step
1/1 [==============================] - 0s 56ms/step
Traceback (most recent call last):
  File "c:/Users/wyf/Desktop/泰迪杯/B题-示例数据/2.py", 
line 53, in <module>
    image_features = np.array([image_feature_extraction(os.path.join(image_path, img)) for img in os.listdir(image_path)])
numpy.core._exceptions.MemoryError: Unable to allocate 18.7 GiB for an array with shape (50000, 100352) and data type float32

写回答
好问题 0 提建议
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

8条回答默认最新

专家-赤兔[在线] 优质创作者: 编程框架技术领域 2024-04-23 14:00

关注

引自免费微信小程序：皆我百晓生

在你的代码中，内存错误是因为尝试一次性加载所有的图像特征到内存中。为了解决这个问题，你可以采用分批处理的方式，每次处理一部分图像，而不是一次性处理所有图像。

这里是一个修改后的版本，使用了生成器函数来逐批处理图像特征的计算，这样就可以避免一次性加载所有特征到内存中：

import concurrent.futures
import glob
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# 训练权重
num_classes = 1000

# 构建ResNet50模型
base_model = ResNet50(weights=None, include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# 读取word_test.csv文件
file_path = "C:/Users/wyf/Desktop/泰迪杯/B题-全部数据/B题-数据/附件2/word_test.csv"
word_test = pd.read_csv(file_path, encoding='utf-8')

# 图像路径
image_path = "C:/Users/wyf/Desktop/泰迪杯/B题-全部数据/B题-数据/附件2/ImageData"

# 文本特征提取
tfidf = TfidfVectorizer()
word_embeddings = tfidf.fit_transform(word_test['caption'])

# 图像特征提取生成器
def batch_image_feature_extraction(image_dir, batch_size=1000):
    images = glob.glob(os.path.join(image_dir, '*'))
    for start in range(0, len(images), batch_size):
        end = min(start + batch_size, len(images))
        batch_images = images[start:end]
        batch_features = np.array([image_feature_extraction(img) for img in batch_images])
        yield batch_features

# 保存图像特征向量（分批处理）
batch_size = 1000
image_features_list = []
for batch_features in batch_image_feature_extraction(image_path, batch_size=batch_size):
    image_features_list.append(batch_features)
image_features = np.concatenate(image_features_list)
np.save('image_features.npy', image_features)

# 保存文本特征向量
pd.DataFrame(word_embeddings.toarray()).to_csv('word_embeddings.csv', index=False)

# 多模态特征融合
combined_features = np.hstack((image_features, word_embeddings.toarray()))

# 图像检索
result1 = []

for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = []

    for i, feature in enumerate(combined_features):
        similarity = cosine_similarity([feature], text_embedding)
        similarities.append((i, similarity))

    similarities.sort(key=lambda x: x[1], reverse=True)
    top_five_images = [os.listdir(image_path)[i] for i, _ in similarities[:5]]
    result1.append(top_five_images)

result1_df = pd.DataFrame(result1, columns=['image1', 'image2', 'image3', 'image4', 'image5'])
result1_df.to_csv('result1.csv', index=False)

def image_feature_extraction(image_path):
    img = image.load_img(image_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    features = model.predict(img_array)
    return features.flatten()

这段代码将图像特征的提取改为了一个生成器函数batch_image_feature_extraction，它会按批次处理图像，并返回特征数组。通过设置合适的batch_size（例如，这里是1000），可以减少对内存的需求。同时，我也调整了保存图像特征向量的部分，使其能适应新的生成器函数。这样应该可以解决你的内存问题。

编辑记录

报告相同问题？

关注问题

【AI企业】【信息科学与工程学】计算机科学与自动化第八十篇人工智能数学方程式16 千万级token的大语言模型01
2026-03-21 05:18

flyair_China的博客步骤1：问题形式化与目标定义目标：设计一个支持亿级token上下文的大语言模型（LLM），在推理时能实现秒级处理10万级token，并保障上下文一致性、逻辑相关性、顺序性、可用性等。数学形式化：设上下文长度为 L...
转行 AI 大模型必看：从零基础到入行的学习指南（附学习资源）
2025-08-21 14:03

冻感糕人~的博客转行 AI 大模型必看：从零基础到入行的学习指南（附学习资源）
用deepseek学大模型04-模型与网络
2025-02-14 19:24

wyg_031113的博客目前已经学完深度学习的数学基础，开始学习各种模型和网络阶段，给出一个从简单到入门的，层层递进的学习路线。并给出学习每种模型需要的前置知识。增加注意力机制，bert, 大...并分析每种模型的使用场景，优缺点。
51c大模型~合集157
2025-07-21 19:59

whaosoft-143的博客首先，研究者从模型拒绝回答不安全输入的响应中，统计出一组高频出现的、具有明确拒绝语义的 token（如 “sorry”, “unable”, “unfortunately” 等），并利用 one-hot 编码的方式，在词汇空间中构造出一个 “拒绝...
RTX4090驱动视觉语言大模型优化广告短视频创作部署教程
2025-09-28 03:11

胡匪的博客本文介绍如何利用RTX4090显卡部署视觉语言大模型，优化广告短视频的智能生成流程，涵盖驱动配置、模型量化、推理加速及端到端系统构建，实现高效本地化AI内容创作。
借助RTX4090的ChatGPT多语言大模型优化工业仿真部署案例
2025-09-28 07:25

来自日本的亮仔的博客本文探讨基于RTX4090的多语言大模型在工业仿真中的本地化部署，涵盖技术架构、性能优化与跨语言应用，实现从语义解析到仿真脚本生成的智能闭环。
51c大模型~合集143
2025-06-22 16:44

whaosoft-143的博客最近，我们撰写并发布了第一篇系统性的 SAE 综述文章，对该领域的技术、...在 ChatGPT 等大语言模型（LLMs）席卷全球的今天，越来越多的研究者意识到：我们需要的不只是 “会说话” 的 LLM，更是 “能解释” 的 LLM。
51c大模型~合集173
2025-08-25 14:19

whaosoft-143的博客另外我们也在 dLLM 上用 RL 训练推理模型（Sandwiched Policy Gradient），也有在小模型上学习推理的尝试（MobileLLM-R1）。在可解释性方面，Grokking（顿悟）这个方向我大概两年前就在关注了。因为之前我做表征...
51c大模型~合集184
2025-09-19 19:00

whaosoft-143的博客针对精准抑制大模型行为这一挑战问题，未来可进一步与强化学习算法融合，构建混合优化框架，例如利用逆学习思想高效抑制不期望行为，同时引导模型学习更优的替代策略，以填补行为抑制后的策略空缺并增强决策的鲁棒性...
51c大模型~合集139
2025-06-14 00:47

whaosoft-143的博客该研究的主要贡献是一组名为 PixMo 的新数据集，其中包括一个用于预训练的高精度图像字幕数据集、一个用于微调的自由格式图像问答数据集以及一个创新的 2D 指向数据集，所有这些数据集均无需使用外部 VLM 即可收集。...
51c大模型~合集151
2025-07-08 15:57

whaosoft-143的博客 KAG 框架 V0.8 版本为 Thinker 模型应用提供支持，融入 KAG 框架后的 Thinker 模型， Math、Deduce 都使用框架中的求解器进行求解，再用 Thinker 模型进行答案汇总，可以看到 KAG-Thinker 7B 的平均 EM 和 F1 性能...
【保姆级】从 AI 小白到全栈专家：这份可落地系统学习指南，帮你少走 3 年弯路
2025-08-29 09:54

大模型研究院的博客避坑关键：避开“过拟合”与“欠拟合” 过拟合是“模型太复杂，死记硬背训练数据”（比如训练时准确率99%，测试时准确率只有60%），欠拟合是“模型太简单，没学会核心规律”（训练和测试准确率都只有50%）。...
大模型应用理论与实战（第五部分多模态大模型AI实战训练营）
2026-01-03 14:52

天涯明月1993的博客多模态大模型（MLLMs）实现了从单模态处理到跨模态统一认知的范式变革，其发展经历了四个关键阶段：概念探索期（1970s-2000s）的模块化尝试、任务突破期（2000s-2018）的专用系统构建、预训练崛起期（2018-2021）的...
Qwen3-VL跨模态检索实战：小白友好镜像，2块钱玩一下午
2026-01-15 02:31

violetgrove43的博客本文介绍了基于星图GPU平台自动化部署Qwen3-VL-2B-Instruct镜像的完整实践，该镜像专为跨模态检索设计，支持图文混合搜索、视频内容定位等场景。用户无需技术背景，仅需几分钟即可启动一个智能文献检索系统，适用于...
没有解决我的问题, 去提问

问题事件

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
创建了问题 4月23日

图文检索模型内存报错！用的ResNet50训练模型

8条回答 默认 最新

问题事件

8条回答默认最新