Multimodal feature fusion image-text retrieval code throws an error

How do I fix this mismatch in the number of vectors?
Or am I going in the wrong direction?
Could someone take a look? Thanks.


import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Read word_test.csv
file_path = "path"
word_test = pd.read_csv(file_path, encoding='utf-8')

# Image directory
image_path = "path"

# Load the pretrained ResNet-50 model
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Extract the feature vector of one image
def image_feature_extraction(image_path):
    img = image.load_img(image_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    features = model.predict(img_array)
    return features.flatten()

# Process images in batches
def process_images_in_batches(image_path, batch_size):
    features_list = []
    filenames_list = []
    for img in os.listdir(image_path):
        if len(features_list) >= batch_size:
            features_array = np.vstack(features_list)
            yield features_array, filenames_list
            features_list = []
            filenames_list = []
        
        features = image_feature_extraction(os.path.join(image_path, img))
        features_list.append(features)
        filenames_list.append(img)
    
    if features_list:
        features_array = np.vstack(features_list)
        yield features_array, filenames_list

# Fit the TF-IDF text model and save it
tfidf = TfidfVectorizer()
word_embeddings = tfidf.fit_transform(word_test['caption'])
np.save('text_model.npy', tfidf)

# Extract features for all images
batch_size = 100
image_features_list = []
for batch_features, _ in process_images_in_batches(image_path, batch_size):
    image_features_list.append(batch_features)

image_features = np.concatenate(image_features_list, axis=0)
np.save('image_features.npy', image_features)

# Multimodal feature fusion
combined_features = np.hstack((image_features, word_embeddings.toarray()))

# Image retrieval
result1 = []

for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = cosine_similarity(image_features, text_embedding)
    top_five_indices = np.argsort(similarities.flatten())[::-1][:5]
    top_five_images = [os.listdir(image_path)[i] for i in top_five_indices]
    result1.append(top_five_images)

result1_df = pd.DataFrame(result1, columns=['image1', 'image2', 'image3', 'image4', 'image5'])
result1_df.to_csv('path', index=False)

This code fails with the following error:


File "c:/Users/wyf/Desktop/泰迪杯/5.py", line 66, in <module>
    combined_features = np.hstack((image_features, word_embeddings.toarray()))
  File "<__array_function__ internals>", line 200, in hstack
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\numpy\core\shape_base.py", line 370, in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 50000 and the array at index 1 has size 5000
PS C:\Users\wyf>
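The first traceback is about rows, not columns. np.hstack joins arrays side by side (along axis 1), so every input must have the same number of rows, and row i of the result glues together row i of each input. Here image_features has one row per image (50000) and word_embeddings has one row per caption (5000). A minimal sketch that reproduces this with small random arrays; the sizes are illustrative assumptions scaled down from the post:

```python
import numpy as np

# Scaled-down stand-ins (assumed sizes): one row per image, one row per caption
n_images, n_captions = 50, 5
image_features = np.random.rand(n_images, 2048)  # like ResNet-50 avg-pooled features
text_features = np.random.rand(n_captions, 300)  # like TF-IDF rows (toy vocabulary)

def try_hstack(a, b):
    """Return np.hstack((a, b)), or the ValueError message if it fails."""
    try:
        return np.hstack((a, b))
    except ValueError as err:
        return str(err)

# Different row counts: fails the same way as the first traceback
result = try_hstack(image_features, text_features)
print(result)

# Equal row counts: works, and the column counts simply add up
ok = try_hstack(image_features[:n_captions], text_features)
print(ok.shape)  # (5, 2348)
```

Even when the row counts agree, hstack only makes sense if row i of both arrays describes the same image-caption pair; with 50,000 images and 5,000 captions there is no such one-to-one pairing.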

To make the feature vectors match, I changed the code as follows:

# ... everything above is unchanged
# Pad the text feature matrix with zero rows so its row count matches the image features
word_embeddings_padded = np.pad(word_embeddings.toarray(), ((0, 45000), (0, 0)), mode='constant')

# Multimodal feature fusion
combined_features = np.hstack((image_features, word_embeddings_padded))

# Image retrieval
result1 = []

for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = cosine_similarity(combined_features, text_embedding)
    top_five_indices = np.argsort(similarities.flatten())[::-1][:5]
    top_five_images = [os.listdir(image_path)[i] for i in top_five_indices]
    result1.append(top_five_images)

result1_df = pd.DataFrame(result1, columns=['image1', 'image2', 'image3', 'image4', 'image5'])
result1_df.to_csv('C:/Users/wyf/Desktop/泰迪杯/result1.csv', index=False)

Right after that, I got this error instead!
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\utils\_param_validation.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\metrics\pairwise.py", line 1578, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\metrics\pairwise.py", line 190, in check_pairwise_arrays
    raise ValueError(
ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 13898 while Y.shape[1] == 11850
PS C:\Users\wyf>
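The second traceback is a column problem. cosine_similarity(X, Y) compares every row of X with every row of Y, so X and Y must have the same number of columns. In the modified loop, X is combined_features, whose width is the 2048 ResNet-50 features plus the TF-IDF vocabulary, while Y is tfidf.transform([text]), which has only the TF-IDF columns. A small sketch of the arithmetic and of the error, using toy sizes (the 2048 assumes ResNet50 with pooling='avg', as in the post's code):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

image_dim = 2048    # ResNet50(include_top=False, pooling='avg') output width
vocab_size = 11850  # TF-IDF width, taken from Y.shape[1] in the traceback

# combined_features = [image features | TF-IDF features]
combined_columns = image_dim + vocab_size
print(combined_columns)  # 13898, the X.shape[1] in the traceback

# Toy version: 20 "image" columns + 30 "text" columns vs. a 30-column query
X = np.random.rand(4, 20 + 30)
Y = np.random.rand(1, 30)
try:
    cosine_similarity(X, Y)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# Comparing like with like works and gives a (rows of X, rows of Y) matrix
print(cosine_similarity(X[:, 20:], Y).shape)  # (4, 1)
```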

Should I change it to this instead?


# Adjust the vector row count
word_embeddings_padded = np.pad(word_embeddings.toarray(), ((0, 45000), (0, 2038)), mode='constant')

Or is my whole approach wrong?

P.S. I have 50,000 image files, and there should be about 5,000 captions.
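On the padding idea: the column gap in the second traceback is 13898 - 11850 = 2048, the image feature width, so a pad of (0, 2038) would not close it, and padding word_embeddings does not change the width of the query, which comes from tfidf.transform([text]). More fundamentally, zero padding only makes the shapes agree. The toy sketch below (small random arrays, sizes chosen only for illustration) pads the caption rows, then builds a text query whose image part can only be zeros. Those zeros remove the image columns from the dot product (they only enter through the vector norm), and every image whose row received a padded, all-zero caption scores exactly 0.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# The column gap reported in the second traceback equals the image feature width
column_gap = 13898 - 11850
print(column_gap)  # 2048

rng = np.random.default_rng(0)
n_images, n_captions = 6, 2  # toy stand-ins for 50000 images / 5000 captions
image_dim, vocab = 4, 3      # toy stand-ins for 2048 / 11850 columns

image_features = rng.random((n_images, image_dim))
text_features = rng.random((n_captions, vocab))

# Padding idea: append zero rows so that hstack sees equal row counts
text_padded = np.pad(text_features, ((0, n_images - n_captions), (0, 0)), mode="constant")
combined = np.hstack((image_features, text_padded))  # shape (6, 7)

# A text query has no image features, so its image part can only be zeros
query = np.hstack((np.zeros((1, image_dim)), text_features[:1]))

sims = cosine_similarity(combined, query).ravel()
print(sims)
# Images whose row received a padded (all-zero) caption score exactly 0,
# regardless of what their image features contain
print(sims[n_captions:])
```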

专家-赤兔 (quality creator, programming frameworks) · 2024-04-24 10:01
Quoted from the free WeChat mini-program 皆我百晓生:

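Based on the error message you posted, the dimension mismatch happens inside cosine_similarity. X is combined_features, which has 13898 columns: 2048 from the ResNet-50 image features (image_features) plus 11850 from the TF-IDF text features (word_embeddings). Y is the query vector from tfidf.transform([text]), which has only the 11850 TF-IDF columns. Padding rows does nothing about this: the two sides being compared must have the same number of columns.

So instead of only adding rows, the column counts have to agree. One option is to stop comparing against the concatenated features and instead map the image features to the same number of columns as the text features, then compare them directly. The image features do not need another GlobalAveragePooling2D layer: ResNet50(..., pooling='avg') already applies global average pooling, so each image is already a 2048-dimensional vector.

Here is the modified code snippet:

import numpy as np
import tensorflow.keras.layers as layers
from sklearn.metrics.pairwise import cosine_similarity

# ResNet50(..., pooling='avg') has already applied global average pooling, so
# image_features is a 2-D array of shape (num_images, 2048). GlobalAveragePooling2D
# expects 4-D input (batch, height, width, channels), so it is not applied again here.
reduced_image_features = image_features

# Make sure the column counts match
if reduced_image_features.shape[1] != word_embeddings.shape[1]:
    # Map the image features to the TF-IDF width with a fully connected layer.
    # This Dense layer is freshly created and untrained, so its weights are random.
    projection = layers.Dense(word_embeddings.shape[1], activation=None)
    reduced_image_features = projection(reduced_image_features).numpy()

# Both sides now have word_embeddings.shape[1] columns, so the retrieval loop can
# compare them directly instead of using combined_features
for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = cosine_similarity(reduced_image_features, text_embedding)
    # ... take the top five indices as before

This code skips the redundant pooling step, checks whether the image and text features have the same number of columns, and, if not, passes the image features through a fully connected (Dense) layer whose output width equals the TF-IDF vocabulary size. After that, cosine_similarity no longer raises the dimension error and the retrieval loop can run.

Note: this only makes the shapes compatible. Because the Dense layer is untrained, the projected image vectors have no learned relationship to the TF-IDF vocabulary, so the resulting rankings will not be meaningful. For useful retrieval, the mapping has to be learned from paired image-caption data, or the pipeline needs rethinking, for example by using a pretrained model that embeds images and text into a shared space (such as CLIP or Chinese-CLIP) at the feature-extraction stage.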
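Whichever projection is chosen, retrieval between 5,000 captions and 50,000 images does not require equal counts: once both sides are vectors in the same space, one similarity matrix of shape (num_texts, num_images) holds every score, and the top five entries per row are the answer. A minimal sketch, assuming paired training features are available; fit_linear_projection and top_k_images are hypothetical helpers, and the least-squares linear map (np.linalg.lstsq) is a simple swapped-in stand-in for an untrained Dense layer, not a replacement for a jointly trained image-text model such as CLIP. It also keeps the filenames next to the features rather than calling os.listdir inside the query loop.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length, so a dot product equals cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def fit_linear_projection(train_image_feats, train_text_feats):
    """Hypothetical helper: least-squares W with train_image_feats @ W ~= train_text_feats.

    Row i of both arrays must describe the same image-caption pair.
    """
    W, *_ = np.linalg.lstsq(train_image_feats, train_text_feats, rcond=None)
    return W

def top_k_images(text_feats, image_feats, filenames, k=5):
    """Hypothetical helper: the k most similar image filenames for each text query."""
    sims = l2_normalize(text_feats) @ l2_normalize(image_feats).T  # (num_texts, num_images)
    top = np.argsort(-sims, axis=1)[:, :k]  # column indices sorted by descending score
    return [[filenames[j] for j in row] for row in top]

# Toy check with synthetic, perfectly paired data (all sizes are made up)
rng = np.random.default_rng(42)
n_images, img_dim, txt_dim = 20, 8, 5
image_feats = rng.standard_normal((n_images, img_dim))
true_W = rng.standard_normal((img_dim, txt_dim))
text_feats = image_feats @ true_W
filenames = [f"img_{i:02d}.jpg" for i in range(n_images)]

W = fit_linear_projection(image_feats, text_feats)
projected_images = image_feats @ W  # images mapped into the text feature space
results = top_k_images(text_feats[:3], projected_images, filenames, k=5)
print(results[0])
```

With the real data, a (5000, 50000) float64 similarity matrix takes about 2 GB, so computing it in chunks of queries is advisable.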
Question created: April 24
