璠宝今天写代码了吗 2024-04-24 10:01

Error in multimodal feature-fusion image-text retrieval code

How do I fix this mismatch in the number of vectors?
Or is my approach wrong?
Could someone please take a look? Thanks.


import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import os
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Read the word_test.csv file
file_path = "path"
word_test = pd.read_csv(file_path, encoding='utf-8')

# Image directory
image_path = "path"

# Load the pretrained ResNet-50 model
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Image feature extraction function
def image_feature_extraction(image_path):
    img = image.load_img(image_path, target_size=(224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    features = model.predict(img_array)
    return features.flatten()

# Process images in batches
def process_images_in_batches(image_path, batch_size):
    features_list = []
    filenames_list = []
    for img in os.listdir(image_path):
        if len(features_list) >= batch_size:
            features_array = np.vstack(features_list)
            yield features_array, filenames_list
            features_list = []
            filenames_list = []
        
        features = image_feature_extraction(os.path.join(image_path, img))
        features_list.append(features)
        filenames_list.append(img)
    
    if features_list:
        features_array = np.vstack(features_list)
        yield features_array, filenames_list

# Fit the TF-IDF text model and save it
tfidf = TfidfVectorizer()
word_embeddings = tfidf.fit_transform(word_test['caption'])
np.save('text_model.npy', tfidf)

# Extract image features
batch_size = 100
image_features_list = []
for batch_features, _ in process_images_in_batches(image_path, batch_size):
    image_features_list.append(batch_features)

image_features = np.concatenate(image_features_list, axis=0)
np.save('image_features.npy', image_features)

# Multimodal feature fusion
combined_features = np.hstack((image_features, word_embeddings.toarray()))

# Image retrieval
result1 = []

for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = cosine_similarity(image_features, text_embedding)
    top_five_indices = np.argsort(similarities.flatten())[::-1][:5]
    top_five_images = [os.listdir(image_path)[i] for i in top_five_indices]
    result1.append(top_five_images)

result1_df = pd.DataFrame(result1, columns=['image1', 'image2', 'image3', 'image4', 'image5'])
result1_df.to_csv('path', index=False)

Running this code raises the following error:


File "c:/Users/wyf/Desktop/泰迪杯/5.py", line 66, in <module>
    combined_features = np.hstack((image_features, word_embeddings.toarray()))
  File "<__array_function__ internals>", line 200, in hstack
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\numpy\core\shape_base.py", line 370, in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 50000 and the array at index 1 has size 5000
PS C:\Users\wyf>
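For reference, a minimal sketch that reproduces this first error with small stand-in arrays (the shapes are hypothetical, scaled down from 50000 images and 5000 captions). `np.hstack` joins arrays column-wise, so every array must have the same number of rows (dimension 0); only the column counts may differ.

```python
import numpy as np

# Hypothetical stand-ins: 6 "images" with 4 features, 3 "captions" with 5 features.
image_like = np.zeros((6, 4))
text_like = np.zeros((3, 5))

try:
    np.hstack((image_like, text_like))
    mismatch_error = None
except ValueError as err:
    # Row counts (dimension 0) differ, 6 vs 3, so column-wise stacking fails.
    mismatch_error = err
    print("hstack failed:", err)

# With equal row counts, hstack succeeds and the column counts add up.
stacked = np.hstack((image_like, np.zeros((6, 5))))
print(stacked.shape)  # (6, 9)
```

Matching row counts only makes the call succeed; row i of each array must also describe the same item for the stacked features to mean anything.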

I tried to make the feature vectors match by changing the code as follows:

# ... everything above is unchanged
# Pad the text feature matrix so its row count matches the image feature matrix
word_embeddings_padded = np.pad(word_embeddings.toarray(), ((0, 45000), (0, 0)), mode='constant')

# Multimodal feature fusion
combined_features = np.hstack((image_features, word_embeddings_padded))

# Image retrieval
result1 = []

for text in word_test['caption']:
    text_embedding = tfidf.transform([text])
    similarities = cosine_similarity(combined_features, text_embedding)
    top_five_indices = np.argsort(similarities.flatten())[::-1][:5]
    top_five_images = [os.listdir(image_path)[i] for i in top_five_indices]
    result1.append(top_five_images)

result1_df = pd.DataFrame(result1, columns=['image1', 'image2', 'image3', 'image4', 'image5'])
result1_df.to_csv('C:/Users/wyf/Desktop/泰迪杯/result1.csv', index=False)

Right away I got the following error instead:
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\utils\_param_validation.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\metrics\pairwise.py", line 1578, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "C:\Users\wyf\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\sklearn\metrics\pairwise.py", line 190, in check_pairwise_arrays
    raise ValueError(
ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 13898 while Y.shape[1] == 11850
PS C:\Users\wyf>

Should I change it to this instead:


# Adjust the vector row count
word_embeddings_padded = np.pad(word_embeddings.toarray(), ((0, 45000), (0, 2038)), mode='constant')

Or is my whole approach wrong?

PS: I have 50k image files, and the text should be 5k entries.


4 answers

  • 专家-赤兔 (quality creator, programming frameworks) 2024-04-24 10:01

    Quoted from the free WeChat mini program 皆我百晓生

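    According to the error message, the dimension mismatch now occurs in cosine_similarity, which requires X and Y to have the same number of columns. X is combined_features: its 13898 columns are the 2048 ResNet-50 image features plus the 11850 TF-IDF columns of word_embeddings. Y is the TF-IDF vector of a single caption, which has only 11850 columns. Padding rows therefore cannot fix this error; the column counts have to match as well.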
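A minimal sketch of the shape arithmetic behind the second error, assuming the widths from the question: ResNet50(include_top=False, pooling='avg') outputs 2048 features per image, and the TF-IDF vocabulary has 11850 terms (the Y width in the error message). The row counts are small hypothetical values; only the column widths matter here.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

IMAGE_DIM = 2048   # ResNet50(include_top=False, pooling='avg') output width
TEXT_DIM = 11850   # TF-IDF vocabulary size reported in the error message

# Small hypothetical row counts; only the column widths matter here.
image_features = np.random.rand(4, IMAGE_DIM)
text_features = np.random.rand(4, TEXT_DIM)
combined = np.hstack((image_features, text_features))
print(combined.shape[1])  # 13898

query = np.random.rand(1, TEXT_DIM)  # stands in for one caption's TF-IDF vector
try:
    cosine_similarity(combined, query)
    column_error = None
except ValueError as err:
    # 13898 columns vs. 11850 columns: the two inputs are not in the same space.
    column_error = err
    print("cosine_similarity failed:", err)
```

The same check explains why the loop in the original code, which compared image_features (2048 columns) directly with a TF-IDF query (11850 columns), could not work either.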

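    The two errors have different causes. np.hstack joins arrays column-wise, so it requires the same number of rows (dimension 0), whereas cosine_similarity requires the same number of columns. Padding the text matrix with 45000 zero rows satisfies hstack, but row i of the image matrix and row i of the text matrix do not describe the same item, so the fused features are not meaningful. For the column mismatch, you can try mapping the image features to the same number of columns as the text features. Note that the model was created with pooling='avg', so image_features is already a 2-D array of shape (50000, 2048); GlobalAveragePooling2D expects 4-D input and cannot be applied to it again, so a Dense layer is used for the projection instead.

    Here is the modified code snippet:

    import tensorflow as tf
    from sklearn.metrics.pairwise import cosine_similarity
    
    # image_features is already 2-D, shape (num_images, 2048), because ResNet50 was
    # built with pooling='avg'; GlobalAveragePooling2D expects 4-D input, so it is not used.
    text_dim = word_embeddings.shape[1]  # TF-IDF vocabulary size (11850 here)
    
    # Map the image features to the text feature width with a fully connected layer.
    # This layer is untrained: its output is a random projection until it is trained
    # on matched image-caption pairs.
    projection = tf.keras.layers.Dense(text_dim, activation=None)
    projected_image_features = projection(image_features.astype('float32')).numpy()
    
    # np.hstack is not used: 50000 image rows cannot be paired with 5000 caption rows.
    # Both operands below have text_dim columns, as cosine_similarity requires
    # (shown here for the first caption).
    text_embedding = tfidf.transform([word_test['caption'][0]])
    similarities = cosine_similarity(projected_image_features, text_embedding)
    
    # Continue with the image retrieval part...
    

    Because ResNet50 was built with pooling='avg', the image features are already average-pooled, so the snippet skips a second pooling step and uses a fully connected (Dense) layer to map them to the same number of columns as the text features. The projected image features and each caption's TF-IDF vector then have the same width, which is what cosine_similarity requires. np.hstack is dropped because 50000 image rows cannot be paired row by row with 5000 caption rows.

    Note: this solution assumes you want to keep the text feature dimension unchanged and bring the image features to it. The Dense layer above is untrained, so its output is effectively a random projection, and the similarity scores only become meaningful after the layer is trained on matched image-caption pairs. TF-IDF vectors and ResNet-50 features live in unrelated spaces, so if this is not the behavior you want, reconsider the pipeline, for example by using a pretrained model that embeds images and text into a shared space, or by adjusting the feature-extraction stage.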
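Once image vectors and caption vectors do share one space of the same width (for example through a trained projection, or a model that embeds both modalities jointly), the retrieval loop can run for all captions at once. A minimal sketch with hypothetical random data; `retrieve_top_k` is an illustrative helper, not part of any library, and it takes the image filenames as a list built once instead of calling `os.listdir` inside the loop.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_top_k(caption_vectors, image_vectors, image_names, k=5):
    """Hypothetical helper: for each caption, return the names of the k most similar images.

    Both matrices must have the same number of columns (a shared embedding space).
    """
    # (num_captions, num_images) similarity matrix in one call
    sims = cosine_similarity(caption_vectors, image_vectors)
    # Indices of the k largest similarities per row, best match first
    top_idx = np.argsort(-sims, axis=1)[:, :k]
    return [[image_names[i] for i in row] for row in top_idx]


# Hypothetical data: 3 captions, 10 images, 8-dimensional shared space.
rng = np.random.default_rng(0)
image_vectors = rng.random((10, 8))
image_names = [f"img_{i}.jpg" for i in range(10)]

# Make caption 0 identical to image 7 so its best match is known.
caption_vectors = rng.random((3, 8))
caption_vectors[0] = image_vectors[7]

results = retrieve_top_k(caption_vectors, image_vectors, image_names, k=5)
print(results[0][0])  # img_7.jpg
```

The returned lists can be passed to pd.DataFrame(results, columns=['image1', 'image2', 'image3', 'image4', 'image5']) as in the question. For 5000 captions and 50000 images the full similarity matrix has 250 million entries (about 2 GB as float64), so processing the captions in chunks may be necessary.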

