Total LLM newbie here. I'm running inference on four GPUs, and I load the model like this:
```python
from transformers import AutoModelForCausalLM

# model_name_or_path and load_type are defined earlier in the script
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=load_type, device_map="auto").half().cuda()
```

I set `device_map="auto"`, and the model does get sharded across the four cards; at first glance everything looks fine:
```bash
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A800 80GB PCIe          Off | 00000000:4F:00.0 Off |                    0 |
| N/A   38C    P0              66W / 300W |  65255MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A800 80GB PCIe          Off | 00000000:50:00.0 Off |                    0 |
| N/A   39C    P0              66W / 300W |  17787MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A800 80GB PCIe          Off | 00000000:53:00.0 Off |                    0 |
| N/A   40C    P0              68W / 300W |  17787MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A800 80GB PCIe          Off | 00000000:57:00.0 Off |                    0 |
| N/A   39C    P0              69W / 300W |  14287MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
```
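For completeness, the placement can also be inspected directly. A minimal check, assuming the `model` object from above (`hf_device_map` is the attribute transformers attaches when `device_map` is used):

```python
# Show which GPU each top-level module was dispatched to, e.g.
# {'model.embed_tokens': 0, 'model.layers.0': 0, ..., 'lm_head': 3}
print(model.hf_device_map)
```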
But as soon as I call inference, it throws an error:

```
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "openai_api_codellama-34b-2.py", line 194, in create_chat_completion
    generation_output = model.generate(
  File "/usr/local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers-4.33.0-py3.8.egg/transformers/generation/utils.py", line 1681, in generate
    return self.beam_search(
  File "/usr/local/lib/python3.8/site-packages/transformers-4.33.0-py3.8.egg/transformers/generation/utils.py", line 3020, in beam_search
    outputs = self(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers-4.33.0-py3.8.egg/transformers/models/llama/modeling_llama.py", line 820, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers-4.33.0-py3.8.egg/transformers/models/llama/modeling_llama.py", line 708, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers-4.33.0-py3.8.egg/transformers/models/llama/modeling_llama.py", line 421, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/transformers-4.33.0-py3.8.egg/transformers/models/llama/modeling_llama.py", line 89, in forward
    return self.weight * hidden_states.to(input_dtype)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
```

I understand the error is saying the model and the data are not on the same device. But `device_map="auto"` is supposed to partition the model automatically, right? And judging by the memory usage above, each card does hold a different share, so the split clearly happened. That's what confuses me: the whole point is to spread the model across several cards, yet now it looks like things break unless everything is merged onto one device. How should I change this?
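In case it helps frame the question: is the right fix something like the minimal sketch below, i.e. dropping the trailing `.half().cuda()` (my understanding is that `.cuda()` fights with the hooks accelerate installs to dispatch the layers) and moving only the inputs to the model's entry device? `model_name_or_path` and `prompt` are placeholders for the values in my script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "codellama/CodeLlama-34b-Instruct-hf"  # placeholder checkpoint

# Let accelerate place the shards; torch_dtype already does what .half() did.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "def quicksort(arr):"  # placeholder input
# Inputs only need to reach the first shard; accelerate's hooks move the
# intermediate activations between cards during the forward pass.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generation_output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))
```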
Also, is the torchrun approach just replicating an identical copy of the model onto each of the four cards, which is why that approach never hits this error?
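To make that concrete, here is roughly what I mean by the torchrun way, a sketch assuming plain data parallelism where every rank pins itself to one GPU and loads a full copy (torchrun sets the `LOCAL_RANK` environment variable for each process):

```python
# Launched with: torchrun --nproc_per_node=4 serve.py
import os

import torch
from transformers import AutoModelForCausalLM

local_rank = int(os.environ["LOCAL_RANK"])  # 0..3, one process per GPU
torch.cuda.set_device(local_rank)

model_name_or_path = "codellama/CodeLlama-34b-Instruct-hf"  # placeholder checkpoint

# Each process holds the entire model on its own card, so every tensor in a
# forward pass lives on a single device and no cross-device mismatch can
# occur -- at the cost of storing four full copies of the weights.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.float16
).to(f"cuda:{local_rank}")
```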