LKLiug · 2025-05-16 17:21

A problem encountered while deploying the WeClone project from GitHub

Has anyone else run into this? I'd like to know how to solve it.



The full log is:

(.venv) root@autodl-container-e5aa47b621-4bee74ab:~/autodl-tmp/WeClone# weclone-cli make-dataset
INFO 05-16 17:15:48 [__init__.py:239] Automatically detected platform cuda.
[WeClone] I | 17:15:50 | Loading configuration from: ./settings.jsonc
[WeClone] I | 17:15:50 | Blocked words for chat records: ['e.g. password', 'e.g. name', '//.....']
[WeClone] I | 17:15:50 | Starting to score the data with the LLM
[INFO|configuration_utils.py:697] 2025-05-16 17:15:50,895 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:771] 2025-05-16 17:15:50,897 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:50,935 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2313] 2025-05-16 17:15:51,266 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:697] 2025-05-16 17:15:51,267 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:771] 2025-05-16 17:15:51,269 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:15:51,270 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2313] 2025-05-16 17:15:51,584 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-05-16 17:15:51] llamafactory.data.template:157 >> Add <|im_end|> to stop words.
[INFO|configuration_utils.py:697] 2025-05-16 17:15:51,618 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:697] 2025-05-16 17:15:51,618 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:771] 2025-05-16 17:15:51,619 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}

[INFO|image_processing_auto.py:301] 2025-05-16 17:15:51,621 >> Could not locate the image processor configuration file, will try to use the model config instead.
INFO 05-16 17:15:59 [config.py:585] This model supports multiple tasks: {'generate', 'embed', 'score', 'reward', 'classify'}. Defaulting to 'generate'.
INFO 05-16 17:15:59 [config.py:1697] Chunked prefill is enabled with max_num_batched_tokens=8192.
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 17:16:00,833 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2313] 2025-05-16 17:16:01,136 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:1093] 2025-05-16 17:16:01,230 >> loading configuration file ./Qwen2.5-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1140] 2025-05-16 17:16:01,231 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}

WARNING 05-16 17:16:01 [utils.py:2181] We must use the `spawn` multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing for more information. Reason: CUDA is initialized
INFO 05-16 17:16:05 [__init__.py:239] Automatically detected platform cuda.
INFO 05-16 17:16:07 [core.py:54] Initializing a V1 LLM engine (v0.8.2) with config: model='./Qwen2.5-7B-Instruct', speculative_config=None, tokenizer='./Qwen2.5-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=3072, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=./Qwen2.5-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
WARNING 05-16 17:16:08 [utils.py:2321] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f54bb068c10>
INFO 05-16 17:16:09 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 05-16 17:16:09 [cuda.py:220] Using Flash Attention backend on V1 engine.
INFO 05-16 17:16:09 [gpu_model_runner.py:1174] Starting to load model ./Qwen2.5-7B-Instruct...
WARNING 05-16 17:16:09 [topk_topp_sampler.py:63] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]

ERROR 05-16 17:16:09 [core.py:343] EngineCore hit an exception: Traceback (most recent call last):
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 05-16 17:16:09 [core.py:343]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 05-16 17:16:09 [core.py:343]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 60, in __init__
ERROR 05-16 17:16:09 [core.py:343]     self.model_executor = executor_class(vllm_config)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 05-16 17:16:09 [core.py:343]     self._init_executor()
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 05-16 17:16:09 [core.py:343]     self.collective_rpc("load_model")
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-16 17:16:09 [core.py:343]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/utils.py", line 2255, in run_method
ERROR 05-16 17:16:09 [core.py:343]     return func(*args, **kwargs)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 136, in load_model
ERROR 05-16 17:16:09 [core.py:343]     self.model_runner.load_model()
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1177, in load_model
ERROR 05-16 17:16:09 [core.py:343]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 05-16 17:16:09 [core.py:343]     return loader.load_model(vllm_config=vllm_config)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 444, in load_model
ERROR 05-16 17:16:09 [core.py:343]     loaded_weights = model.load_weights(
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 490, in load_weights
ERROR 05-16 17:16:09 [core.py:343]     return loader.load_weights(weights)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 235, in load_weights
ERROR 05-16 17:16:09 [core.py:343]     autoloaded_weights = set(self._load_module("", self.module, weights))
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 187, in _load_module
ERROR 05-16 17:16:09 [core.py:343]     for child_prefix, child_weights in self._groupby_prefix(weights):
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 101, in _groupby_prefix
ERROR 05-16 17:16:09 [core.py:343]     for prefix, group in itertools.groupby(weights_by_parts,
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 98, in <genexpr>
ERROR 05-16 17:16:09 [core.py:343]     weights_by_parts = ((weight_name.split(".", 1), weight_data)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 420, in _get_all_weights
ERROR 05-16 17:16:09 [core.py:343]     yield from self._get_weights_iterator(primary_weights)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 403, in <genexpr>
ERROR 05-16 17:16:09 [core.py:343]     return ((source.prefix + name, tensor)
ERROR 05-16 17:16:09 [core.py:343]   File "/root/autodl-tmp/WeClone/.venv/lib/python3.10/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 441, in safetensors_weights_iterator
ERROR 05-16 17:16:09 [core.py:343]     with safe_open(st_file, framework="pt") as f:
ERROR 05-16 17:16:09 [core.py:343] safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooSmall
ERROR 05-16 17:16:09 [core.py:343] 
CRITICAL 05-16 17:16:09 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed


3 answers

  • 檀越@新空间 2025-05-16 17:22

    Good afternoon 🌅🌅🌅
    This answer was drafted with reference to Tongyi Qianwen (通义千问).

    Based on the log you provided, the deployment of the WeClone project fails while vLLM is loading the model weights: the traceback ends with `safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooSmall`, which is raised when a `.safetensors` file cannot be parsed, most often because the weight files are corrupted or were only partially downloaded (for example, Git LFS pointer files instead of the real shards). Here are the likely causes and how to address them:
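
    Before walking through the causes, one quick way to confirm that the local weight shards are the problem is to try opening each `.safetensors` file directly, which is the same call that fails in the traceback above. A minimal sketch, assuming the model directory is `./Qwen2.5-7B-Instruct` as in the log and that the `safetensors` package installed alongside vLLM is available:

```python
# Sketch: verify each safetensors shard in the local model directory.
# "HeaderTooSmall" usually means a shard is a Git LFS pointer file or a
# truncated download rather than the real weights (real shards are several GB).
from pathlib import Path
from safetensors import safe_open

model_dir = Path("./Qwen2.5-7B-Instruct")  # path taken from the log above
for shard in sorted(model_dir.glob("*.safetensors")):
    size_mb = shard.stat().st_size / 1e6
    try:
        with safe_open(str(shard), framework="pt") as f:
            num_tensors = len(f.keys())
        print(f"OK   {shard.name}: {num_tensors} tensors, {size_mb:.0f} MB")
    except Exception as exc:
        # A shard well under 1 MB is almost certainly an LFS pointer file.
        print(f"BAD  {shard.name}: {size_mb:.2f} MB -> {exc}")
```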


    Analysis of possible causes

    1. Environment dependencies not installed correctly

      • Some dependency libraries may be missing or have mismatched versions.
    2. Configuration file issues

      • settings.jsonc or another configuration file may contain format errors or be missing required fields.
    3. Model path or model file issues

      • The model path may point to the wrong directory, or the model files may be missing or incomplete (see the re-download sketch after this list).
    4. Special character handling

      • The log mentions `Add <|im_end|> to stop words`
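
    If the check above shows tiny or unreadable shards, the usual fix is to delete them and download the weights again in full. A hypothetical sketch using `huggingface_hub` (this is not part of WeClone's own CLI; on machines without direct access to huggingface.co you may need to set `HF_ENDPOINT=https://hf-mirror.com` first, or fetch the model from ModelScope instead):

```python
# Sketch: re-download the full Qwen2.5-7B-Instruct repository into the local
# directory that settings.jsonc points at. Delete the bad shards first so they
# are fetched again rather than skipped.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",
    local_dir="./Qwen2.5-7B-Instruct",
)
```

    Once every shard opens cleanly, re-running `weclone-cli make-dataset` should get past the safetensors loading step.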

