Benje 2026-03-20 14:21

data-juicer fails on startup: log file path cannot be found

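Something is wrong with the data-juicer log file path. I installed data-juicer with uv pip, following the official documentation, and running it produces the error below. What is going wrong?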

My configuration file:

project_name: 'windows-fix'
dataset_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/raw_data.jsonl'
np: 4
export_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/processed_data.jsonl'

process:
  - language_id_score_filter:
      lang: 'zh'
      min_score: 0.8

Command and output:

(D:\WorkRes\condaData\envs_dirs\env2-dj) PS D:\WorkRes\EnvDataJuicer\dj-practice> dj-process --config .\process.yaml
2026-03-20 14:18:24.879 | ERROR    | __main__:10 - An error has been caught in function '<module>', process 'MainProcess' (16736), thread 'MainThread' (30512):
Traceback (most recent call last):

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\WorkRes\condaData\envs_dir...
           │         └ <code object <module> at 0x000001A6B57DFD60, file "D:\WorkRes\condaData\envs_dirs\env2-dj\Scripts\dj-process.exe\__main__.py"...
           └ <function _run_code at 0x000001A6B557F640>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': '', '__loader__': <zipimporter object "D:\WorkRes\condaData\envs_dir...
         └ <code object <module> at 0x000001A6B57DFD60, file "D:\WorkRes\condaData\envs_dirs\env2-dj\Scripts\dj-process.exe\__main__.py"...

> File "D:\WorkRes\condaData\envs_dirs\env2-dj\Scripts\dj-process.exe\__main__.py", line 10, in <module>
    sys.exit(main())
    │   │    └ <function main at 0x000001A6A7A15C60>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\Lib\site-packages\tools\process_data.py", line 21, in main
    cfg = init_configs()
          └ <function init_configs at 0x000001A69E5049D0>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\data_juicer\config\config.py", line 824, in init_configs
    cfg = init_setup_from_cfg(cfg, load_configs_only)
          │                   │    └ False
          │                   └ Namespace(config=[Path_fr(.\process.yaml, cwd=D:\WorkRes\EnvDataJuicer\dj-practice)], auto=False, auto_num=1000, hpo_config=N...
          └ <function init_setup_from_cfg at 0x000001A69E504F70>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\data_juicer\config\config.py", line 920, in init_setup_from_cfg
    setup_logger(
    └ <function setup_logger at 0x000001A69C3383A0>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\data_juicer\utils\logger_utils.py", line 170, in setup_logger
    logger.add(
    │      └ <function Logger.add at 0x000001A6B76FE680>
    └ <loguru.logger handlers=[(id=2, level=20, sink=<stderr>)]>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\loguru\_file_sink.py", line 192, in __init__
    self._create_file(path)
    │    │            └ 'D:\\WorkRes\\EnvDataJuicer\\dj-practice\\20260320_061824_de1d34\\logs\\export_..\\processed_data.jsonl_time_20260320141824.txt'
    │    └ <function FileSink._create_file at 0x000001A6B76917E0>
    └ <loguru._file_sink.FileSink object at 0x000001A6A7B5D450>

  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\loguru\_file_sink.py", line 228, in _create_file
    self._file = open(path, **self._kwargs)
    │    │            │       │    └ {'mode': 'a', 'buffering': 1, 'encoding': 'utf8'}
    │    │            │       └ <loguru._file_sink.FileSink object at 0x000001A6A7B5D450>
    │    │            └ 'D:\\WorkRes\\EnvDataJuicer\\dj-practice\\20260320_061824_de1d34\\logs\\export_..\\processed_data.jsonl_time_20260320141824.txt'
    │    └ None
    └ <loguru._file_sink.FileSink object at 0x000001A6A7B5D450>

FileNotFoundError: [Errno 2] No such file or directory: 'D:\\WorkRes\\EnvDataJuicer\\dj-practice\\20260320_061824_de1d34\\logs\\export_..\\processed_data.jsonl_time_20260320141824.txt'
Traceback (most recent call last):
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\Scripts\dj-process.exe\__main__.py", line 10, in <module>
    sys.exit(main())
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\loguru\_logger.py", line 1297, in catch_wrapper
    return function(*args, **kwargs)
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\Lib\site-packages\tools\process_data.py", line 21, in main
    cfg = init_configs()
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\data_juicer\config\config.py", line 824, in init_configs
    cfg = init_setup_from_cfg(cfg, load_configs_only)
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\data_juicer\config\config.py", line 920, in init_setup_from_cfg
    setup_logger(
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\data_juicer\utils\logger_utils.py", line 170, in setup_logger
    logger.add(
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\loguru\_logger.py", line 802, in add
    wrapped_sink = FileSink(path, **kwargs)
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\loguru\_file_sink.py", line 192, in __init__
    self._create_file(path)
  File "D:\WorkRes\condaData\envs_dirs\env2-dj\lib\site-packages\loguru\_file_sink.py", line 228, in _create_file
    self._file = open(path, **self._kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'D:\\WorkRes\\EnvDataJuicer\\dj-practice\\20260320_061824_de1d34\\logs\\export_..\\processed_data.jsonl_time_20260320141824.txt'


1 answer

  • 檀越@新空间 2026-03-20 14:26

    Good afternoon.
    This answer draws on Tongyi Qianwen.

    Based on the error log you provided, Data-Juicer fails to create its log file. The specific error is:

    FileNotFoundError: [Errno 2] No such file or directory: 'D:\\WorkRes\\EnvDataJuicer\\dj-practice\\20260320_061824_de1d34\\logs\\export_..\\processed_data.jsonl_time_20260320141824.txt'
    

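    This shows that when Data-Juicer tried to open the log file, a directory in the path did not exist. The file name segment export_..\processed_data.jsonl_time_20260320141824.txt contains a backslash, so Windows reads logs\export_.. as a directory, and that directory was never created. Note that .. is not an illegal character; the problem is that it arrives together with a path separator inside what was meant to be a single file name.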


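    ✅ Problem analysis

    1. The .. in the path

    The log file path in the traceback contains the segment export_..\processed_data.jsonl. The log file name appears to be built from export_path expressed relative to the timestamped run directory, so the value of export_path in the config file determines what ends up in the name.

    For example: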

    export_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/processed_data.jsonl'
    

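    This export file sits one level above the run directory 20260320_061824_de1d34, so its relative path is ..\processed_data.jsonl. Once that is embedded in the log name, the backslash splits the intended file name into a directory (logs\export_..) and a file (processed_data.jsonl_time_20260320141824.txt), which is not what was intended.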
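To make the path arithmetic concrete, here is a minimal sketch that rebuilds the failing path from the values in the traceback. It assumes (this is not confirmed from the Data-Juicer source) that the log name is formed as export_<relative export path>_time_<timestamp>.txt; the variable names are illustrative. The standard-library ntpath module applies Windows path rules on any operating system.

```python
import ntpath  # Windows path semantics, importable on any OS

export_path = r"D:\WorkRes\EnvDataJuicer\dj-practice\processed_data.jsonl"
run_dir = r"D:\WorkRes\EnvDataJuicer\dj-practice\20260320_061824_de1d34"

# Assumption: the log name embeds export_path relative to the run directory.
rel = ntpath.relpath(export_path, start=run_dir)
log_name = "export_" + rel + "_time_20260320141824.txt"
log_path = ntpath.join(run_dir, "logs", log_name)

# Windows splits on the backslash that came in with the relative path.
parent, leaf = ntpath.split(log_path)
print("relative export path:", rel)
print("directory Windows expects:", parent)
print("file it tries to create:", leaf)
```

Under that assumption the reconstructed path matches the one in the FileNotFoundError, and the expected directory ends in logs\export_.., which nothing has created.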

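    2. Missing directory structure

    Data-Juicer automatically creates a timestamped directory (here 20260320_061824_de1d34) to hold its log files. Opening a file in append mode creates the file itself but not any missing parent directories, so if that directory or one of its parents does not exist, the call raises FileNotFoundError.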
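The behavior described above can be reproduced without Data-Juicer. The following sketch uses a temporary directory and made-up file names; it only illustrates how Python's open() treats a missing parent directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # "missing_dir" does not exist yet, just like logs\export_.. in the traceback.
    log_path = os.path.join(root, "logs", "missing_dir", "run.txt")

    try:
        open(log_path, "a", encoding="utf8").close()
        failed = False
    except FileNotFoundError:
        # Append mode creates the file, but never its parent directories.
        failed = True

    # Creating the parent chain first makes the same open() call succeed.
    os.makedirs(os.path.dirname(log_path), exist_ok=True)
    with open(log_path, "a", encoding="utf8") as handle:
        handle.write("ok\n")
    created = os.path.exists(log_path)

print("first attempt failed:", failed)
print("file created after makedirs:", created)
```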


    🔧 Solution

    The steps below walk through the fix:

    1. Check and correct the paths in the config file

    Before:

    export_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/processed_data.jsonl'
    

    After (directory only, no file name):

    export_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/'
    

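    Note: this relies on Data-Juicer choosing the output file name itself when export_path is a directory. The traceback does not confirm that behavior, so check how export_path is documented for your installed version before relying on it.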


    2. Create the required directory structure manually

    Make sure the following directory exists; otherwise Data-Juicer will report an error:

    D:\WorkRes\EnvDataJuicer\dj-practice\
    

    If it does not, create it manually:

    mkdir D:\WorkRes\EnvDataJuicer\dj-practice\
    

    3. Check the path for .. segments and invalid characters

    If your config file contains the following:

    export_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/processed_data.jsonl'
    

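    Make sure the path contains no .. segments and no characters that Windows rejects in file names. Prefer absolute paths so that relative-path resolution cannot cause confusion.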
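As a rough way to run this check, the sketch below inspects a path string with pathlib.PureWindowsPath, which works on any operating system. The helper name path_problems and the character set are illustrative choices, not part of Data-Juicer.

```python
from pathlib import PureWindowsPath

# Characters Windows rejects inside a single path component.
INVALID_CHARS = set('<>:"|?*')


def path_problems(raw: str) -> list[str]:
    """Hypothetical helper: list reasons a Windows path may be unsafe."""
    path = PureWindowsPath(raw)
    problems = []
    if not path.is_absolute():
        problems.append("not an absolute path")
    if ".." in path.parts:
        problems.append("contains a '..' segment")
    # Skip the anchor (for example 'D:\\'), whose colon is legitimate.
    components = path.parts[1:] if path.anchor else path.parts
    for part in components:
        if INVALID_CHARS & set(part):
            problems.append(f"invalid character in {part!r}")
    return problems


configured = path_problems("D:/WorkRes/EnvDataJuicer/dj-practice/processed_data.jsonl")
relative = path_problems(r"..\processed_data.jsonl")
print(configured)
print(relative)
```

For the export_path in the question the list comes back empty, which suggests the .. is introduced later, when the log name is built, rather than by the config file itself.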


    4. Run the command from the correct directory

    Make sure you are in the project directory when you run dj-process:

    cd D:\WorkRes\EnvDataJuicer\dj-practice
    dj-process --config .\process.yaml
    

    5. Example of the revised config file

    project_name: 'windows-fix'
    dataset_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/raw_data.jsonl'
    np: 4
    export_path: 'D:/WorkRes/EnvDataJuicer/dj-practice/'
    
    process:
      - language_id_score_filter:
          lang: 'zh'
          min_score: 0.8
    

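    🛠️ Additional suggestions

    If the error persists, try the following:

    1. Delete the existing log directory
      Delete the folder D:\WorkRes\EnvDataJuicer\dj-practice\20260320_061824_de1d34 (if it exists) and let Data-Juicer recreate it.

    2. Check permissions
      Make sure you are allowed to create files and directories under the target path.

    3. Test the path with a Python script
      A short Python script can confirm that the path is accessible: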

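      import os

      # Directory that holds the dataset and the exported results
      path = r'D:\WorkRes\EnvDataJuicer\dj-practice'
      # Create the directory and any missing parents; do nothing if it already exists
      os.makedirs(path, exist_ok=True)
      print("Path exists:", os.path.exists(path))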
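The permission check in item 2 can be sketched in the same spirit. The helper below, whose name can_write_to is made up for illustration, tries to create and then remove a temporary file in the target directory; it treats any OSError, including a missing directory, as "not writable".

```python
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if a file can be created (and removed) inside `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers PermissionError as well as a directory that does not exist.
        return False


print(can_write_to(r"D:\WorkRes\EnvDataJuicer\dj-practice"))
```

A False result here points at permissions or a missing directory rather than at the log file name.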
      

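    ✅ Summary

    | Cause | Fix |
    |---------|---------|
    | export_path includes a file name, which leads to a bad log path | Set export_path to a directory only |
    | Missing directory structure | Create the required directories manually |
    | A .. segment or invalid characters in the path | Check and correct the paths in the config file |
    | Insufficient permissions or a path that is not writable | Check permissions and make sure the path is writable |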


    If you run into other errors or need more help, please post the complete log output.
