Error when loading a dataset while working with the BigBird model; looking for help
WARNING:datasets.builder:Using custom data configuration default-433ca95c1a890d37
Downloading and preparing dataset json/default to /home/mat/.cache/huggingface/datasets/json/default-433ca95c1a890d37/0.0.0/a3e658c4731e59120d44081ac10bf85dc7e1388126b92338344ce9661907f253...
Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2584.29it/s]
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 448.35it/s]
0 tables [00:00, ? tables/s]ERROR:datasets.packaged_modules.json.json:Failed to read file '/home/mat/project/1429.json' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Column() changed from object to array in row 0
Traceback (most recent call last):
File "/home/mat/.local/lib/python3.6/site-packages/datasets/packaged_modules/json/json.py", line 110, in _generate_tables
io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
File "pyarrow/_json.pyx", line 246, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "BigBird.py", line 29, in <module>
dataset = load_dataset('json', data_files="./1429.json")
File "/home/mat/.local/lib/python3.6/site-packages/datasets/load.py", line 1751, in load_dataset
use_auth_token=use_auth_token,
File "/home/mat/.local/lib/python3.6/site-packages/datasets/builder.py", line 705, in download_and_prepare
dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/home/mat/.local/lib/python3.6/site-packages/datasets/builder.py", line 793, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/home/mat/.local/lib/python3.6/site-packages/datasets/builder.py", line 1275, in _prepare_split
generator, unit=" tables", leave=False, disable=(not logging.is_progress_bar_enabled())
File "/home/mat/.local/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/mat/.local/lib/python3.6/site-packages/datasets/packaged_modules/json/json.py", line 135, in _generate_tables
f"Not able to read records in the JSON file at {file}. "
AttributeError: 'list' object has no attribute 'keys'
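I checked the dataset and found no null values, so I don't know what is causing the error.
See the attached screenshot of the dataset.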
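One way to narrow this down, going only by the messages in the traceback: "changed from object to array" together with "'list' object has no attribute 'keys'" suggests the top level of the file is a JSON array rather than one object per line, and possibly that its elements are not all plain objects. The sketch below is a diagnostic aid, not a confirmed fix. It uses only the standard library, assumes the file fits in memory, and the helper names `summarize_json` and `write_json_lines` are made up for illustration.

```python
import json
from collections import Counter


def summarize_json(path):
    """Return the top-level JSON type name and a count of the element types.

    For a list, the elements are counted; for a dict, its values are counted.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        kinds = Counter(type(item).__name__ for item in data)
    elif isinstance(data, dict):
        kinds = Counter(type(value).__name__ for value in data.values())
    else:
        kinds = Counter()
    return type(data).__name__, dict(kinds)


def write_json_lines(records, out_path):
    """Write a list of dicts as JSON Lines (one JSON object per line).

    Assumption: every record should be a dict; anything else is rejected
    so that a mixed structure is noticed instead of silently written out.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for index, record in enumerate(records):
            if not isinstance(record, dict):
                raise TypeError(
                    f"record {index} is {type(record).__name__}, expected dict"
                )
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example usage (paths are placeholders):
# top, kinds = summarize_json("./1429.json")
# print(top, kinds)
```

If `summarize_json` reports a list whose elements are all dicts, rewriting the records with `write_json_lines` to a new file (for example a hypothetical `1429.jsonl`) and passing that file to `load_dataset('json', data_files=...)` may be worth trying. If the elements are themselves lists or a mix of types, the records probably need to be flattened into a uniform list of dicts first.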