Problem description and background
When using PySpark's map method, the final print(rdd2.collect()) raises an error.
The first error message is:
IndexError: tuple index out of range
Code
# Demonstrate the use of the RDD map method
from pyspark import SparkConf, SparkContext
import os

# Point PySpark at the local interpreter (path as it exists on my machine)
os.environ['PYSPARK_PYTHON'] = 'D:/python/pytuon3.11.0/python.exe'

conf = SparkConf().setMaster('local[*]').setAppName('test_spark')
sc = SparkContext(conf=conf)

rdd = sc.parallelize([1, 2, 3, 4, 5])

# Multiply each element by 10
def func(data):
    return data * 10

rdd2 = rdd.map(func)
print(rdd2.collect())
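Note: from what I can tell, this failure comes from running PySpark's bundled cloudpickle under Python 3.11, whose bytecode it cannot parse; my understanding is that PySpark only added Python 3.11 support in version 3.4.0. A minimal version-check sketch (the 3.4.0 threshold is my assumption, not something the error output states):

# Diagnostic sketch: print the interpreter and PySpark versions
# (assumption: PySpark below 3.4.0 cannot pickle functions on Python 3.11).
import sys
import pyspark

print(sys.version_info[:3])   # (3, 11, 0) in my case
print(pyspark.__version__)    # a value below 3.4.0 would point to the version mismatch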
Run result and full error output
D:\python\pytuon3.11.0\python.exe D:\python·learn\pyspark_map.training.py
22/12/08 10:51:16 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/12/08 10:51:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\serializers.py", line 458, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 602, in dump
return Pickler.dump(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 692, in reducer_override
return self._function_reduce(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 565, in _function_reduce
return self._dynamic_function_reduce(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 546, in _dynamic_function_reduce
state = _function_getstate(func)
^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 157, in _function_getstate
f_globals_ref = _extract_code_globals(func.__code__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle.py", line 334, in _extract_code_globals
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle.py", line 334, in <dictcomp>
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
~~~~~^^^^^^^
IndexError: tuple index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\python·learn\pyspark_map.training.py", line 17, in <module>
print(rdd2.collect())
^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 1197, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 3505, in _jrdd
wrapped_func = _wrap_function(
^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 3362, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 3345, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
^^^^^^^^^^^^^^^^^^
File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\serializers.py", line 468, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range
Process finished with exit code 1
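For reference, the traceback shows that collect() fails while cloudpickle serializes func in the driver, before any Spark computation runs. If that is right, the error should be reproducible without an RDD at all; a minimal sketch, assuming the vendored pyspark.cloudpickle from the traceback paths is the failing module:

# Reproduction sketch outside of Spark (assumption: the bug lives in the
# vendored cloudpickle named in the traceback, not in the RDD code).
from pyspark import cloudpickle

def func(data):
    return data * 10

# Expected to raise the same "IndexError: tuple index out of range"
# on Python 3.11 with an old vendored cloudpickle.
cloudpickle.dumps(func)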
My approach and what I have tried
Using findspark also raises an error, and serializing the function manually does not work either.
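For context on why findspark cannot help here: as far as I know it only locates the Spark installation and adjusts sys.path; the serialization code path is untouched. A sketch, assuming findspark is installed and SPARK_HOME is discoverable:

# findspark only fixes module discovery; the vendored cloudpickle still
# performs the function serialization, so the same error occurs afterwards.
import findspark
findspark.init()  # locates Spark and adds pyspark to sys.path, nothing more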
What I want to achieve
This code should multiply each element of the RDD I defined by 10 and print the result.
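The expected output is [10, 20, 30, 40, 50]. A fix sketch under the assumption that the Python 3.11/cloudpickle mismatch is the cause: either run the whole script under Python 3.10 or earlier (the pickling happens in the driver process, so changing PYSPARK_PYTHON alone would not be enough), or upgrade PySpark, e.g. pip install "pyspark>=3.4":

# Fix sketch (assumption: PySpark >= 3.4 is installed, whose bundled
# cloudpickle can parse Python 3.11 bytecode).
import os
from pyspark import SparkConf, SparkContext

# Worker interpreter; the driver interpreter must be a supported version too.
os.environ['PYSPARK_PYTHON'] = 'D:/python/pytuon3.11.0/python.exe'

conf = SparkConf().setMaster('local[*]').setAppName('test_spark')
sc = SparkContext(conf=conf)

rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.map(lambda data: data * 10).collect())  # expected: [10, 20, 30, 40, 50]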