weixin_48323671 2023-01-13 10:09 · acceptance rate: 50%
396 views
Closed

PyCharm: calling the map operator with PySpark keeps throwing an error

from pyspark import SparkConf, SparkContext
import os

# Point PySpark's worker processes at the local Python interpreter
os.environ['PYSPARK_PYTHON'] = 'D:/myApps/python/python.exe'

conf = SparkConf().setMaster("local[*]").setAppName("test_app_name")
sc = SparkContext(conf=conf)
wyyList = {"name": "刘德华", 'age': 18}  # defined but never used below
rdd1 = sc.parallelize([1, 2, 3, 4])


def func(data):
    return data * 2


rdd2 = rdd1.map(func)
print("rdd1rdd1", rdd1.collect())  # 这行打印正常
print("rdd2rdd2", rdd2.collect())

sc.stop()

Traceback (most recent call last):
  File "D:\myApps\python\Lib\site-packages\pyspark\serializers.py", line 458, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 692, in reducer_override
    return self._function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 565, in _function_reduce
    return self._dynamic_function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 546, in _dynamic_function_reduce
    state = _function_getstate(func)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 157, in _function_getstate
    f_globals_ref = _extract_code_globals(func.__code__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle.py", line 334, in _extract_code_globals
    out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\cloudpickle\cloudpickle.py", line 334, in <dictcomp>
    out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
                 ~~~~~^^^^^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\wyyStudyDocument\python\python-learn\pyspark\pyspark_02.py", line 23, in <module>
    print("rdd2rdd2", rdd2.collect())
                      ^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\rdd.py", line 1194, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
                                                        ^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\rdd.py", line 3500, in _jrdd
    wrapped_func = _wrap_function(
                   ^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\rdd.py", line 3359, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\rdd.py", line 3342, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
                      ^^^^^^^^^^^^^^^^^^
  File "D:\myApps\python\Lib\site-packages\pyspark\serializers.py", line 468, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range


5 answers · sorted by: default | newest

  • qq_43961432 2023-01-13 10:21

    This looks like an environment problem rather than a bug in your code. Check that the SparkContext is configured correctly, and in particular that your Python version is compatible with your Spark version. An IndexError: tuple index out of range inside cloudpickle is the typical symptom of running PySpark 3.3 or earlier on Python 3.11, whose new bytecode format the bundled cloudpickle cannot parse. Either run the script with Python 3.10 or earlier, or upgrade to a PySpark release that supports Python 3.11 (3.4 and later).
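
    A minimal sanity check, as a sketch: it assumes only that pyspark is importable on the driver, and the version cut-offs encode the compatibility note above. Run it before building the SparkContext:

    import sys
    import pyspark

    # Print the driver's Python version and the installed PySpark version.
    print("Python :", sys.version)
    print("PySpark:", pyspark.__version__)

    # Flag the known-bad combination: Python >= 3.11 with PySpark < 3.4.
    py = sys.version_info[:2]
    spark = tuple(int(x) for x in pyspark.__version__.split(".")[:2])
    if py >= (3, 11) and spark < (3, 4):
        print("Likely incompatible: use Python <= 3.10 or upgrade PySpark to >= 3.4")

    If the check fires, either point PyCharm's project interpreter (and the PYSPARK_PYTHON variable in your script) at a Python 3.10 installation, or run pip install --upgrade "pyspark>=3.4" in the same environment.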

    This answer was accepted as the best answer by the asker.


Question timeline

  • Closed by system on Jan 21
  • Answer accepted on Jan 13
  • Question created on Jan 13
