sakura_aqi 2022-12-08 15:55
71 views
Closed

PySpark data analysis

Problem description and background

When using PySpark's map method, the final print(rdd2.collect()) raises an error.

First error message

IndexError: tuple index out of range

Code

# Demonstrate the RDD map() method

from pyspark import SparkConf, SparkContext
import os
os.environ['PYSPARK_PYTHON'] = 'D:/python/pytuon3.11.0/python.exe'

conf = SparkConf().setMaster('local[*]').setAppName('test_spark')
sc = SparkContext(conf=conf)

rdd = sc.parallelize([1, 2, 3, 4, 5])

def func(data):
    return data * 10

rdd2 = rdd.map(func)

print(rdd2.collect())

Run output and full error message
D:\python\pytuon3.11.0\python.exe D:\python·learn\pyspark_map.training.py 
22/12/08 10:51:16 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/12/08 10:51:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\serializers.py", line 458, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 692, in reducer_override
    return self._function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 565, in _function_reduce
    return self._dynamic_function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 546, in _dynamic_function_reduce
    state = _function_getstate(func)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle_fast.py", line 157, in _function_getstate
    f_globals_ref = _extract_code_globals(func.__code__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle.py", line 334, in _extract_code_globals
    out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\cloudpickle\cloudpickle.py", line 334, in <dictcomp>
    out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
                 ~~~~~^^^^^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\python·learn\pyspark_map.training.py", line 17, in <module>
    print(rdd2.collect())
          ^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 1197, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
                                                        ^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 3505, in _jrdd
    wrapped_func = _wrap_function(
                   ^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 3362, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\rdd.py", line 3345, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
                      ^^^^^^^^^^^^^^^^^^
  File "D:\python\pytuon3.11.0\Lib\site-packages\pyspark\serializers.py", line 468, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range

Process finished with exit code 1
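The traceback shows cloudpickle failing at `names[oparg]` inside `_extract_code_globals` while pickling the mapped function, which is the known symptom of running a PySpark build whose bundled cloudpickle predates Python 3.11: that release changed the LOAD_GLOBAL oparg encoding, so indexing `co_names` by the raw oparg can go out of range. As a hedged illustration (the `times_ten` / `GLOBAL_FACTOR` names are hypothetical, not from the original code), the same global-name extraction can be done version-safely with the `dis` module, which decodes opargs for you:

```python
import dis

GLOBAL_FACTOR = 10  # hypothetical module-level name the function loads


def times_ten(x):
    return x * GLOBAL_FACTOR


def referenced_globals(fn):
    """Collect the global names a function loads, version-safely.

    dis.get_instructions decodes argval correctly on every Python
    version; the failing cloudpickle instead indexed co_names by the
    raw oparg, which Python 3.11's LOAD_GLOBAL encoding breaks.
    """
    return {ins.argval
            for ins in dis.get_instructions(fn)
            if ins.opname == "LOAD_GLOBAL"}


print(referenced_globals(times_ten))  # {'GLOBAL_FACTOR'}
```

This is only a sketch of the mechanism that fails, not a patch for PySpark itself; the practical fix is matching the interpreter to the PySpark version.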


My approach and what I have tried

Using findspark also raises an error, and manually serializing the function does not help either.
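findspark and manual serialization cannot help here because the failure happens inside PySpark's own pickling of the function, driven by the interpreter version. A minimal sketch of a fail-fast guard, assuming PySpark 3.3.x supports at most Python 3.10 (the exact ceiling is an assumption; newer PySpark releases added 3.11 support):

```python
import sys

# Assumed ceiling for PySpark 3.3.x; adjust for your installed version.
MAX_SUPPORTED = (3, 10)


def check_interpreter(version=None):
    """Raise a clear error instead of the opaque PicklingError."""
    version = version or sys.version_info[:2]
    if version > MAX_SUPPORTED:
        raise RuntimeError(
            f"Python {version[0]}.{version[1]} is newer than what this "
            f"PySpark build supports ({MAX_SUPPORTED[0]}.{MAX_SUPPORTED[1]}); "
            f"point PYSPARK_PYTHON at an older interpreter or upgrade pyspark."
        )
```

Calling `check_interpreter()` before building the SparkContext turns the cryptic serialization failure into an actionable message.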

Expected result

This code should multiply each element of the RDD I defined by 10 and print the result.
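The transformation itself is correct; as a sanity check, the same map applied in plain Python (no Spark involved) produces the expected output:

```python
data = [1, 2, 3, 4, 5]


def func(x):
    return x * 10


# Plain-Python equivalent of rdd.map(func) followed by collect()
result = list(map(func, data))
print(result)  # [10, 20, 30, 40, 50]
```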


Question events

• Closed by the system on Dec 16
• Question created on Dec 8
