George_huhu 2023-12-26 16:10

Learning Python: a problem with PySpark

While working through the 黑马 (Heima) PySpark course and its coverage of the rdd.map() method, I ran into a problem: as soon as rdd.map() has been executed, a subsequent rdd.collect() raises an error.
The code is as follows:


from pyspark import SparkConf, SparkContext
import os

# Point the PySpark worker processes at the virtualenv interpreter
os.environ["PYSPARK_PYTHON"] = "D:/python-workSpace/Testpython1/.venv/Scripts/python.exe"

conf = SparkConf().setMaster("local").setAppName("sparkRDD")
sc = SparkContext(conf=conf)

# map() is lazy, so any failure only surfaces when collect() runs the job
rdd7 = sc.parallelize([6, 7]).map(lambda x: x + 1)
print(rdd7.collect())  # expected output: [7, 8]

sc.stop()
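
Note that collect() on a freshly parallelized RDD succeeds here; the crash only appears once a Python lambda such as the one passed to map() enters the pipeline, because that is the step that launches a Python worker process. A minimal sketch to isolate the failing call (assuming the same sc as above):

# Sketch: narrow down which call crashes; reuses the sc from the snippet above.
plain = sc.parallelize([6, 7])
print(plain.collect())    # no Python worker involved; expected: [6, 7]

mapped = plain.map(lambda x: x + 1)
print(mapped.collect())   # launches a Python worker; this is the call that fails here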

The error is as follows:


23/12/26 15:18:02 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "D:\python-workSpace\Testpython1\spark\sparkRDD.py", line 13, in <module>
    print(rdd7.collect())
          ^^^^^^^^^^^^^^
  File "D:\python-workSpace\Testpython1\.venv\Lib\site-packages\pyspark\rdd.py", line 1833, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\python-workSpace\Testpython1\.venv\Lib\site-packages\py4j\java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "D:\python-workSpace\Testpython1\.venv\Lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DESKTOP-2JGNEKL executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
    at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
    at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
    at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1046)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2438)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
    ... 32 more
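
For reference, the decisive line in the trace is the trailing java.io.EOFException: the Python worker process died before writing anything back to the JVM. A frequently reported cause of this exact symptom around the time of this post was running PySpark on a newer Python version than the installed release supports (for example Python 3.12 with an early PySpark 3.5.x), and an interpreter mismatch between driver and workers produces the same crash. A quick diagnostic sketch (the venv path is the one from the question; this is a suggestion, not a confirmed fix):

import os
import sys
import pyspark

# Check whether the installed PySpark release documents support for this
# interpreter version (e.g. PySpark 3.5.0 predates Python 3.12).
print(sys.version)
print(pyspark.__version__)

# Pinning the driver interpreter alongside the worker one is a common
# suggestion when workers crash on Windows; path assumed from the question.
os.environ["PYSPARK_PYTHON"] = "D:/python-workSpace/Testpython1/.venv/Scripts/python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = "D:/python-workSpace/Testpython1/.venv/Scripts/python.exe"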


0 answers


Question activity

• Closed by the system on Jan 3
• Question created on Dec 26
