spark SparkContext initialization failure (5C)

Environment: Ubuntu 16.04
hadoop 2.7.3
scala 2.11.8
spark 2.1.0
Hadoop and Scala are already installed and working. After configuring Spark, running spark-shell throws the error below:

 18/05/22 15:43:30 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: For input string: "true #是否记录Spark事件,用于应用程序在完成后重构webUI"
    at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
    at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
    at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
    at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:407)
    at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:407)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.SparkConf.getBoolean(SparkConf.scala:407)
    at org.apache.spark.SparkContext.isEventLogEnabled(SparkContext.scala:238)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:407)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
    at $line3.$read$$iw$$iw.<init>(<console>:15)
    at $line3.$read$$iw.<init>(<console>:42)
    at $line3.$read.<init>(<console>:44)
    at $line3.$read$.<init>(<console>:48)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.$print$lzycompute(<console>:7)
    at $line3.$eval$.$print(<console>:6)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
    at org.apache.spark.repl.Main$.doMain(Main.scala:68)
    at org.apache.spark.repl.Main$.main(Main.scala:51)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.IllegalArgumentException: For input string: "true #是否记录Spark事件,用于应用程序在完成后重构webUI"
  at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
  at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
  at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
  at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:407)
  at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:407)
  at scala.Option.map(Option.scala:146)
  at org.apache.spark.SparkConf.getBoolean(SparkConf.scala:407)
  at org.apache.spark.SparkContext.isEventLogEnabled(SparkContext.scala:238)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:407)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql

1 answer

Check whether there is a problem with how your configuration files were written or saved.
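The exception message itself points at the likely line: the value Spark failed to parse is `true #是否记录Spark事件,用于应用程序在完成后重构webUI`, i.e. a boolean setting with a comment appended on the same line. spark-defaults.conf is read as a plain properties file, so `#` only starts a comment at the beginning of a line; everything after `true` becomes part of the value, and `toBoolean` rejects it. A minimal before/after sketch, assuming the offending entry is `spark.eventLog.enabled` in `conf/spark-defaults.conf` (that is the key read by `SparkContext.isEventLogEnabled` in the stack trace):

```
# conf/spark-defaults.conf

# Broken -- the trailing comment is swallowed into the value, so
# SparkConf.getBoolean receives the whole string after the key:
#   spark.eventLog.enabled   true #是否记录Spark事件,用于应用程序在完成后重构webUI

# Fixed -- keep the comment on its own line; the value is then exactly "true".
# 是否记录Spark事件,用于应用程序在完成后重构webUI
# (whether to log Spark events, used to rebuild the web UI after the application finishes)
spark.eventLog.enabled   true
```

Check every other line you edited for the same pattern; any value followed by an inline comment will fail in the same way the first time that key is read.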

Abrohambaby
NSDL It should indeed be a configuration file problem, but I just can't find it — I must have missed some configuration step.
Replied more than a year ago
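Once the file is cleaned up, restarting spark-shell should get past SparkContext creation, and `spark` and `sc` will be defined again (which also makes the `not found: value spark` import errors above go away). A quick sanity check from the REPL — a sketch assuming the corrected `spark.eventLog.enabled` entry above:

```scala
// In the spark-shell that now starts cleanly: read the setting back and
// confirm it parses as a boolean (getBoolean is the call that failed before).
sc.getConf.get("spark.eventLog.enabled")                // expected: String = true
sc.getConf.getBoolean("spark.eventLog.enabled", false)  // expected: Boolean = true
```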