Spark: how do I get the key's value inside reduceByKey?

As the title says: in Spark, how can I get hold of the key's value inside reduceByKey?

1 answer

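.map { case (key, value) => (key, (key, value)) }
First use map to turn each value into (key, value).
The function passed to reduceByKey then receives the key as part of every value it combines.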
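
The approach above can be sketched as a small local-mode program. This is only an illustration under stated assumptions: the object name `KeyAwareReduce`, the sample keys `"sum"` and `"max"`, and the rule that the key decides how values are combined are all made up for the example, and spark-core is assumed to be on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KeyAwareReduce {
  // Both arguments belong to the same key, so the key can be read from either one.
  // Hypothetical rule: the key "max" keeps the larger value, any other key sums.
  def merge(a: (String, Int), b: (String, Int)): (String, Int) = {
    val key = a._1
    val combined = if (key == "max") math.max(a._2, b._2) else a._2 + b._2
    (key, combined)
  }

  def run(): Map[String, Int] = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("key-aware-reduce")
    val sc = new SparkContext(conf)
    try {
      sc.parallelize(Seq(("sum", 1), ("sum", 2), ("max", 5), ("max", 3)))
        .map { case (key, value) => (key, (key, value)) } // carry the key inside the value
        .reduceByKey(merge)                               // merge can now see the key
        .mapValues(_._2)                                  // drop the carried key again
        .collect()
        .toMap
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With this sample data the result maps `"sum"` to 3 and `"max"` to 5. Carrying the key inside every value makes the shuffled records larger, so the trick is best kept for cases where the combine logic really depends on the key.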

Other related recommendations
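Error when running spark-shell --master spark://ip:7337
Running spark-shell --master spark://ip:7337 fails with: readerIndex(5) + length(799024) exceeds writerIndex(176) : UNpooledUnsafeDir![image description](https://img-ask.csdn.net/upload/201902/20/1550655590_823550.png) (The CDH cluster's port is definitely 7337; the usual 7077 is refused with a connection error, so a port problem can be ruled out.) Submitting a spark-submit script in standalone mode now reports the same error, while yarn mode works fine. Searching Baidu and Google turned up nothing, so I hope someone can help solve this!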
Error (Invalid argument to --conf: spark.worker.timeout) when adjusting the heartbeat timeout in Spark standalone mode?
When running a program on the Spark cluster, the error log shows: ERROR TaskSchedulerImpl:70 - Lost executor 1 on : Executor heartbeat timed out after 381181 ms So I tried to change the heartbeat timeout through spark-submit: [hadoop@Master spark2.4.0]$ bin/spark-submit --master spark://master:7077 --conf spark.worker.timeout 10000000 --py-files id.py id.py --name id But it says the argument is invalid. What should I do? Error: Invalid argument to --conf: spark.worker.timeout
spark: syntax error near unexpected token `"$ARG"'
As root, running run-example SparkPi gave permission denied. After granting access, running it again produced the error below. Switching to the latest Spark version, 1.6.1, gives the same problem. How can this be solved? /opt/spark-lecture/spark-1.5.2-bin-hadoop2.6/bin/spark-class: line 76: syntax error near unexpected token `"$ARG"' /opt/spark-lecture/spark-1.5.2-bin-hadoop2.6/bin/spark-class: line 76: ` CMD+=("$ARG")'
spark shell reports java.io.IOException: Not a file: hdfs://mini1:9000/spark/res when saving a result to HDFS
scala> sc.textFile("hdfs://mini1:9000/spark").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://mini1:9000/spark/res2")
Running the code above fails. This directory does exist on HDFS, and it would be created even if it did not. Also, the code I ran saves to the res2 directory, so why does the error complain about the res directory?
18/11/05 19:06:44 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
java.io.IOException: Not a file: hdfs://mini1:9000/spark/res at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:320) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:330) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:39) at $iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC.<init>(<console>:43) at $iwC.<init>(<console>:45) at <init>(<console>:47) at .<init>(<console>:51) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
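
One possible reading of this trace, offered as an assumption rather than a confirmed diagnosis: the exception is thrown by FileInputFormat.getSplits while the input partitions are being computed, so it concerns the input path, not the output path. sc.textFile("hdfs://mini1:9000/spark") lists everything directly under /spark, and a res directory left there by an earlier run is not a plain file. The sketch below reproduces that layout on the local filesystem and reads the input through a glob so the subdirectory is skipped. The object name `GlobInputDemo`, the file name `words.txt`, the `*.txt` pattern and the temporary directories are all hypothetical, and a Unix-like system with spark-core on the classpath is assumed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

object GlobInputDemo {
  def run(): Map[String, Int] = {
    // Hypothetical layout mirroring the question: an input directory that
    // also contains a `res` subdirectory from an earlier run.
    val inputDir = Files.createTempDirectory("spark-input")
    Files.write(inputDir.resolve("words.txt"), "a b a".getBytes(StandardCharsets.UTF_8))
    Files.createDirectory(inputDir.resolve("res"))
    // Write results outside the input directory; the target must not exist yet.
    val outputDir = Files.createTempDirectory("spark-output").resolve("res2")

    val conf = new SparkConf().setMaster("local[*]").setAppName("glob-input-demo")
    val sc = new SparkContext(conf)
    try {
      val counts = sc
        .textFile(s"file://${inputDir.toAbsolutePath}/*.txt") // only names ending in .txt, so `res` is skipped
        .flatMap(_.split(" "))
        .map((_, 1))
        .reduceByKey(_ + _)
      counts.saveAsTextFile(s"file://${outputDir.toAbsolutePath}")
      counts.collect().toMap
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

On the cluster in the question, the equivalent change would be to point textFile at a pattern or subdirectory that contains only input files, and to keep output directories such as res and res2 outside the directory being read.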
Spark cannot read the Hive metastore and cannot get the databases
Here is the exception output directly: ``` Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/data01/hadoop/yarn/local/filecache/355/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/08/13 19:53:17 INFO SignalUtils: Registered signal handler for TERM 19/08/13 19:53:17 INFO SignalUtils: Registered signal handler for HUP 19/08/13 19:53:17 INFO SignalUtils: Registered signal handler for INT 19/08/13 19:53:17 INFO SecurityManager: Changing view acls to: yarn,hdfs 19/08/13 19:53:17 INFO SecurityManager: Changing modify acls to: yarn,hdfs 19/08/13 19:53:17 INFO SecurityManager: Changing view acls groups to: 19/08/13 19:53:17 INFO SecurityManager: Changing modify acls groups to: 19/08/13 19:53:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set() 19/08/13 19:53:18 INFO ApplicationMaster: Preparing Local resources 19/08/13 19:53:19 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1565610088533_0087_000001 19/08/13 19:53:19 INFO ApplicationMaster: Starting the user application in a separate Thread 19/08/13 19:53:19 INFO ApplicationMaster: Waiting for spark context initialization... 
19/08/13 19:53:19 INFO SparkContext: Running Spark version 2.3.0.2.6.5.0-292 19/08/13 19:53:19 INFO SparkContext: Submitted application: voice_stream 19/08/13 19:53:19 INFO SecurityManager: Changing view acls to: yarn,hdfs 19/08/13 19:53:19 INFO SecurityManager: Changing modify acls to: yarn,hdfs 19/08/13 19:53:19 INFO SecurityManager: Changing view acls groups to: 19/08/13 19:53:19 INFO SecurityManager: Changing modify acls groups to: 19/08/13 19:53:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set() 19/08/13 19:53:19 INFO Utils: Successfully started service 'sparkDriver' on port 20410. 19/08/13 19:53:19 INFO SparkEnv: Registering MapOutputTracker 19/08/13 19:53:19 INFO SparkEnv: Registering BlockManagerMaster 19/08/13 19:53:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 19/08/13 19:53:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 19/08/13 19:53:19 INFO DiskBlockManager: Created local directory at /data01/hadoop/yarn/local/usercache/hdfs/appcache/application_1565610088533_0087/blockmgr-94d35b97-43b2-496e-a4cb-73ecd3ed186c 19/08/13 19:53:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB 19/08/13 19:53:19 INFO SparkEnv: Registering OutputCommitCoordinator 19/08/13 19:53:19 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 19/08/13 19:53:19 INFO Utils: Successfully started service 'SparkUI' on port 28852. 
19/08/13 19:53:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://datanode02:28852 19/08/13 19:53:19 INFO YarnClusterScheduler: Created YarnClusterScheduler 19/08/13 19:53:20 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1565610088533_0087 and attemptId Some(appattempt_1565610088533_0087_000001) 19/08/13 19:53:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31984. 19/08/13 19:53:20 INFO NettyBlockTransferService: Server created on datanode02:31984 19/08/13 19:53:20 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 19/08/13 19:53:20 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, datanode02, 31984, None) 19/08/13 19:53:20 INFO BlockManagerMasterEndpoint: Registering block manager datanode02:31984 with 366.3 MB RAM, BlockManagerId(driver, datanode02, 31984, None) 19/08/13 19:53:20 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, datanode02, 31984, None) 19/08/13 19:53:20 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, datanode02, 31984, None) 19/08/13 19:53:20 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/application_1565610088533_0087_1 19/08/13 19:53:20 INFO ApplicationMaster: =============================================================================== YARN executor launch context: env: CLASSPATH -> 
{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.5.0-292/hadoop/conf<CPS>/usr/hdp/2.6.5.0-292/hadoop/*<CPS>/usr/hdp/2.6.5.0-292/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.6.5.0-292/hadoop/lib/hadoop-lzo-0.6.0.2.6.5.0-292.jar:/etc/hadoop/conf/secure:/usr/hdp/current/ext/hadoop/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__ SPARK_YARN_STAGING_DIR -> *********(redacted) SPARK_USER -> *********(redacted) command: LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" \ {{JAVA_HOME}}/bin/java \ -server \ -Xmx5120m \ -Djava.io.tmpdir={{PWD}}/tmp \ '-Dspark.history.ui.port=18081' \ '-Dspark.rpc.message.maxSize=100' \ -Dspark.yarn.app.container.log.dir=<LOG_DIR> \ -XX:OnOutOfMemoryError='kill %p' \ org.apache.spark.executor.CoarseGrainedExecutorBackend \ --driver-url \ spark://CoarseGrainedScheduler@datanode02:20410 \ --executor-id \ <executorId> \ --hostname \ <hostname> \ --cores \ 2 \ --app-id \ application_1565610088533_0087 \ --user-class-path \ file:$PWD/__app__.jar \ --user-class-path \ file:$PWD/hadoop-common-2.7.3.jar \ --user-class-path \ file:$PWD/guava-12.0.1.jar \ --user-class-path \ file:$PWD/hbase-server-1.2.8.jar \ --user-class-path \ file:$PWD/hbase-protocol-1.2.8.jar \ --user-class-path \ file:$PWD/hbase-client-1.2.8.jar \ 
--user-class-path \ file:$PWD/hbase-common-1.2.8.jar \ --user-class-path \ file:$PWD/mysql-connector-java-5.1.44-bin.jar \ --user-class-path \ file:$PWD/spark-streaming-kafka-0-8-assembly_2.11-2.3.2.jar \ --user-class-path \ file:$PWD/spark-examples_2.11-1.6.0-typesafe-001.jar \ --user-class-path \ file:$PWD/fastjson-1.2.7.jar \ 1><LOG_DIR>/stdout \ 2><LOG_DIR>/stderr resources: spark-streaming-kafka-0-8-assembly_2.11-2.3.2.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/spark-streaming-kafka-0-8-assembly_2.11-2.3.2.jar" } size: 12271027 timestamp: 1565697198603 type: FILE visibility: PRIVATE spark-examples_2.11-1.6.0-typesafe-001.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/spark-examples_2.11-1.6.0-typesafe-001.jar" } size: 1867746 timestamp: 1565697198751 type: FILE visibility: PRIVATE hbase-server-1.2.8.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/hbase-server-1.2.8.jar" } size: 4197896 timestamp: 1565697197770 type: FILE visibility: PRIVATE hbase-common-1.2.8.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/hbase-common-1.2.8.jar" } size: 570163 timestamp: 1565697198318 type: FILE visibility: PRIVATE __app__.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/spark_history_data2.jar" } size: 44924 timestamp: 1565697197260 type: FILE visibility: PRIVATE guava-12.0.1.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/guava-12.0.1.jar" } 
size: 1795932 timestamp: 1565697197614 type: FILE visibility: PRIVATE hbase-client-1.2.8.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/hbase-client-1.2.8.jar" } size: 1306401 timestamp: 1565697198180 type: FILE visibility: PRIVATE __spark_conf__ -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/__spark_conf__.zip" } size: 273513 timestamp: 1565697199131 type: ARCHIVE visibility: PRIVATE fastjson-1.2.7.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/fastjson-1.2.7.jar" } size: 417221 timestamp: 1565697198865 type: FILE visibility: PRIVATE hbase-protocol-1.2.8.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/hbase-protocol-1.2.8.jar" } size: 4366252 timestamp: 1565697198023 type: FILE visibility: PRIVATE __spark_libs__ -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/hdp/apps/2.6.5.0-292/spark2/spark2-hdp-yarn-archive.tar.gz" } size: 227600110 timestamp: 1549953820247 type: ARCHIVE visibility: PUBLIC mysql-connector-java-5.1.44-bin.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/mysql-connector-java-5.1.44-bin.jar" } size: 999635 timestamp: 1565697198445 type: FILE visibility: PRIVATE hadoop-common-2.7.3.jar -> resource { scheme: "hdfs" host: "CID-042fb939-95b4-4b74-91b8-9f94b999bdf7" port: -1 file: "/user/hdfs/.sparkStaging/application_1565610088533_0087/hadoop-common-2.7.3.jar" } size: 3479293 timestamp: 1565697197476 type: FILE visibility: PRIVATE 
=============================================================================== 19/08/13 19:53:20 INFO RMProxy: Connecting to ResourceManager at namenode02/10.1.38.38:8030 19/08/13 19:53:20 INFO YarnRMClient: Registering the ApplicationMaster 19/08/13 19:53:20 INFO YarnAllocator: Will request 3 executor container(s), each with 2 core(s) and 5632 MB memory (including 512 MB of overhead) 19/08/13 19:53:20 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@datanode02:20410) 19/08/13 19:53:20 INFO YarnAllocator: Submitted 3 unlocalized container requests. 19/08/13 19:53:20 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals 19/08/13 19:53:20 INFO AMRMClientImpl: Received new token for : datanode03:45454 19/08/13 19:53:21 INFO YarnAllocator: Launching container container_e20_1565610088533_0087_01_000002 on host datanode03 for executor with ID 1 19/08/13 19:53:21 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them. 19/08/13 19:53:21 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 19/08/13 19:53:21 INFO ContainerManagementProtocolProxy: Opening proxy : datanode03:45454 19/08/13 19:53:21 INFO AMRMClientImpl: Received new token for : datanode01:45454 19/08/13 19:53:21 INFO YarnAllocator: Launching container container_e20_1565610088533_0087_01_000003 on host datanode01 for executor with ID 2 19/08/13 19:53:21 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them. 
19/08/13 19:53:21 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 19/08/13 19:53:21 INFO ContainerManagementProtocolProxy: Opening proxy : datanode01:45454 19/08/13 19:53:22 INFO AMRMClientImpl: Received new token for : datanode02:45454 19/08/13 19:53:22 INFO YarnAllocator: Launching container container_e20_1565610088533_0087_01_000004 on host datanode02 for executor with ID 3 19/08/13 19:53:22 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them. 19/08/13 19:53:22 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 19/08/13 19:53:22 INFO ContainerManagementProtocolProxy: Opening proxy : datanode02:45454 19/08/13 19:53:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.1.198.144:41122) with ID 1 19/08/13 19:53:25 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.1.229.163:24656) with ID 3 19/08/13 19:53:25 INFO BlockManagerMasterEndpoint: Registering block manager datanode03:3328 with 2.5 GB RAM, BlockManagerId(1, datanode03, 3328, None) 19/08/13 19:53:25 INFO BlockManagerMasterEndpoint: Registering block manager datanode02:28863 with 2.5 GB RAM, BlockManagerId(3, datanode02, 28863, None) 19/08/13 19:53:25 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.1.229.158:64276) with ID 2 19/08/13 19:53:25 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 19/08/13 19:53:25 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done 19/08/13 19:53:25 INFO BlockManagerMasterEndpoint: Registering block manager datanode01:20487 with 2.5 GB RAM, BlockManagerId(2, datanode01, 20487, None) 19/08/13 19:53:25 WARN SparkContext: Using an existing SparkContext; some configuration may not take 
effect. 19/08/13 19:53:25 INFO SparkContext: Starting job: start at VoiceApplication2.java:128 19/08/13 19:53:25 INFO DAGScheduler: Registering RDD 1 (start at VoiceApplication2.java:128) 19/08/13 19:53:25 INFO DAGScheduler: Got job 0 (start at VoiceApplication2.java:128) with 20 output partitions 19/08/13 19:53:25 INFO DAGScheduler: Final stage: ResultStage 1 (start at VoiceApplication2.java:128) 19/08/13 19:53:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 19/08/13 19:53:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0) 19/08/13 19:53:26 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[1] at start at VoiceApplication2.java:128), which has no missing parents 19/08/13 19:53:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KB, free 366.3 MB) 19/08/13 19:53:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2011.0 B, free 366.3 MB) 19/08/13 19:53:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on datanode02:31984 (size: 2011.0 B, free: 366.3 MB) 19/08/13 19:53:26 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039 19/08/13 19:53:26 INFO DAGScheduler: Submitting 50 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[1] at start at VoiceApplication2.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 19/08/13 19:53:26 INFO YarnClusterScheduler: Adding task set 0.0 with 50 tasks 19/08/13 19:53:26 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, datanode02, executor 3, partition 0, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, datanode03, executor 1, partition 1, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, datanode01, executor 2, partition 2, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 
3.0 in stage 0.0 (TID 3, datanode02, executor 3, partition 3, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, datanode03, executor 1, partition 4, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, datanode01, executor 2, partition 5, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on datanode02:28863 (size: 2011.0 B, free: 2.5 GB) 19/08/13 19:53:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on datanode03:3328 (size: 2011.0 B, free: 2.5 GB) 19/08/13 19:53:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on datanode01:20487 (size: 2011.0 B, free: 2.5 GB) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, datanode02, executor 3, partition 6, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, datanode02, executor 3, partition 7, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 693 ms on datanode02 (executor 3) (1/50) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 712 ms on datanode02 (executor 3) (2/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, datanode02, executor 3, partition 8, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 21 ms on datanode02 (executor 3) (3/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, datanode02, executor 3, partition 9, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 26 ms on datanode02 (executor 3) (4/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, datanode02, executor 3, partition 10, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished 
task 8.0 in stage 0.0 (TID 8) in 23 ms on datanode02 (executor 3) (5/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, datanode02, executor 3, partition 11, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 25 ms on datanode02 (executor 3) (6/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12, datanode02, executor 3, partition 12, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 18 ms on datanode02 (executor 3) (7/50) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 14 ms on datanode02 (executor 3) (8/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13, datanode02, executor 3, partition 13, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14, datanode02, executor 3, partition 14, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 16 ms on datanode02 (executor 3) (9/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15, datanode02, executor 3, partition 15, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 22 ms on datanode02 (executor 3) (10/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, datanode02, executor 3, partition 16, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 16 ms on datanode02 (executor 3) (11/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, datanode02, executor 3, partition 17, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 13 ms on datanode02 (executor 3) (12/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting 
task 18.0 in stage 0.0 (TID 18, datanode01, executor 2, partition 18, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, datanode01, executor 2, partition 19, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 787 ms on datanode01 (executor 2) (13/50) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 789 ms on datanode01 (executor 2) (14/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20, datanode03, executor 1, partition 20, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, datanode03, executor 1, partition 21, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 905 ms on datanode03 (executor 1) (15/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 907 ms on datanode03 (executor 1) (16/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22, datanode02, executor 3, partition 22, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 23.0 in stage 0.0 (TID 23, datanode02, executor 3, partition 23, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 24.0 in stage 0.0 (TID 24, datanode01, executor 2, partition 24, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 18.0 in stage 0.0 (TID 18) in 124 ms on datanode01 (executor 2) (17/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 134 ms on datanode02 (executor 3) (18/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 25.0 in stage 0.0 (TID 25, datanode01, executor 2, partition 25, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 26.0 in stage 0.0 (TID 26, datanode03, executor 1, partition 26, PROCESS_LOCAL, 7831 bytes) 
19/08/13 19:53:27 INFO TaskSetManager: Finished task 17.0 in stage 0.0 (TID 17) in 134 ms on datanode02 (executor 3) (19/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 20.0 in stage 0.0 (TID 20) in 122 ms on datanode03 (executor 1) (20/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 27.0 in stage 0.0 (TID 27, datanode03, executor 1, partition 27, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 19.0 in stage 0.0 (TID 19) in 127 ms on datanode01 (executor 2) (21/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 21.0 in stage 0.0 (TID 21) in 123 ms on datanode03 (executor 1) (22/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 28.0 in stage 0.0 (TID 28, datanode02, executor 3, partition 28, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 29.0 in stage 0.0 (TID 29, datanode02, executor 3, partition 29, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 22.0 in stage 0.0 (TID 22) in 19 ms on datanode02 (executor 3) (23/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 23.0 in stage 0.0 (TID 23) in 18 ms on datanode02 (executor 3) (24/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 30.0 in stage 0.0 (TID 30, datanode01, executor 2, partition 30, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 31.0 in stage 0.0 (TID 31, datanode01, executor 2, partition 31, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 25.0 in stage 0.0 (TID 25) in 27 ms on datanode01 (executor 2) (25/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 24.0 in stage 0.0 (TID 24) in 29 ms on datanode01 (executor 2) (26/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 32.0 in stage 0.0 (TID 32, datanode02, executor 3, partition 32, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 29.0 in stage 0.0 (TID 29) in 16 ms on datanode02 (executor 3) (27/50) 19/08/13 
19:53:27 INFO TaskSetManager: Starting task 33.0 in stage 0.0 (TID 33, datanode03, executor 1, partition 33, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 26.0 in stage 0.0 (TID 26) in 30 ms on datanode03 (executor 1) (28/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 34.0 in stage 0.0 (TID 34, datanode02, executor 3, partition 34, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 28.0 in stage 0.0 (TID 28) in 21 ms on datanode02 (executor 3) (29/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 35.0 in stage 0.0 (TID 35, datanode03, executor 1, partition 35, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 27.0 in stage 0.0 (TID 27) in 32 ms on datanode03 (executor 1) (30/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 36.0 in stage 0.0 (TID 36, datanode02, executor 3, partition 36, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 32.0 in stage 0.0 (TID 32) in 11 ms on datanode02 (executor 3) (31/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 37.0 in stage 0.0 (TID 37, datanode01, executor 2, partition 37, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 30.0 in stage 0.0 (TID 30) in 18 ms on datanode01 (executor 2) (32/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 38.0 in stage 0.0 (TID 38, datanode01, executor 2, partition 38, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 31.0 in stage 0.0 (TID 31) in 20 ms on datanode01 (executor 2) (33/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 39.0 in stage 0.0 (TID 39, datanode03, executor 1, partition 39, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 33.0 in stage 0.0 (TID 33) in 17 ms on datanode03 (executor 1) (34/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 34.0 in stage 0.0 (TID 34) in 17 ms on datanode02 (executor 3) (35/50) 
19/08/13 19:53:27 INFO TaskSetManager: Starting task 40.0 in stage 0.0 (TID 40, datanode02, executor 3, partition 40, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 41.0 in stage 0.0 (TID 41, datanode03, executor 1, partition 41, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 35.0 in stage 0.0 (TID 35) in 17 ms on datanode03 (executor 1) (36/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 42.0 in stage 0.0 (TID 42, datanode02, executor 3, partition 42, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 36.0 in stage 0.0 (TID 36) in 16 ms on datanode02 (executor 3) (37/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 43.0 in stage 0.0 (TID 43, datanode01, executor 2, partition 43, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 37.0 in stage 0.0 (TID 37) in 16 ms on datanode01 (executor 2) (38/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 44.0 in stage 0.0 (TID 44, datanode02, executor 3, partition 44, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 45.0 in stage 0.0 (TID 45, datanode02, executor 3, partition 45, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 40.0 in stage 0.0 (TID 40) in 14 ms on datanode02 (executor 3) (39/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 42.0 in stage 0.0 (TID 42) in 11 ms on datanode02 (executor 3) (40/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 46.0 in stage 0.0 (TID 46, datanode03, executor 1, partition 46, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 39.0 in stage 0.0 (TID 39) in 20 ms on datanode03 (executor 1) (41/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 47.0 in stage 0.0 (TID 47, datanode03, executor 1, partition 47, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 41.0 in stage 0.0 (TID 41) in 20 ms on 
datanode03 (executor 1) (42/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 48.0 in stage 0.0 (TID 48, datanode01, executor 2, partition 48, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 49.0 in stage 0.0 (TID 49, datanode01, executor 2, partition 49, PROCESS_LOCAL, 7888 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 43.0 in stage 0.0 (TID 43) in 18 ms on datanode01 (executor 2) (43/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 38.0 in stage 0.0 (TID 38) in 31 ms on datanode01 (executor 2) (44/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 45.0 in stage 0.0 (TID 45) in 11 ms on datanode02 (executor 3) (45/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 44.0 in stage 0.0 (TID 44) in 16 ms on datanode02 (executor 3) (46/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 46.0 in stage 0.0 (TID 46) in 18 ms on datanode03 (executor 1) (47/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 48.0 in stage 0.0 (TID 48) in 15 ms on datanode01 (executor 2) (48/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 47.0 in stage 0.0 (TID 47) in 15 ms on datanode03 (executor 1) (49/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 49.0 in stage 0.0 (TID 49) in 25 ms on datanode01 (executor 2) (50/50) 19/08/13 19:53:27 INFO YarnClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 19/08/13 19:53:27 INFO DAGScheduler: ShuffleMapStage 0 (start at VoiceApplication2.java:128) finished in 1.174 s 19/08/13 19:53:27 INFO DAGScheduler: looking for newly runnable stages 19/08/13 19:53:27 INFO DAGScheduler: running: Set() 19/08/13 19:53:27 INFO DAGScheduler: waiting: Set(ResultStage 1) 19/08/13 19:53:27 INFO DAGScheduler: failed: Set() 19/08/13 19:53:27 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[2] at start at VoiceApplication2.java:128), which has no missing parents 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 3.2 KB, free 366.3 MB) 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1979.0 B, free 366.3 MB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode02:31984 (size: 1979.0 B, free: 366.3 MB) 19/08/13 19:53:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1039 19/08/13 19:53:27 INFO DAGScheduler: Submitting 20 missing tasks from ResultStage 1 (ShuffledRDD[2] at start at VoiceApplication2.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 19/08/13 19:53:27 INFO YarnClusterScheduler: Adding task set 1.0 with 20 tasks 19/08/13 19:53:27 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 50, datanode03, executor 1, partition 0, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 51, datanode02, executor 3, partition 1, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 52, datanode01, executor 2, partition 3, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 53, datanode03, executor 1, partition 2, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 54, datanode02, executor 3, partition 4, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 55, datanode01, executor 2, partition 5, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode02:28863 (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode01:20487 (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode03:3328 (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:53:27 INFO MapOutputTrackerMasterEndpoint: Asked to 
send map output locations for shuffle 0 to 10.1.229.163:24656 19/08/13 19:53:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.1.198.144:41122 19/08/13 19:53:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.1.229.158:64276 19/08/13 19:53:27 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 56, datanode03, executor 1, partition 7, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 53) in 192 ms on datanode03 (executor 1) (1/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 57, datanode03, executor 1, partition 8, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 56) in 25 ms on datanode03 (executor 1) (2/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 58, datanode02, executor 3, partition 6, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 51) in 220 ms on datanode02 (executor 3) (3/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 14.0 in stage 1.0 (TID 59, datanode03, executor 1, partition 14, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 57) in 17 ms on datanode03 (executor 1) (4/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 16.0 in stage 1.0 (TID 60, datanode03, executor 1, partition 16, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 14.0 in stage 1.0 (TID 59) in 15 ms on datanode03 (executor 1) (5/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 16.0 in stage 1.0 (TID 60) in 21 ms on datanode03 (executor 1) (6/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 61, datanode02, executor 3, partition 9, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 54) in 269 ms on datanode02 
(executor 3) (7/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 50) in 339 ms on datanode03 (executor 1) (8/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID 62, datanode02, executor 3, partition 10, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 58) in 56 ms on datanode02 (executor 3) (9/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID 63, datanode01, executor 2, partition 11, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 55) in 284 ms on datanode01 (executor 2) (10/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID 64, datanode01, executor 2, partition 12, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 52) in 287 ms on datanode01 (executor 2) (11/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID 65, datanode02, executor 3, partition 13, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 15.0 in stage 1.0 (TID 66, datanode02, executor 3, partition 15, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 10.0 in stage 1.0 (TID 62) in 25 ms on datanode02 (executor 3) (12/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 61) in 29 ms on datanode02 (executor 3) (13/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 17.0 in stage 1.0 (TID 67, datanode02, executor 3, partition 17, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 15.0 in stage 1.0 (TID 66) in 13 ms on datanode02 (executor 3) (14/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 13.0 in stage 1.0 (TID 65) in 16 ms on datanode02 (executor 3) (15/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 18.0 in stage 1.0 (TID 68, datanode02, executor 3, partition 18, NODE_LOCAL, 7638 
bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 19.0 in stage 1.0 (TID 69, datanode01, executor 2, partition 19, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 11.0 in stage 1.0 (TID 63) in 30 ms on datanode01 (executor 2) (16/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 12.0 in stage 1.0 (TID 64) in 30 ms on datanode01 (executor 2) (17/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 17.0 in stage 1.0 (TID 67) in 17 ms on datanode02 (executor 3) (18/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 19.0 in stage 1.0 (TID 69) in 13 ms on datanode01 (executor 2) (19/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 18.0 in stage 1.0 (TID 68) in 20 ms on datanode02 (executor 3) (20/20) 19/08/13 19:53:27 INFO YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 19/08/13 19:53:27 INFO DAGScheduler: ResultStage 1 (start at VoiceApplication2.java:128) finished in 0.406 s 19/08/13 19:53:27 INFO DAGScheduler: Job 0 finished: start at VoiceApplication2.java:128, took 1.850883 s 19/08/13 19:53:27 INFO ReceiverTracker: Starting 1 receivers 19/08/13 19:53:27 INFO ReceiverTracker: ReceiverTracker started 19/08/13 19:53:27 INFO KafkaInputDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO KafkaInputDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO KafkaInputDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@5fd3dc81 19/08/13 19:53:27 INFO ForEachDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO ForEachDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO ForEachDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Initialized and validated 
org.apache.spark.streaming.dstream.ForEachDStream@4044ec97 19/08/13 19:53:27 INFO KafkaInputDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO KafkaInputDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO KafkaInputDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@5fd3dc81 19/08/13 19:53:27 INFO MappedDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO MappedDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO MappedDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@5dd4b960 19/08/13 19:53:27 INFO ForEachDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO ForEachDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO ForEachDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@132d0c3c 19/08/13 19:53:27 INFO KafkaInputDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO KafkaInputDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO KafkaInputDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@5fd3dc81 19/08/13 19:53:27 INFO MappedDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO MappedDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO MappedDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: 
Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@5dd4b960 19/08/13 19:53:27 INFO ForEachDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO ForEachDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO ForEachDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@525bed0c 19/08/13 19:53:27 INFO DAGScheduler: Got job 1 (start at VoiceApplication2.java:128) with 1 output partitions 19/08/13 19:53:27 INFO DAGScheduler: Final stage: ResultStage 2 (start at VoiceApplication2.java:128) 19/08/13 19:53:27 INFO DAGScheduler: Parents of final stage: List() 19/08/13 19:53:27 INFO DAGScheduler: Missing parents: List() 19/08/13 19:53:27 INFO DAGScheduler: Submitting ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:613), which has no missing parents 19/08/13 19:53:27 INFO ReceiverTracker: Receiver 0 started 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 133.5 KB, free 366.2 MB) 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 36.3 KB, free 366.1 MB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on datanode02:31984 (size: 36.3 KB, free: 366.3 MB) 19/08/13 19:53:27 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1039 19/08/13 19:53:27 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:613) (first 15 tasks are for partitions Vector(0)) 19/08/13 19:53:27 INFO YarnClusterScheduler: Adding task set 2.0 with 1 tasks 19/08/13 19:53:27 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 70, datanode01, executor 2, partition 0, PROCESS_LOCAL, 8757 bytes) 19/08/13 19:53:27 INFO RecurringTimer: 
Started timer for JobGenerator at time 1565697240000 19/08/13 19:53:27 INFO JobGenerator: Started JobGenerator at 1565697240000 ms 19/08/13 19:53:27 INFO JobScheduler: Started JobScheduler 19/08/13 19:53:27 INFO StreamingContext: StreamingContext started 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on datanode01:20487 (size: 36.3 KB, free: 2.5 GB) 19/08/13 19:53:27 INFO ReceiverTracker: Registered receiver for stream 0 from 10.1.229.158:64276 19/08/13 19:54:00 INFO JobScheduler: Added jobs for time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Starting job streaming job 1565697240000 ms.0 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Starting job streaming job 1565697240000 ms.1 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Finished job streaming job 1565697240000 ms.1 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Finished job streaming job 1565697240000 ms.0 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Starting job streaming job 1565697240000 ms.2 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO SharedState: loading hive config file: file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/85431/__spark_conf__.zip/__hadoop_conf__/hive-site.xml 19/08/13 19:54:00 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('hdfs://CID-042fb939-95b4-4b74-91b8-9f94b999bdf7/apps/hive/warehouse'). 19/08/13 19:54:00 INFO SharedState: Warehouse path is 'hdfs://CID-042fb939-95b4-4b74-91b8-9f94b999bdf7/apps/hive/warehouse'. 
19/08/13 19:54:00 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode02:31984 in memory (size: 1979.0 B, free: 366.3 MB) 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode02:28863 in memory (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode01:20487 in memory (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode03:3328 in memory (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:54:02 INFO CodeGenerator: Code generated in 175.416957 ms 19/08/13 19:54:02 INFO JobScheduler: Finished job streaming job 1565697240000 ms.2 from job set of time 1565697240000 ms 19/08/13 19:54:02 ERROR JobScheduler: Error running job streaming job 1565697240000 ms.2 org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:40) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:388) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:122) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:115) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 19/08/13 19:54:02 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:40) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:388) at 
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:122) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:115) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 19/08/13 19:54:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:40) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:388) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:122) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:115) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ) 19/08/13 19:54:02 INFO StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook 19/08/13 19:54:02 INFO ReceiverTracker: Sent stop signal to all 1 receivers 19/08/13 19:54:02 ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver 19/08/13 19:54:02 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 70) in 35055 ms on datanode01 (executor 2) (1/1) 19/08/13 19:54:02 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool 19/08/13 19:54:02 INFO DAGScheduler: ResultStage 2 (start at VoiceApplication2.java:128) finished in 35.086 s 19/08/13 19:54:02 INFO ReceiverTracker: Waiting for receiver job to terminate gracefully 19/08/13 19:54:02 INFO ReceiverTracker: Waited for receiver job to terminate gracefully 19/08/13 19:54:02 INFO ReceiverTracker: All of the receivers have deregistered successfully 19/08/13 19:54:02 INFO ReceiverTracker: ReceiverTracker stopped 19/08/13 19:54:02 INFO JobGenerator: Stopping JobGenerator gracefully 19/08/13 19:54:02 INFO JobGenerator: Waiting for all received blocks to be consumed for job generation 19/08/13 19:54:02 INFO JobGenerator: Waited for all received blocks to be consumed for job generation 19/08/13 19:54:12 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, 
java.util.concurrent.TimeoutException java.util.concurrent.TimeoutException at java.util.concurrent.FutureTask.get(FutureTask.java:205) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67) 19/08/13 19:54:12 ERROR Utils: Uncaught exception in thread pool-1-thread-1 java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.streaming.util.RecurringTimer.stop(RecurringTimer.scala:86) at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:137) at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:123) at org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:681) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357) at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:680) at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:714) at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599) at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1988) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ```
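Getting the result of a Spark job from a Java program
As the title says: I developed a Spark job and submit it to a Spark cluster from a Java web application. How do I get the computed result back?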
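Spark question: how to get the names of the files in an HDFS directory
As the title says, I want to get the names of the files under an HDFS directory, using Spark with Java. How do I do that?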
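How can the tuples in a Spark RDD be saved to HDFS in a specified format?
A question: the result of my Spark data cleaning is an RDD of type RDD[(String, String)], so every element of the RDD is a tuple. The key of each tuple is a file name and the value is the file content. I want to save the whole RDD to HDFS so that each element becomes one file, with the key as the file name and the value as the file content. How can this be done? An RDD does not seem to support iteration; the only way I see is to turn it into an array with collect() and iterate over that, but this may exhaust memory. My current approach is to save the RDD to HDFS with saveAsTextFile, then read the resulting part files one by one with an FSDataInputStream and write them back to HDFS with an output stream, which is very slow. Is there a better way to write the contents of the RDD directly to HDFS?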
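One common way to avoid collect() here is to write inside `foreachPartition`, so that each executor streams through its own records. The sketch below shows only the per-partition loop, under loud assumptions: `PerKeyFileWriter` and `writeAll` are hypothetical names, `Map.Entry` stands in for Spark's tuple type, and `java.nio.file` stands in for the Hadoop `FileSystem` API so that the example runs without a cluster. It also assumes every key is a plain file name with no path separators.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: the loop that would run once per partition. */
final class PerKeyFileWriter {

    /** Writes each (fileName, content) pair to its own file under outputDir and returns the count. */
    static int writeAll(Iterator<Map.Entry<String, String>> records, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        int written = 0;
        // Records are consumed one at a time, so the partition is never held in memory as a whole.
        while (records.hasNext()) {
            Map.Entry<String, String> record = records.next();
            Path target = outputDir.resolve(record.getKey());
            Files.write(target, record.getValue().getBytes(StandardCharsets.UTF_8));
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("a.txt", "content of a");
        sample.put("b.txt", "content of b");
        Path dir = Files.createTempDirectory("per-key-demo");
        int count = writeAll(sample.entrySet().iterator(), dir);
        System.out.println("wrote " + count + " files to " + dir);
    }
}
```

In a real job the same loop would sit inside the `foreachPartition` callback and open its output streams through Hadoop's `FileSystem` rather than `java.nio.file`; that wiring is not shown here.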
Submitting a Spark job fails when the jar is on HDFS
(1) When the jar is not on HDFS, the Spark job is submitted successfully with this command: spark-submit --master spark://192.168.244.130:7077 --class cn.com.cnpc.klmy.common.WordCount2 --executor-memory 1G --total-executor-cores 2 /root/modelcall-2.0.jar (2) When the jar is on HDFS, submitting the Spark job fails with ClassNotFoundException. The command is: spark-submit --master spark://192.168.244.130:7077 --class cn.com.cnpc.klmy.common.WordCount2 --executor-memory 1G --total-executor-cores 2 hdfs://192.168.244.130:9000/mdjar/modelcall-2.0.jar What is causing this? Any advice would be greatly appreciated! Note: HDFS is accessible, and the results produced by the code are stored on HDFS (the first case runs normally and its results can be viewed on HDFS).
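Integrating Spark with a Java web application: how to submit a Spark job from a web page and get the result
First, the effect I want to achieve: the web page has a button, the user clicks it to submit a job to Spark, the Spark cluster runs the job and produces a result, and the result is returned to the page or to the server. There are two main problems. First: how to submit a Spark job from the server so that Spark actually runs it. Second: how to obtain the result Spark produces, so that it can be shown on the page or read by my program. Could anyone with experience or ideas help? It will be rewarded!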
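One possible approach to the first problem is for the server to start `spark-submit` as a child process. The sketch below assumes `spark-submit` is on the server's PATH; `SparkSubmitRunner`, `buildCommand`, and `runAndCapture` are hypothetical names, and the master URL, class name, and jar path in `main` are placeholders. Spark also ships an `org.apache.spark.launcher.SparkLauncher` class for the same purpose, which is not used here so that the example needs only the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for launching spark-submit from a server-side Java process. */
final class SparkSubmitRunner {

    /** Builds the spark-submit command line; every value is supplied by the caller. */
    static List<String> buildCommand(String master, String mainClass,
                                     String jarPath, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add(master);
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add(jarPath);
        cmd.addAll(appArgs);
        return cmd;
    }

    /** Runs the command, waits for it to exit, and returns its combined stdout and stderr. */
    static String runAndCapture(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("spark-submit exited with code " + exitCode + "\n" + out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder values only; nothing is executed here, the command is just printed.
        List<String> cmd = buildCommand("spark://master:7077", "com.example.MyJob",
                "/path/to/my-job.jar", Arrays.asList("arg1"));
        System.out.println(String.join(" ", cmd));
    }
}
```

For the second problem, parsing captured console output tends to be fragile. A frequently used alternative is to have the job write its result to shared storage such as HDFS or a database, and let the web application read it from there once the process exits.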
Spark: no cores are allocated
I have just started learning Hadoop + Spark. ![image description](https://img-ask.csdn.net/upload/201706/05/1496653166_781159.png) This is a master->(slave1, slave2) cluster set up in VirtualBox. I start Spark with spark-shell --executor-memory 512m --master spark://master:7077 and then find that the state shown in the UI is Waiting. My spark-env.sh is configured as follows: export SPARK_MASTER_IP=master export SPARK_WORKER_CORES=1 export SPARK_WORKER_MEMORY=300m export SPARK_EXECUTOR_INSTANCES=1 Then I run this in the Scala shell: val textFile=sc.textFile("hdfs://master:9000/home/hduser/wordcount/input/LICENSE.txt") textFile.count and the following error appears: ![image description](https://img-ask.csdn.net/upload/201706/05/1496653839_218354.png)
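One detail worth checking in this configuration: the shell asks for `--executor-memory 512m` while each worker offers only `SPARK_WORKER_MEMORY=300m`. In standalone mode an executor is placed on a worker only if that worker has enough free memory and cores for it, so no worker here could host one. The sketch below is a simplified illustration of that comparison, not Spark's scheduler code; `ExecutorFitCheck` and its methods are hypothetical names, and it compares total rather than currently free resources.

```java
import java.util.Locale;

/** Hypothetical helper: mirrors the fit test a standalone worker has to pass. */
final class ExecutorFitCheck {

    /** Parses sizes such as "300m", "512M" or "1g" into mebibytes. */
    static long toMb(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        char unit = s.charAt(s.length() - 1);
        switch (unit) {
            case 'm':
                return value;
            case 'g':
                return value * 1024;
            default:
                throw new IllegalArgumentException("unsupported unit in: " + size);
        }
    }

    /** True when one worker has enough memory and cores for a single executor. */
    static boolean canHostExecutor(String workerMemory, int workerCores,
                                   String executorMemory, int executorCores) {
        return toMb(workerMemory) >= toMb(executorMemory) && workerCores >= executorCores;
    }

    public static void main(String[] args) {
        // Values from the question: SPARK_WORKER_MEMORY=300m, --executor-memory 512m.
        System.out.println(canHostExecutor("300m", 1, "512m", 1)); // prints false
        // A request that fits inside the same worker.
        System.out.println(canHostExecutor("300m", 1, "256m", 1)); // prints true
    }
}
```

If this is the cause, either raising `SPARK_WORKER_MEMORY` or lowering `--executor-memory` so the request fits should let the application leave the Waiting state.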
Spark提交作业为什么一定要conf.setJars(),它的具体作用到底是什么?
代码如下: ``` package wordcount import org.apache.spark.SparkContext import org.apache.spark.SparkConf import org.apache.spark.sql.SparkSession import org.apache.spark.rdd.RDD object WordCount extends App { val conf = new SparkConf() //就是这里,为什必须要有它,它的具体作用到底是啥? .set("spark.jars", "src/main/resources/sparkcore.jar,") .set("spark.app.name", "WordCount") .set("spark.master", "spark://master:7077") .set("spark.driver.host", "win") .set("spark.executor.memory", "512M") .set("spark.eventLog.enabled", "true") .set("spark.eventLog.dir", "hdfs://master:9000/spark/history") val sc=new SparkContext(conf) val lines:RDD[String]=sc.textFile("hdfs://master:9000/user/dsf/wordcount_input") val words:RDD[String]=lines.flatMap(_.split(" ")) val wordAndOne:RDD[(String,Int)]=words.map((_,1)) val reduce:RDD[(String,Int)]=wordAndOne.reduceByKey(_+_) val sorted:RDD[(String,Int)]=reduce.sortBy(_._2, ascending=false,numPartitions=1) sorted.saveAsTextFile("hdfs://master:9000/user/dsf/wordcount_output") println("\ntextFile: "+lines.collect().toBuffer) println("flatMap: "+words.collect().toBuffer) println("map: "+wordAndOne.collect().toBuffer) println("reduceByKey: "+reduce.collect().toBuffer) println("sortBy: "+sorted.collect().toBuffer) sc.stop() } /** 在Linux终端运行此应用的命令行: spark-submit \ --master spark://master:7077 \ --class wordcount.WordCount \ sparkcore.jar */ ``` 如果没有.set("spark.jars", "src/main/resources/sparkcore.jar,")这段代码,它会报这个异常: ``` Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.1.15, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD ``` ![这是我在Spark官网找到的:](https://img-ask.csdn.net/upload/201810/14/1539507421_95226.jpg) 
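Translated, it says: spark.jars: a comma-separated list of local jars to include on the driver and executor classpaths. As I read the official docs, both the Driver and the Executor need the application's jar, but I don't understand the underlying mechanism. Could some kind soul explain it? Thanks!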
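One way to read the documentation quoted above: executors run in separate JVMs, so when the driver is started directly (for example from an IDE) rather than through spark-submit, they can only load the application's classes if the jar is shipped to them. The sketch below shows the `setJars` form named in the title, which sets the same `spark.jars` property. It assumes the Spark libraries are on the classpath and reuses the master URL and jar path from the question; the object name `SetJarsSketch` is made up for illustration, and this is not a verified fix for the exception above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SetJarsSketch {
  def main(args: Array[String]): Unit = {
    // Jar built from this project; path taken from the question.
    val appJar = "src/main/resources/sparkcore.jar"

    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("spark://master:7077")
      // Same effect as .set("spark.jars", appJar): the listed jars are
      // made available to the executors so they can load the classes
      // (including compiled closures) that the tasks refer to.
      .setJars(Seq(appJar))

    val sc = new SparkContext(conf)
    // A tiny job whose closure (_ * 2) lives in the application jar.
    val total = sc.parallelize(1 to 10).map(_ * 2).reduce(_ + _)
    println(s"total = $total")
    sc.stop()
  }
}
```

When the job is launched with spark-submit, the application jar given on the command line is distributed for you, which would be consistent with the spark-submit command in the question's comment not needing this setting.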
Connecting Spark to MongoDB with Python
Following the official tutorial:

1. I ran:

    from pyspark.sql import SparkSession
    spark = SparkSession \
        .builder \
        .appName("myApp") \
        .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/Spark-Test.Numbers") \
        .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/Spark-Test.Numbers") \
        .getOrCreate()
    df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

   The result is an error: Caused by: java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource

2. I saw that the package has to be imported with the --packages option:

    cmd>> pyspark --package org.mongodb.spark:mongo-spark-connector_2.11:2.2.0

   Error: Exception in thread "main" java.lang.IllegalArgumentException: pyspark does not

3. Following the official docs exactly:

    cmd>>pyspark --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.myCollection" --packages org.mongodb.spark:mongo-spark-connector_2.10:1.1.0

   Error: 'D:\SparkNew\spark\bin\pyspark2.cmd" --conf "spark.mongodb.input.uri' is not recognized as an internal or external command, operable program or batch file.

I don't understand why the error mentions pyspark2.cmd when I ran pyspark. So how do I actually connect to MongoDB? It all comes down to DefaultSource.DefaultSource not being found.
Spark cluster runtime error: 15
This is a test I wrote for my own learning, modeled on an example I found online. I wrote a wordCount example in Scala; it runs in MyEclipse and produces results. I then exported the example as a jar and ran it on a Hadoop cluster, where it fails with: Stack trace: ExitCodeException exitCode=15. I'm begging for help from anyone who knows. This problem has been bothering me for a week, and I've searched online for a long time without finding a solution. I don't have many points left... Many thanks!!!

Environment: hadoop2.6.2, spark2.2, jdk1.8, scala2.2

The Hadoop cluster itself should be fine; the page on port 50070 opens in a browser. Below is the Spark on YARN environment:

    export JAVA_HOME=/usr/local/src/jdk1.8.0_144
    export SPARK_MASTER_IP=node1
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_CORES=1
    export SPARK_WORKER_INSTANCES=1
    export SPARK_WORKER_MEMORY=1g
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181"
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_HOME=/usr/local/src/spark-2.2.0-bin-hadoop2.6
    export SPARK_JAR=/usr/local/src/spark-2.2.0-bin-hadoop2.6/jars/*.jar
    export PATH=$SPARK_HOME/bin:$PATH

WordCount example:

    object wc {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("wc")
        val sc = new SparkContext(conf)
        val text = sc.textFile("test.txt")
        val words = text.flatMap(line => line.split(" "))
        val pairs = words.map(word => (word, 1))
        val results = pairs.reduceByKey(_+_).map(tuple => (tuple._2 , tuple._1 ))
        val sorted = results.sortByKey(false).map(tuple => (tuple._2 , tuple._1 ))
        sorted.foreach(x => println(x))
        sc.stop()
      }
    }

Error message:

    Application Attempt State: FAILED
    AM Container: container_1507729080248_0001_01_000001
    Node: N/A
    Tracking URL: History
    Diagnostics Info: AM Container for appattempt_1507729080248_0001_000001 exited with exitCode: 15
    For more detailed output, check application tracking page: http://node1:8088/proxy/application_1507729080248_0001/ Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1507729080248_0001_01_000001
    Exit code: 15
    Stack trace: ExitCodeException exitCode=15:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Container exited with a non-zero exit code 15
    Failing this attempt

![image](https://img-ask.csdn.net/upload/201710/12/1507764756_681945.jpg)![image](https://img-ask.csdn.net/upload/201710/12/1507764760_746436.jpg)![image](https://img-ask.csdn.net/upload/201710/12/1507764811_639764.jpg)
Processing the data in a Spark RDD
I'm new to Spark. My program currently produces a VertexRDD[(String,String)] whose values look like this:

    (3477,267 6106 7716 8221 18603 19717 28189)
    (2631,18589 18595 25725 26023 26026 27866)
    (10969,18591 25949 25956 26041)
    (10218,9320 19950 20493 26031)
    (5860,18583 18595 25725 26233)
    (11501,1551 26187 27170)
    (5717,2596 5187 5720 18583 25725)
    (950,19667 20493 25725 26024 26033 26192 27279 27281)
    (13397,19943 26377)
    (2899,4720 8411 19081 20100 20184 20270 20480 20493 20573 20574 25891)
    (11424,19816 19819 19841 20244 27098)
    (8951,5914 18609 26057)
    (1909,8797 18608 19785 19786 27531)
    (12807,20040 20608 27159)   (the record used in the example that follows)
    (17953,1718 6112 18603 18608)

The first field is the key, and the string after it is the value (tokens separated by spaces). For each record in this RDD, I want to take the space-separated tokens of the value, combine them pairwise into new key-value records, and build a new RDD from the result. For example, the record (12807,20040 20608 27159) should become:

    (20040,20608)
    (20040,27159)
    (20608,27159)

How can I do this? Any help appreciated.
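The pairwise combination described above can be sketched with Scala's standard `combinations(2)` on the split value. This is a minimal, Spark-free illustration using plain collections; the names `PairCombos` and `pairs` are made up for this example, and it assumes the RDD elements can be treated as (key, value-string) tuples.

```scala
object PairCombos {
  // Split a space-separated value and emit every unordered pair of tokens,
  // keeping the original left-to-right order inside each pair.
  def pairs(value: String): List[(String, String)] =
    value.trim.split("\\s+").toList
      .combinations(2)
      .collect { case List(a, b) => (a, b) }
      .toList

  def main(args: Array[String]): Unit = {
    // Two records copied from the question, held in a plain List here.
    val records = List(
      ("12807", "20040 20608 27159"),
      ("13397", "19943 26377")
    )
    // On an RDD of (key, value) pairs the same function could be applied as
    //   rdd.flatMap { case (_, value) => PairCombos.pairs(value) }
    val result = records.flatMap { case (_, value) => pairs(value) }
    result.foreach(println)
  }
}
```

For the record (12807,20040 20608 27159) this yields (20040,20608), (20040,27159) and (20608,27159), matching the output asked for. A value with a single token produces no pairs. Note that `combinations` treats repeated tokens as a multiset, so a value containing duplicates yields each distinct pair only once.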
Hive on Spark query fails.
Help!!! When running the BigBench tests with Hive on Spark on Hadoop, I keep getting errors. The log shows:

    FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
    WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
    WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
    An error occured while running command:
    ==========
    runEngineCmd -f /var/lib/hadoop-hdfs/Big-Bench/engines/hive/queries/q04/q04.sql
    ==========

I've looked through a lot of material online. Some say it's a version mismatch, others say it's an intermittent problem. Could an expert take a look? I'm in tears.
After submitting a cluster job with spark-submit, the Spark Web UI doesn't show it, but the 4040 page exists and shows local mode
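I've run into the following problems and would appreciate some advice. The cluster has three nodes: 111 is the master and the other two are slaves. Each node has 4 cores and 6.6 GB of memory. The submit commands are:

    nohup bin/spark-submit --master spark://sousou:7077 --executor-memory 1g --total-executor-cores 2 --class AnalyzeInfo /spark/jar/v2_AnalyzeInfo.jar &
    nohup bin/spark-submit --master spark://sousou111:7077 --executor-memory 1g --total-executor-cores 2 --class SaveInfoMain /spark/jar/saveAnn.jar &

The problems:

1. After submitting with spark-submit, the Spark Web UI doesn't show SaveInfoMain, but there is a 4040 page, and its Environment tab shows local mode. Why is that? As a result, the program can't be stopped from the UI. This program also sometimes processes data extremely slowly, occasionally handling data from three or four hours earlier. The AnalyzeInfo job doesn't have this problem.
2. Both jobs share another issue: I set them up to stop gracefully when a file appears in a particular HDFS directory. That works right after startup, but once the programs have been running for a long time (say, a day), uploading the file to HDFS no longer stops them.

Screenshots of the Environment tab:

![image description](https://img-ask.csdn.net/upload/201810/23/1540264892_86550.png) ![image description](https://img-ask.csdn.net/upload/201810/23/1540264909_714074.png)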
How do I deal with the following error when running WordCount on Spark? Any guidance from the experts would be appreciated...
There are also several other WARN messages:

    15/05/19 11:19:19 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
    15/05/19 11:19:33 INFO AppClient$ClientActor: Connecting to master spark://172.18.219.136:7077...
    15/05/19 11:19:34 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    15/05/19 11:19:49 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    15/05/19 11:19:53 INFO AppClient$ClientActor: Connecting to master spark://172.18.219.136:7077...
    15/05/19 11:20:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    15/05/19 11:20:13 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    15/05/19 11:20:13 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
    15/05/19 11:20:13 INFO TaskSchedulerImpl: Cancelling stage 1
    15/05/19 11:20:13 INFO DAGScheduler: Failed to run collect at WordCount.scala:31
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.