Spark task gets stuck, Scheduler Delay is very long

I wrote a label-encoder demo. The logic is simple: read a table from Hive (46.5 MB, tiny),
then label-encode several columns (the label encoder doesn't support multiple columns, so I chained the steps in a Pipeline), then extract the label dictionaries, and finally write the result back to Hive.
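
A minimal sketch of that flow, assuming Spark MLlib's StringIndexer as the label encoder; the database, table, and column names below are made up for illustration:

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{StringIndexer, StringIndexerModel}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("LabelEncoderDemo")
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical table and columns; the real demo reads a ~46.5 MB Hive table.
val df = spark.table("demo_db.demo_table")
val cols = Seq("col_a", "col_b", "col_c")

// One StringIndexer per column, chained into a single Pipeline,
// because StringIndexer takes exactly one input column.
val indexers = cols.map { c =>
  new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx")
}
val model = new Pipeline().setStages(indexers.toArray[PipelineStage]).fit(df)

// Pull the label dictionary (value -> index) out of each fitted stage.
val dicts = model.stages.collect {
  case m: StringIndexerModel => m.getInputCol -> m.labels.zipWithIndex.toMap
}

// Write the encoded result back to Hive.
model.transform(df).write.mode("overwrite").saveAsTable("demo_db.demo_table_encoded")
```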

The problem is that when I run the action after the pipeline, the submitted job gets stuck there for a very long time, and I don't know why. It can't be data skew with data this small... I only know Spark scheduling at a surface level, not the internals, so I'd appreciate some pointers. Thanks.

Below are the code and the relevant UI pages:

Code:

[screenshot]

Relevant Spark UI pages:

[screenshot]

count at NullValueCheck is a null-value check: it just reads the Hive table and counts it. countByValue is a method inside the StringIndexer class. Both of their execution times are acceptable.
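
For context, a rough paraphrase of where that countByValue job comes from: StringIndexer.fit builds its label dictionary by counting the values of the input column. This is a simplified sketch of the idea (based on Spark 2.x behavior), not the exact Spark source, which differs across versions:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

// Hypothetical helper mirroring what StringIndexer.fit does for one column:
// count the occurrences of each value, then assign indices by descending frequency.
def fitLabels(df: DataFrame, inputCol: String): Array[String] = {
  val counts = df.na.drop(Array(inputCol))
    .select(col(inputCol).cast(StringType))
    .rdd.map(_.getString(0))
    .countByValue()                              // the "countByValue" job in the UI
  counts.toSeq.sortBy(-_._2).map(_._1).toArray   // most frequent value -> index 0
}
```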

DAG for the NullValueCheck job:
[screenshot]

DAG for the countByValue job:
[screenshot]

Below is count at LabelEncoder. This is where the pipeline job is submitted, and this is where it gets stuck:

[screenshot]

The next two are the DAG for the count at LabelEncoder job:
[screenshot]
[screenshot]

Below is the stages page. You can see the Scheduler Delay is very long and there is no Task Time; the task is stuck here:

[screenshot]

Executors page, for reference:
[screenshot]

Below are the total duration of this stuck job and of the subsequent save-table operations. You can see that the job that submits the pipeline takes time on a completely different scale from the others, because it sits in Scheduler Delay for a long time:

[screenshot]

Please take a look. I'm sure I'll pick up a lot more knowledge about Spark jobs from your guidance. Thanks, everyone.

1 Answer

I've run into this problem too. My initial guess is that there are too many small files: each small file starts one task, and they are then processed in parallel. Merging the small files should solve it.
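
If the small-file hypothesis is right, a minimal sketch of the two usual workarounds, reusing the same hypothetical SparkSession and table name as in the sketch above (the partition count 8 is an arbitrary example):

```scala
// Option 1: coalesce right after reading, so downstream stages run on fewer,
// larger partitions instead of one tiny task per input file.
val df = spark.table("demo_db.demo_table").coalesce(8)

// Option 2: compact the table once, so every later read sees fewer, larger files.
spark.table("demo_db.demo_table")
  .repartition(8)
  .write.mode("overwrite")
  .saveAsTable("demo_db.demo_table_compacted")
```

Either way, checking how many files the Hive table actually holds (e.g. with `hdfs dfs -count` on its location) is a quick way to confirm whether this is the cause.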

in 16 ms on datanode02 (executor 3) (11/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, datanode02, executor 3, partition 17, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 13 ms on datanode02 (executor 3) (12/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18, datanode01, executor 2, partition 18, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, datanode01, executor 2, partition 19, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 787 ms on datanode01 (executor 2) (13/50) 19/08/13 19:53:26 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 789 ms on datanode01 (executor 2) (14/50) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20, datanode03, executor 1, partition 20, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:26 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, datanode03, executor 1, partition 21, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 905 ms on datanode03 (executor 1) (15/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 907 ms on datanode03 (executor 1) (16/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22, datanode02, executor 3, partition 22, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 23.0 in stage 0.0 (TID 23, datanode02, executor 3, partition 23, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 24.0 in stage 0.0 (TID 24, datanode01, executor 2, partition 24, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 18.0 in stage 0.0 (TID 18) in 124 ms on datanode01 (executor 2) (17/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 134 ms on datanode02 (executor 3) (18/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 25.0 in stage 0.0 (TID 25, datanode01, executor 2, partition 25, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 26.0 in stage 0.0 (TID 26, datanode03, executor 1, partition 26, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 17.0 in stage 0.0 (TID 17) in 134 ms on datanode02 (executor 3) (19/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 20.0 in stage 0.0 (TID 20) in 122 ms on datanode03 (executor 1) (20/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 27.0 in stage 0.0 (TID 27, datanode03, executor 1, partition 27, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 19.0 in stage 0.0 (TID 19) in 127 ms on datanode01 (executor 2) (21/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 21.0 in stage 0.0 (TID 21) in 123 ms on datanode03 (executor 1) (22/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 28.0 in stage 0.0 (TID 28, datanode02, executor 3, partition 28, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 29.0 in stage 0.0 (TID 29, datanode02, executor 3, partition 29, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 22.0 in stage 0.0 (TID 22) in 19 ms on datanode02 (executor 3) (23/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 23.0 in stage 0.0 (TID 23) in 18 ms on datanode02 (executor 3) (24/50) 19/08/13 
19:53:27 INFO TaskSetManager: Starting task 30.0 in stage 0.0 (TID 30, datanode01, executor 2, partition 30, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 31.0 in stage 0.0 (TID 31, datanode01, executor 2, partition 31, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 25.0 in stage 0.0 (TID 25) in 27 ms on datanode01 (executor 2) (25/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 24.0 in stage 0.0 (TID 24) in 29 ms on datanode01 (executor 2) (26/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 32.0 in stage 0.0 (TID 32, datanode02, executor 3, partition 32, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 29.0 in stage 0.0 (TID 29) in 16 ms on datanode02 (executor 3) (27/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 33.0 in stage 0.0 (TID 33, datanode03, executor 1, partition 33, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 26.0 in stage 0.0 (TID 26) in 30 ms on datanode03 (executor 1) (28/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 34.0 in stage 0.0 (TID 34, datanode02, executor 3, partition 34, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 28.0 in stage 0.0 (TID 28) in 21 ms on datanode02 (executor 3) (29/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 35.0 in stage 0.0 (TID 35, datanode03, executor 1, partition 35, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 27.0 in stage 0.0 (TID 27) in 32 ms on datanode03 (executor 1) (30/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 36.0 in stage 0.0 (TID 36, datanode02, executor 3, partition 36, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 32.0 in stage 0.0 (TID 32) in 11 ms on datanode02 (executor 3) (31/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 37.0 in stage 0.0 (TID 37, datanode01, executor 2, partition 37, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 30.0 in stage 0.0 (TID 30) in 18 ms on datanode01 (executor 2) (32/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 38.0 in stage 0.0 (TID 38, datanode01, executor 2, partition 38, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 31.0 in stage 0.0 (TID 31) in 20 ms on datanode01 (executor 2) (33/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 39.0 in stage 0.0 (TID 39, datanode03, executor 1, partition 39, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 33.0 in stage 0.0 (TID 33) in 17 ms on datanode03 (executor 1) (34/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 34.0 in stage 0.0 (TID 34) in 17 ms on datanode02 (executor 3) (35/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 40.0 in stage 0.0 (TID 40, datanode02, executor 3, partition 40, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 41.0 in stage 0.0 (TID 41, datanode03, executor 1, partition 41, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 35.0 in stage 0.0 (TID 35) in 17 ms on datanode03 (executor 1) (36/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 42.0 in stage 0.0 (TID 42, datanode02, executor 3, partition 42, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 36.0 in stage 0.0 (TID 36) in 16 ms on datanode02 (executor 3) (37/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 43.0 in stage 
0.0 (TID 43, datanode01, executor 2, partition 43, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 37.0 in stage 0.0 (TID 37) in 16 ms on datanode01 (executor 2) (38/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 44.0 in stage 0.0 (TID 44, datanode02, executor 3, partition 44, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 45.0 in stage 0.0 (TID 45, datanode02, executor 3, partition 45, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 40.0 in stage 0.0 (TID 40) in 14 ms on datanode02 (executor 3) (39/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 42.0 in stage 0.0 (TID 42) in 11 ms on datanode02 (executor 3) (40/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 46.0 in stage 0.0 (TID 46, datanode03, executor 1, partition 46, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 39.0 in stage 0.0 (TID 39) in 20 ms on datanode03 (executor 1) (41/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 47.0 in stage 0.0 (TID 47, datanode03, executor 1, partition 47, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 41.0 in stage 0.0 (TID 41) in 20 ms on datanode03 (executor 1) (42/50) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 48.0 in stage 0.0 (TID 48, datanode01, executor 2, partition 48, PROCESS_LOCAL, 7831 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 49.0 in stage 0.0 (TID 49, datanode01, executor 2, partition 49, PROCESS_LOCAL, 7888 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 43.0 in stage 0.0 (TID 43) in 18 ms on datanode01 (executor 2) (43/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 38.0 in stage 0.0 (TID 38) in 31 ms on datanode01 (executor 2) (44/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 45.0 in stage 0.0 (TID 45) in 11 ms on datanode02 (executor 3) (45/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 44.0 in stage 0.0 (TID 44) in 16 ms on datanode02 (executor 3) (46/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 46.0 in stage 0.0 (TID 46) in 18 ms on datanode03 (executor 1) (47/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 48.0 in stage 0.0 (TID 48) in 15 ms on datanode01 (executor 2) (48/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 47.0 in stage 0.0 (TID 47) in 15 ms on datanode03 (executor 1) (49/50) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 49.0 in stage 0.0 (TID 49) in 25 ms on datanode01 (executor 2) (50/50) 19/08/13 19:53:27 INFO YarnClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 19/08/13 19:53:27 INFO DAGScheduler: ShuffleMapStage 0 (start at VoiceApplication2.java:128) finished in 1.174 s 19/08/13 19:53:27 INFO DAGScheduler: looking for newly runnable stages 19/08/13 19:53:27 INFO DAGScheduler: running: Set() 19/08/13 19:53:27 INFO DAGScheduler: waiting: Set(ResultStage 1) 19/08/13 19:53:27 INFO DAGScheduler: failed: Set() 19/08/13 19:53:27 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[2] at start at VoiceApplication2.java:128), which has no missing parents 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.2 KB, free 366.3 MB) 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1979.0 B, free 366.3 MB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode02:31984 (size: 1979.0 B, free: 366.3 MB) 19/08/13 
19:53:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1039 19/08/13 19:53:27 INFO DAGScheduler: Submitting 20 missing tasks from ResultStage 1 (ShuffledRDD[2] at start at VoiceApplication2.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 19/08/13 19:53:27 INFO YarnClusterScheduler: Adding task set 1.0 with 20 tasks 19/08/13 19:53:27 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 50, datanode03, executor 1, partition 0, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 51, datanode02, executor 3, partition 1, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 52, datanode01, executor 2, partition 3, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 53, datanode03, executor 1, partition 2, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 54, datanode02, executor 3, partition 4, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 55, datanode01, executor 2, partition 5, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode02:28863 (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode01:20487 (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on datanode03:3328 (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:53:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.1.229.163:24656 19/08/13 19:53:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.1.198.144:41122 19/08/13 19:53:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.1.229.158:64276 19/08/13 19:53:27 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 56, datanode03, executor 1, partition 7, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 53) in 192 ms on datanode03 (executor 1) (1/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 57, datanode03, executor 1, partition 8, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 56) in 25 ms on datanode03 (executor 1) (2/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 58, datanode02, executor 3, partition 6, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 51) in 220 ms on datanode02 (executor 3) (3/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 14.0 in stage 1.0 (TID 59, datanode03, executor 1, partition 14, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 57) in 17 ms on datanode03 (executor 1) (4/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 16.0 in stage 1.0 (TID 60, datanode03, executor 1, partition 16, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 14.0 in stage 1.0 (TID 59) in 15 ms on datanode03 (executor 1) (5/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 16.0 in stage 1.0 (TID 60) in 21 ms on datanode03 (executor 1) (6/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 61, datanode02, executor 3, partition 
9, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 54) in 269 ms on datanode02 (executor 3) (7/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 50) in 339 ms on datanode03 (executor 1) (8/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID 62, datanode02, executor 3, partition 10, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 58) in 56 ms on datanode02 (executor 3) (9/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID 63, datanode01, executor 2, partition 11, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 55) in 284 ms on datanode01 (executor 2) (10/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID 64, datanode01, executor 2, partition 12, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 52) in 287 ms on datanode01 (executor 2) (11/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID 65, datanode02, executor 3, partition 13, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 15.0 in stage 1.0 (TID 66, datanode02, executor 3, partition 15, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 10.0 in stage 1.0 (TID 62) in 25 ms on datanode02 (executor 3) (12/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 61) in 29 ms on datanode02 (executor 3) (13/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 17.0 in stage 1.0 (TID 67, datanode02, executor 3, partition 17, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 15.0 in stage 1.0 (TID 66) in 13 ms on datanode02 (executor 3) (14/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 13.0 in stage 1.0 (TID 65) in 16 ms on datanode02 (executor 3) (15/20) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 18.0 in stage 1.0 (TID 68, datanode02, executor 3, partition 18, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Starting task 19.0 in stage 1.0 (TID 69, datanode01, executor 2, partition 19, NODE_LOCAL, 7638 bytes) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 11.0 in stage 1.0 (TID 63) in 30 ms on datanode01 (executor 2) (16/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 12.0 in stage 1.0 (TID 64) in 30 ms on datanode01 (executor 2) (17/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 17.0 in stage 1.0 (TID 67) in 17 ms on datanode02 (executor 3) (18/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 19.0 in stage 1.0 (TID 69) in 13 ms on datanode01 (executor 2) (19/20) 19/08/13 19:53:27 INFO TaskSetManager: Finished task 18.0 in stage 1.0 (TID 68) in 20 ms on datanode02 (executor 3) (20/20) 19/08/13 19:53:27 INFO YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 19/08/13 19:53:27 INFO DAGScheduler: ResultStage 1 (start at VoiceApplication2.java:128) finished in 0.406 s 19/08/13 19:53:27 INFO DAGScheduler: Job 0 finished: start at VoiceApplication2.java:128, took 1.850883 s 19/08/13 19:53:27 INFO ReceiverTracker: Starting 1 receivers 19/08/13 19:53:27 INFO ReceiverTracker: ReceiverTracker started 19/08/13 19:53:27 INFO KafkaInputDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO KafkaInputDStream: Checkpoint interval = 
null 19/08/13 19:53:27 INFO KafkaInputDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@5fd3dc81 19/08/13 19:53:27 INFO ForEachDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO ForEachDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO ForEachDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@4044ec97 19/08/13 19:53:27 INFO KafkaInputDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO KafkaInputDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO KafkaInputDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@5fd3dc81 19/08/13 19:53:27 INFO MappedDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO MappedDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO MappedDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@5dd4b960 19/08/13 19:53:27 INFO ForEachDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO ForEachDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO ForEachDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@132d0c3c 19/08/13 19:53:27 INFO KafkaInputDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO KafkaInputDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO KafkaInputDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@5fd3dc81 19/08/13 19:53:27 INFO MappedDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO MappedDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO MappedDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@5dd4b960 19/08/13 19:53:27 INFO ForEachDStream: Slide time = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Storage level = Serialized 1x Replicated 19/08/13 19:53:27 INFO ForEachDStream: Checkpoint interval = null 19/08/13 19:53:27 INFO ForEachDStream: Remember interval = 60000 ms 19/08/13 19:53:27 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@525bed0c 19/08/13 19:53:27 INFO DAGScheduler: Got job 1 (start at VoiceApplication2.java:128) with 1 output partitions 19/08/13 19:53:27 INFO DAGScheduler: Final stage: ResultStage 2 (start at VoiceApplication2.java:128) 19/08/13 19:53:27 INFO DAGScheduler: Parents of final stage: List() 19/08/13 19:53:27 INFO DAGScheduler: Missing parents: List() 19/08/13 19:53:27 INFO DAGScheduler: Submitting ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:613), which has no missing parents 19/08/13 19:53:27 INFO 
ReceiverTracker: Receiver 0 started 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 133.5 KB, free 366.2 MB) 19/08/13 19:53:27 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 36.3 KB, free 366.1 MB) 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on datanode02:31984 (size: 36.3 KB, free: 366.3 MB) 19/08/13 19:53:27 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1039 19/08/13 19:53:27 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:613) (first 15 tasks are for partitions Vector(0)) 19/08/13 19:53:27 INFO YarnClusterScheduler: Adding task set 2.0 with 1 tasks 19/08/13 19:53:27 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 70, datanode01, executor 2, partition 0, PROCESS_LOCAL, 8757 bytes) 19/08/13 19:53:27 INFO RecurringTimer: Started timer for JobGenerator at time 1565697240000 19/08/13 19:53:27 INFO JobGenerator: Started JobGenerator at 1565697240000 ms 19/08/13 19:53:27 INFO JobScheduler: Started JobScheduler 19/08/13 19:53:27 INFO StreamingContext: StreamingContext started 19/08/13 19:53:27 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on datanode01:20487 (size: 36.3 KB, free: 2.5 GB) 19/08/13 19:53:27 INFO ReceiverTracker: Registered receiver for stream 0 from 10.1.229.158:64276 19/08/13 19:54:00 INFO JobScheduler: Added jobs for time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Starting job streaming job 1565697240000 ms.0 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Starting job streaming job 1565697240000 ms.1 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Finished job streaming job 1565697240000 ms.1 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Finished job streaming job 1565697240000 ms.0 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO JobScheduler: Starting job streaming job 1565697240000 ms.2 from job set of time 1565697240000 ms 19/08/13 19:54:00 INFO SharedState: loading hive config file: file:/data01/hadoop/yarn/local/usercache/hdfs/filecache/85431/__spark_conf__.zip/__hadoop_conf__/hive-site.xml 19/08/13 19:54:00 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('hdfs://CID-042fb939-95b4-4b74-91b8-9f94b999bdf7/apps/hive/warehouse'). 19/08/13 19:54:00 INFO SharedState: Warehouse path is 'hdfs://CID-042fb939-95b4-4b74-91b8-9f94b999bdf7/apps/hive/warehouse'. 
19/08/13 19:54:00 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode02:31984 in memory (size: 1979.0 B, free: 366.3 MB) 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode02:28863 in memory (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode01:20487 in memory (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:54:00 INFO BlockManagerInfo: Removed broadcast_1_piece0 on datanode03:3328 in memory (size: 1979.0 B, free: 2.5 GB) 19/08/13 19:54:02 INFO CodeGenerator: Code generated in 175.416957 ms 19/08/13 19:54:02 INFO JobScheduler: Finished job streaming job 1565697240000 ms.2 from job set of time 1565697240000 ms 19/08/13 19:54:02 ERROR JobScheduler: Error running job streaming job 1565697240000 ms.2 org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:40) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:388) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:122) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:115) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 19/08/13 19:54:02 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; 
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:40) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:388) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:122) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:115) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 19/08/13 19:54:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'meta_voice' not found; at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:40) at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:388) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:122) at com.stream.VoiceApplication2$2.call(VoiceApplication2.java:115) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280) at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51) at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ) 19/08/13 19:54:02 INFO StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook 19/08/13 19:54:02 INFO ReceiverTracker: Sent stop signal to all 1 receivers 19/08/13 19:54:02 ERROR ReceiverTracker: Deregistered receiver for stream 0: Stopped by driver 19/08/13 19:54:02 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 70) in 35055 ms on datanode01 (executor 2) (1/1) 19/08/13 19:54:02 INFO YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool 19/08/13 19:54:02 INFO DAGScheduler: ResultStage 2 (start at VoiceApplication2.java:128) finished in 35.086 s 19/08/13 19:54:02 INFO ReceiverTracker: Waiting for receiver job to terminate gracefully 19/08/13 19:54:02 INFO ReceiverTracker: Waited for receiver job to terminate gracefully 19/08/13 19:54:02 INFO ReceiverTracker: All of the receivers have deregistered successfully 19/08/13 19:54:02 INFO ReceiverTracker: ReceiverTracker stopped 19/08/13 19:54:02 INFO JobGenerator: Stopping JobGenerator gracefully 19/08/13 19:54:02 INFO JobGenerator: Waiting for all received blocks to be consumed for job generation 19/08/13 19:54:02 INFO JobGenerator: Waited for all received blocks to be consumed for job generation 19/08/13 19:54:12 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException java.util.concurrent.TimeoutException at java.util.concurrent.FutureTask.get(FutureTask.java:205) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67) 19/08/13 19:54:12 ERROR Utils: Uncaught exception in thread pool-1-thread-1 java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1252) at java.lang.Thread.join(Thread.java:1326) at org.apache.spark.streaming.util.RecurringTimer.stop(RecurringTimer.scala:86) at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:137) at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:123) at 
org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:681) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357) at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:680) at org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:714) at org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599) at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1988) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) at scala.util.Try$.apply(Try.scala:192) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ```
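The job above dies inside `DataFrameWriter.saveAsTable` with `NoSuchDatabaseException: Database 'meta_voice' not found`, and the stack trace shows the lookup going through `InMemoryCatalog`, which is the catalog Spark uses when the session was built without Hive support. Purely as a hedged sketch (assuming the database really exists in the Hive metastore and that the application builds its own `SparkSession`; the class and app names below are placeholders, not taken from the post), enabling Hive support looks roughly like this:

```
import org.apache.spark.sql.SparkSession;

public class HiveSessionSketch {
    public static void main(String[] args) {
        // enableHiveSupport() makes Spark use the Hive external catalog (via hive-site.xml
        // on the classpath) instead of the default InMemoryCatalog, so existing Hive
        // databases such as 'meta_voice' become visible to saveAsTable().
        SparkSession spark = SparkSession.builder()
                .appName("hive-session-sketch")
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("SHOW DATABASES").show();  // quick check that the metastore is reachable
        spark.stop();
    }
}
```

If the session is already Hive-enabled, the same error simply means the database has to exist (for example created beforehand with `CREATE DATABASE meta_voice`) before `saveAsTable` can write into it.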
Problem submitting a Spark application to a cluster with spark-submit
Hi, I would like to ask the experts a question. When I submit my Spark application in local mode everything works fine, but with cluster-mode submission (--master: spark://ip:7077) no result ever comes back, and it just keeps printing the following WARN messages:

16/10/18 17:16:09 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/10/18 17:16:24 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/10/18 17:16:39 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
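That WARN usually means that no registered worker can satisfy the memory/cores the application is requesting (or that no worker is registered at all). Purely for illustration, with placeholder values that are not taken from the post above, this is roughly how explicit resource requests look when set in code instead of on the spark-submit command line:

```
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ResourceRequestSketch {
    public static void main(String[] args) {
        // Keep the requested executor memory and total cores below what the workers
        // report in the cluster UI; otherwise no executor can be placed and the
        // "Initial job has not accepted any resources" warning repeats forever.
        SparkConf conf = new SparkConf()
                .setAppName("resource-request-sketch")
                .setMaster("spark://ip:7077")        // placeholder standalone master URL
                .set("spark.executor.memory", "1g")  // memory requested per executor
                .set("spark.cores.max", "2");        // total cores requested by the application

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
        sc.stop();
    }
}
```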
In a Quartz cluster, when one node is stopped, its scheduled jobs are not taken over by the other nodes and stop running
Cluster setup with Spring Boot + Quartz: when one node is stopped, the scheduled jobs of that node are not taken over by the other node and stop executing. There are two job classes, each firing every 20 seconds (0/20 * * * * ?). With both instances started, the consoles of the two projects each print the logs of the two scheduled jobs; after stopping one of the projects the logging does not change: the stopped project's jobs are not executed by the project that is still running, which keeps executing only its original jobs.

Configuration file:
```
# quartz cluster configuration
# ===========================================================================
# Configure Main Scheduler Properties (scheduler properties)
# ===========================================================================
# Scheduler instance name; every instance in the cluster must use the same name
org.quartz.scheduler.instanceName=DefaultQuartzScheduler
# Instance ID set to auto; must be different for every instance
org.quartz.scheduler.instanceid=AUTO
#============================================================================
# Configure ThreadPool
#============================================================================
# Thread pool implementation class (SimpleThreadPool satisfies almost all users' needs)
org.quartz.threadPool.class=org.quartz.simpl.SimpleThreadPool
# Number of threads, at least 1 (no default); an integer between 1 and 100 is usually appropriate
org.quartz.threadPool.threadCount=10
# Thread priority (max java.lang.Thread.MAX_PRIORITY 10, min Thread.MIN_PRIORITY 1, default 5)
org.quartz.threadPool.threadPriority=5
#============================================================================
# Configure JobStore
#============================================================================
# Misfire threshold, default 60 seconds
org.quartz.jobStore.misfireThreshold=60000
# Persist job data in the database
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
# JDBC delegate class; org.quartz.impl.jdbcjobstore.StdJDBCDelegate works for most databases
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
# Whether JobDataMap values are all of type String
org.quartz.jobStore.useProperties=false
# Data source alias (any name)
org.quartz.jobStore.dataSource=myDS
# Table prefix, default QRTZ_
org.quartz.jobStore.tablePrefix=QRTZ_
# Whether to join the cluster
org.quartz.jobStore.isClustered=true
# Check-in interval for detecting failed scheduler instances
org.quartz.jobStore.clusterCheckinInterval=15000
```
```
@Configuration
public class QuartzConfiguration {

    @Autowired
    DataSource dataSource;

    /**
     * Extends org.springframework.scheduling.quartz.SpringBeanJobFactory to control how
     * job instances are created.
     */
    public static class AutowiringSpringBeanJobFactory extends SpringBeanJobFactory
            implements ApplicationContextAware {

        private transient AutowireCapableBeanFactory beanFactory;

        @Override
        public void setApplicationContext(final ApplicationContext context) {
            beanFactory = context.getAutowireCapableBeanFactory();
        }

        /**
         * Hands the job instance over to the Spring IoC container, so that beans managed
         * by Spring can be injected and used directly inside the job implementation classes.
         *
         * @param bundle
         * @return
         * @throws Exception
         */
        @Override
        protected Object createJobInstance(final TriggerFiredBundle bundle) throws Exception {
            final Object job = super.createJobInstance(bundle);
            // hand the job instance over to the Spring IoC container
            beanFactory.autowireBean(job);
            return job;
        }
    }

    /**
     * Configures the job factory instance.
     *
     * @param applicationContext the Spring application context
     * @return
     */
    @Bean
    public JobFactory jobFactory(ApplicationContext applicationContext) {
        // use the custom job factory above so jobs are built with Spring-managed beans,
        // see {@link AutowiringSpringBeanJobFactory}
        AutowiringSpringBeanJobFactory jobFactory = new AutowiringSpringBeanJobFactory();
        jobFactory.setApplicationContext(applicationContext);
        return jobFactory;
    }

    /**
     * Configures the scheduler, using the project's data source as the Quartz data source.
     *
     * @param jobFactory the custom job factory configured above
     * @return
     * @throws Exception
     */
    @Bean(destroyMethod = "destroy", autowire = Autowire.NO)
    public SchedulerFactoryBean schedulerFactoryBean(JobFactory jobFactory) throws Exception {
        SchedulerFactoryBean schedulerFactoryBean = new SchedulerFactoryBean();
        // let the scheduler use the Spring-managed custom job factory
        schedulerFactoryBean.setJobFactory(jobFactory);
        // overwrite jobs that already exist
        schedulerFactoryBean.setOverwriteExistingJobs(true);
        // wait 10 seconds after the project has started before initializing the scheduler
        schedulerFactoryBean.setStartupDelay(10);
        // start the scheduler automatically
        schedulerFactoryBean.setAutoStartup(true);
        // use the same data source as the rest of the project
        schedulerFactoryBean.setDataSource(dataSource);
        // Spring bean name placed into the scheduler context
        schedulerFactoryBean.setApplicationContextSchedulerContextKey("applicationContext");
        // location of the configuration file
        schedulerFactoryBean.setConfigLocation(new ClassPathResource("/application-quartz.properties"));
        return schedulerFactoryBean;
    }
}
```
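For reference, a minimal sketch (Quartz 2.x API; the job class and key names below are made up, not taken from the post) of how a 0/20 * * * * ? job can be registered against such a clustered scheduler, with `requestRecovery` enabled so that a job interrupted by a node failure is re-run by a surviving instance:

```
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;

public class ClusterJobRegistration {

    // Hypothetical job class standing in for the two job classes mentioned above.
    public static class TickJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("tick, fire instance " + context.getFireInstanceId());
        }
    }

    // Registers the job and its 20-second cron trigger in the shared QRTZ_ tables.
    public static void register(Scheduler scheduler) throws SchedulerException {
        JobDetail job = JobBuilder.newJob(TickJob.class)
                .withIdentity("tickJob", "cluster-group")
                .storeDurably()            // keep the JobDetail even without an attached trigger
                .requestRecovery(true)     // re-run the job if the node executing it crashes
                .build();

        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("tickTrigger", "cluster-group")
                .forJob(job)
                .withSchedule(CronScheduleBuilder.cronSchedule("0/20 * * * * ?"))
                .build();

        // Only one node needs to do this; the rows in the database are shared by the cluster.
        if (!scheduler.checkExists(job.getKey())) {
            scheduler.scheduleJob(job, trigger);
        }
    }
}
```

This only shows the registration side; fail-over in a Quartz cluster also assumes that every node points at the same QRTZ_ tables and that the server clocks are synchronized.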
A Baidu Muzhi Doctor (百度拇指医生) crawler: I want to first crawl all the links for a given question, but nothing gets scraped. Could someone please take a look and tell me why?
#写在前面的话 在这个爬虫里我想实现把百度拇指医生里关于“咳嗽”的链接全部爬取下来,下一步要进行的是把爬取到的每个链接里的items里面的内容爬取下来,但是我在第一步就卡住了,求各位大神帮我看一下吧。之前刚刚发了一篇问答,但是不知道怎么回事儿,现在找不到了,(貌似是被删了...?)救救小白吧!感激不尽! 这个是我的爬虫的结构 ![图片说明](https://img-ask.csdn.net/upload/201911/27/1574787999_274479.png) ##ks: ``` # -*- coding: utf-8 -*- import scrapy from kesou.items import KesouItem from scrapy.selector import Selector from scrapy.spiders import Spider from scrapy.http import Request ,FormRequest import pymongo class KsSpider(scrapy.Spider): name = 'ks' allowed_domains = ['kesou,baidu.com'] start_urls = ['https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=0&oq=%E5%92%B3%E5%97%BD&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFXJvk%2FSYX%2B1M'] def parse(self, response): item = KesouItem() contents = response.xpath('.//h3[@class="t"]') for content in contents: url = content.xpath('.//a/@href').extract()[0] item['url'] = url yield item if self.offset < 760: self.offset += 10 yield scrapy.Request(url = "https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=" + str(self.offset) + "&oq=%E5%92%B3%E5%97%BD&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFXJvk%2FSYX%2B1M",callback=self.parse,dont_filter=True) ``` ##items: ``` # -*- coding: utf-8 -*- # Define here the models for your scraped items # # See documentation in: # https://docs.scrapy.org/en/latest/topics/items.html import scrapy class KesouItem(scrapy.Item): # 问题ID question_ID = scrapy.Field() # 问题描述 question = scrapy.Field() # 医生回答发表时间 answer_pubtime = scrapy.Field() # 问题详情 description = scrapy.Field() # 医生姓名 doctor_name = scrapy.Field() # 医生职位 doctor_title = scrapy.Field() # 医生所在医院 hospital = scrapy.Field() ``` ##middlewares: ``` # -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in: # https://docs.scrapy.org/en/latest/topics/spider-middleware.html from scrapy import signals class KesouSpiderMiddleware(object): # Not all methods need to be defined. If a method is not defined, # scrapy acts as if the spider middleware does not modify the # passed objects. @classmethod def from_crawler(cls, crawler): # This method is used by Scrapy to create your spiders. s = cls() crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) return s def process_spider_input(self, response, spider): # Called for each response that goes through the spider # middleware and into the spider. # Should return None or raise an exception. return None def process_spider_output(self, response, result, spider): # Called with the results returned from the Spider, after # it has processed the response. # Must return an iterable of Request, dict or Item objects. for i in result: yield i def process_spider_exception(self, response, exception, spider): # Called when a spider or process_spider_input() method # (from other spider middleware) raises an exception. # Should return either None or an iterable of Request, dict # or Item objects. pass def process_start_requests(self, start_requests, spider): # Called with the start requests of the spider, and works # similarly to the process_spider_output() method, except # that it doesn’t have a response associated. # Must return only requests (not items). for r in start_requests: yield r def spider_opened(self, spider): spider.logger.info('Spider opened: %s' % spider.name) class KesouDownloaderMiddleware(object): # Not all methods need to be defined. 
If a method is not defined, # scrapy acts as if the downloader middleware does not modify the # passed objects. @classmethod def from_crawler(cls, crawler): # This method is used by Scrapy to create your spiders. s = cls() crawler.signals.connect(s.spider_opened, signal=signals.spider_opened) return s def process_request(self, request, spider): # Called for each request that goes through the downloader # middleware. # Must either: # - return None: continue processing this request # - or return a Response object # - or return a Request object # - or raise IgnoreRequest: process_exception() methods of # installed downloader middleware will be called return None def process_response(self, request, response, spider): # Called with the response returned from the downloader. # Must either; # - return a Response object # - return a Request object # - or raise IgnoreRequest return response def process_exception(self, request, exception, spider): # Called when a download handler or a process_request() # (from other downloader middleware) raises an exception. # Must either: # - return None: continue processing this exception # - return a Response object: stops process_exception() chain # - return a Request object: stops process_exception() chain pass def spider_opened(self, spider): spider.logger.info('Spider opened: %s' % spider.name) ``` ##piplines: ``` # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your pipeline to the ITEM_PIPELINES setting # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html import pymongo from scrapy.utils.project import get_project_settings settings = get_project_settings() class KesouPipeline(object): def __init__(self): host = settings["MONGODB_HOST"] port = settings["MONGODB_PORT"] dbname = settings["MONGODB_DBNAME"] sheetname= settings["MONGODB_SHEETNAME"] # 创建MONGODB数据库链接 client = pymongo.MongoClient(host = host, port = port) # 指定数据库 mydb = client[dbname] # 存放数据的数据库表名 self.sheet = mydb[sheetname] def process_item(self, item, spider): data = dict(item) self.sheet.insert(data) return item ``` ##settings: ``` # -*- coding: utf-8 -*- # Scrapy settings for kesou project # # For simplicity, this file contains only settings considered important or # commonly used. 
You can find more settings consulting the documentation: # # https://docs.scrapy.org/en/latest/topics/settings.html # https://docs.scrapy.org/en/latest/topics/downloader-middleware.html # https://docs.scrapy.org/en/latest/topics/spider-middleware.html BOT_NAME = 'kesou' SPIDER_MODULES = ['kesou.spiders'] NEWSPIDER_MODULE = 'kesou.spiders' # Crawl responsibly by identifying yourself (and your website) on the user-agent #USER_AGENT = 'kesou (+http://www.yourdomain.com)' # Obey robots.txt rules ROBOTSTXT_OBEY = False # Configure maximum concurrent requests performed by Scrapy (default: 16) #CONCURRENT_REQUESTS = 32 # Configure a delay for requests for the same website (default: 0) # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay # See also autothrottle settings and docs #DOWNLOAD_DELAY = 3 # The download delay setting will honor only one of: #CONCURRENT_REQUESTS_PER_DOMAIN = 16 #CONCURRENT_REQUESTS_PER_IP = 16 # Disable cookies (enabled by default) COOKIES_ENABLED = False # Disable Telnet Console (enabled by default) #TELNETCONSOLE_ENABLED = False USER_AGENT="Mozilla/5.0 (Windows NT 10.0; WOW64; rv:67.0) Gecko/20100101 Firefox/67.0" # Override the default request headers: #DEFAULT_REQUEST_HEADERS = { # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', # 'Accept-Language': 'en', #} # Enable or disable spider middlewares # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html #SPIDER_MIDDLEWARES = { # 'kesou.middlewares.KesouSpiderMiddleware': 543, #} # Enable or disable downloader middlewares # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html #DOWNLOADER_MIDDLEWARES = { # 'kesou.middlewares.KesouDownloaderMiddleware': 543, #} # Enable or disable extensions # See https://docs.scrapy.org/en/latest/topics/extensions.html #EXTENSIONS = { # 'scrapy.extensions.telnet.TelnetConsole': None, #} # Configure item pipelines # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html ITEM_PIPELINES = { 'kesou.pipelines.KesouPipeline': 300, } # MONGODB 主机名 MONGODB_HOST = "127.0.0.1" # MONGODB 端口号 MONGODB_PORT = 27017 # 数据库名称 MONGODB_DBNAME = "ks" # 存放数据的表名称 MONGODB_SHEETNAME = "ks_urls" # Enable and configure the AutoThrottle extension (disabled by default) # See https://docs.scrapy.org/en/latest/topics/autothrottle.html #AUTOTHROTTLE_ENABLED = True # The initial download delay #AUTOTHROTTLE_START_DELAY = 5 # The maximum download delay to be set in case of high latencies #AUTOTHROTTLE_MAX_DELAY = 60 # The average number of requests Scrapy should be sending in parallel to # each remote server #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0 # Enable showing throttling stats for every response received: #AUTOTHROTTLE_DEBUG = False # Enable and configure HTTP caching (disabled by default) # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings #HTTPCACHE_ENABLED = True #HTTPCACHE_EXPIRATION_SECS = 0 #HTTPCACHE_DIR = 'httpcache' #HTTPCACHE_IGNORE_HTTP_CODES = [] #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage' ``` ##run.py: ``` # -*- coding: utf-8 -*- from scrapy import cmdline cmdline.execute("scrapy crawl ks".split()) ``` ##这个是运行出来的结果: ``` PS D:\scrapy_project\kesou> scrapy crawl ks 2019-11-27 00:14:17 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: kesou) 2019-11-27 00:14:17 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twis.7.0, Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC 
v.1915 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryphy 2.6.1, Platform Windows-10-10.0.18362-SP0 2019-11-27 00:14:17 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'kesou', 'COOKIES_ENABLED': False, 'NEWSPIDER_MODULE': 'spiders', 'SPIDER_MODULES': ['kesou.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:67.0) Gecko/20100101 Firefox/67 2019-11-27 00:14:17 [scrapy.extensions.telnet] INFO: Telnet Password: 051629c46f34abdf 2019-11-27 00:14:17 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats'] 2019-11-27 00:14:19 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats'] 2019-11-27 00:14:19 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware'] 2019-11-27 00:14:19 [scrapy.middleware] INFO: Enabled item pipelines: ['kesou.pipelines.KesouPipeline'] 2019-11-27 00:14:19 [scrapy.core.engine] INFO: Spider opened 2019-11-27 00:14:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 2019-11-27 00:14:19 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 2019-11-27 00:14:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=0&oq=%E5%92%B3%E5&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFXJvk%2FSYX% (referer: None) 2019-11-27 00:14:20 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.baidu.com/s?wd=%E5%92%B3%E5%97%BD&pn=0&oq=%B3%E5%97%BD&ct=2097152&ie=utf-8&si=muzhi.baidu.com&rsv_pq=980e0c55000e2402&rsv_t=ed3f0i5yeefxTMskgzim00cCUyVujMRnw0Vs4o1%2Bo%2Bohf9rFFSYX%2B1M> (referer: None) Traceback (most recent call last): File "d:\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback yield next(it) File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output for x in result: File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr> return (_set_referer(r) for r in result or ()) File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr> return (r for r in result 
or () if _filter(r)) File "d:\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable for r in iterable: File "d:\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr> return (r for r in result or () if _filter(r)) File "D:\scrapy_project\kesou\kesou\spiders\ks.py", line 19, in parse item['url'] = url File "d:\anaconda3\lib\site-packages\scrapy\item.py", line 73, in __setitem__ (self.__class__.__name__, key)) KeyError: 'KesouItem does not support field: url' 2019-11-27 00:14:20 [scrapy.core.engine] INFO: Closing spider (finished) 2019-11-27 00:14:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 438, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 68368, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'elapsed_time_seconds': 0.992207, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 11, 26, 16, 14, 20, 855804), 'log_count/DEBUG': 1, 2019-11-27 00:14:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 438, 'downloader/request_count': 1, 'downloader/request_method_count/GET': 1, 'downloader/response_bytes': 68368, 'downloader/response_count': 1, 'downloader/response_status_count/200': 1, 'elapsed_time_seconds': 0.992207, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 11, 26, 16, 14, 20, 855804), 'log_count/DEBUG': 1, 'log_count/ERROR': 1, 'log_count/INFO': 10, 'response_received_count': 1, 'scheduler/dequeued': 1, 'scheduler/dequeued/memory': 1, 'scheduler/enqueued': 1, 'scheduler/enqueued/memory': 1, 'spider_exceptions/KeyError': 1, 'start_time': datetime.datetime(2019, 11, 26, 16, 14, 19, 863597)} 2019-11-27 00:14:21 [scrapy.core.engine] INFO: Spider closed (finished) ```
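The traceback already names the fix: `KeyError: 'KesouItem does not support field: url'` means the `KesouItem` class in items.py does not declare a `url` field, so `item['url'] = url` at ks.py line 19 fails. A minimal sketch of the item definition (only the `url` field is taken from the error message; the other field names are made-up placeholders):

```
# items.py -- minimal sketch; only `url` is taken from the error message,
# the other fields are hypothetical placeholders
import scrapy

class KesouItem(scrapy.Item):
    url = scrapy.Field()      # declaring the field makes item['url'] = ... legal
    title = scrapy.Field()    # hypothetical extra field
    snippet = scrapy.Field()  # hypothetical extra field
```

Every key the spider assigns has to be declared as a `scrapy.Field()` on the item; otherwise `scrapy.Item` raises exactly this KeyError.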
Spark read error: Premature EOF from inputStream
:主要问题java.io.EOFException: Premature EOF from inputStream 使用textFile或者newAPIHadoopFile都出现这个错误 写spark读取数据的时候一直报这个错误。 连count,repartition都过不去。数据读的比平常慢的多。 看数据文件,应该是很均匀的,应该不是数据倾斜的问题了吧。 下面是报错信息: ``` 16/09/15 23:27:57 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 0.0 failed 4 times, most recent failure: Lost task 41.3 in stage 0.0 (TID 5736, dn076179.heracles.sohuno.com): java.io.EOFException: Premature EOF from inputStream at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75) at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114) at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54) at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:102) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Driver stacktrace: org.apache.spark.SparkException: Job aborted due to stage failure: Task 41 in stage 0.0 failed 4 times, most recent failure: Lost task 41.3 in stage 0.0 (TID 5736, dn076179.heracles.sohuno.com): java.io.EOFException: Premature EOF from inputStream at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75) at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114) at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54) at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:102) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104) at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) ```
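The exception is thrown while `LzopInputStream` is still reading the LZO header (`readHeader`), which usually means one of the input `.lzo` part files is empty or truncated rather than the data being skewed; repeated task retries on that split would also explain why the whole read is slower than usual. A rough way to locate the bad file is to read each part file on its own and see which one fails. A sketch, assuming the input directory and that the part files can be listed with the `hdfs` CLI:

```
# sketch: probe each input file separately to find the corrupt .lzo part
# (input path, app name and the use of the hdfs CLI are assumptions)
import subprocess
from pyspark import SparkContext

sc = SparkContext(appName="lzo-probe")

listing = subprocess.check_output(["hdfs", "dfs", "-ls", "/data/input"]).decode()
paths = [line.split()[-1] for line in listing.splitlines()
         if line.strip().endswith(".lzo")]

for path in paths:
    try:
        sc.textFile(path).count()      # forces a full decompress of just this file
    except Exception as exc:           # the file that fails here is the broken one
        print("suspect file:", path, "->", exc)
```

Once the broken file is found it can be regenerated or removed; a zero-byte `.lzo` file typically produces exactly this "Premature EOF" while reading the header.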
Spark Streaming throws the following exception after running for a while; how can it be fixed?
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1568735.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1568735.0 (TID 11808399, iZ94pshi327Z): java.lang.Exception: Could not compute split, block input-0-1438413230200 not found at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 15/08/01 08:53:09 WARN AkkaUtils: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@544fc1ff,BlockManagerId(0, iZ94w2tczvjZ, 41595))] in 2 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at 
org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) 15/08/01 08:53:28 WARN AkkaUtils: Error sending message [message = UpdateBlockInfo(BlockManagerId(0, iZ94w2tczvjZ, 41595),input-0-1438385673800,StorageLevel(false, false, false, false, 1),0,0,0)] in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:221) at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:62) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:384) at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:360) at org.apache.spark.storage.BlockManager.dropOldBlocks(BlockManager.scala:1138) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$dropOldNonBroadcastBlocks(BlockManager.scala:1115) at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcVJ$sp(BlockManager.scala:149) at org.apache.spark.util.MetadataCleaner$$anon$1.run(MetadataCleaner.scala:43) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) 15/08/01 08:53:42 WARN AkkaUtils: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@544fc1ff,BlockManagerId(0, iZ94w2tczvjZ, 41595))] in 3 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) 15/08/01 08:53:45 WARN Executor: Issue communicating with driver in heartbeater org.apache.spark.SparkException: Error sending message [message = Heartbeat(0,[Lscala.Tuple2;@544fc1ff,BlockManagerId(0, iZ94w2tczvjZ, 41595))] at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:209) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) ... 1 more
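`Could not compute split, block input-0-... not found` means a receiver block was dropped before the batch that needed it ran. Two details in the trace point at the likely cause: the `BlockManager.dropOldBlocks` / `MetadataCleaner` frames suggest `spark.cleaner.ttl` is set and is deleting received blocks on a timer, and the Akka ask timeouts suggest batches are queueing up faster than they are processed, so by the time a batch runs its input blocks are already gone. Removing (or greatly raising) `spark.cleaner.ttl`, keeping the batch processing time under the batch interval, and making receiver blocks recoverable usually clears it. A configuration-level sketch (batch interval, checkpoint dir and backpressure, which needs Spark 1.5+, are assumptions):

```
# sketch of the settings side of the fix; values and paths are examples
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
        .setAppName("stream-demo")
        # don't set spark.cleaner.ttl: the dropOldBlocks frames in the log are what a
        # too-small ttl does to receiver blocks that haven't been processed yet
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")   # recoverable blocks
        .set("spark.streaming.backpressure.enabled", "true"))           # Spark 1.5+: throttle input

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)                    # batch interval (seconds) is an example
ssc.checkpoint("hdfs:///tmp/stream-checkpoint")   # the write-ahead log needs a checkpoint dir
```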
Spark JDBC connection to Impala fails with Method not supported
各位好 我的spark是2.1.0,用的hive-jdbc 2.1.0,现在写入impala的时候报以下错: java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:75) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:589) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923) at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2305) at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305) at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2305) 
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765) at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2304) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:670) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446) at com.aoyou.data.CustomerVisitProduct$.saveToHive(CustomerVisitProduct.scala:281) at com.aoyou.data.CustomerVisitProduct$.main(CustomerVisitProduct.scala:221) at com.aoyou.data.CustomerVisitProduct.main(CustomerVisitProduct.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.sql.SQLException: Method not supported at org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:75) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:589) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 以下是代码实现 val sparkConf = new SparkConf().setAppName("save").set("spark.sql.crossJoin.enabled", "true"); val sparkSession = SparkSession .builder() .enableHiveSupport() .getOrCreate(); val dataframe = sparkSession.createDataFrame(rddSchema, new Row().getClass()) val property = new Properties(); property.put("user", "xxxxx") property.put("password", "xxxxx") dataframe.write.mode(SaveMode.Append).option("driver", "org.apache.hive.jdbc.HiveDriver").jdbc("jdbc:hive2://xxxx:21050/rawdata;auth=noSasl", "tablename", property) 请问这是怎么回事啊?感觉是驱动版本问题
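`HivePreparedStatement.addBatch` is one of the JDBC methods the Hive driver leaves unimplemented (it simply throws `Method not supported`), and Spark's JDBC writer relies on `addBatch` for every insert, so this driver cannot be used with `DataFrameWriter.jdbc` no matter which versions are combined. Two ways around it: use a JDBC driver that does implement batching (for example the Cloudera Impala JDBC driver), or skip JDBC entirely and write through the Hive metastore, then refresh the table in Impala. A rough sketch of the second route, shown in pyspark (the table names are made up and the target table is assumed to already exist with a matching schema):

```
# sketch: write through the Hive metastore instead of JDBC; Impala sees the new
# rows after a REFRESH (all table names here are assumptions)
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("impala-write")
         .enableHiveSupport()
         .getOrCreate())

df = spark.table("rawdata.customer_visit_staging")        # stands in for the job's DataFrame
df.write.mode("append").insertInto("rawdata.customer_visit_product")

# afterwards, from impala-shell:
#   REFRESH rawdata.customer_visit_product;
```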
EsHadoopNoNodesLeftException when reading elasticSearch from Spark
As the title says. I can ping the elasticSearch 9200 port from my own machine.
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes faile]] at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:123) at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:303) at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:287) at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:291) at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:118) at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:100) at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:57) at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:346) at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:31) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:34) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:34) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
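Note that the failing frames (`EsRDDWriter` inside `ResultTask`) run on the executors, so the connectivity that matters is from every executor host to the ES nodes, not from the local machine. On top of that, the connector by default discovers all data nodes and talks to their publish addresses, which are often not routable from the Spark workers. A sketch of the connection settings that usually resolve this (the host value is an example):

```
# sketch: pin the connector to the reachable address(es) and skip data-node discovery
# (the host is an example; these settings go on the SparkConf so executors see them too)
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("es-write")
        .set("es.nodes", "10.0.0.5")
        .set("es.port", "9200")
        .set("es.nodes.wan.only", "true"))   # talk only to es.nodes, don't discover data nodes

sc = SparkContext(conf=conf)
```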
Problem with updating an MLlib ALS model in real time inside Spark Streaming!
![screenshot](https://img-ask.csdn.net/upload/201506/15/1434358402_24697.jpg) ![screenshot](https://img-ask.csdn.net/upload/201506/15/1434358368_454177.jpg) ![screenshot](https://img-ask.csdn.net/upload/201506/15/1434358416_667645.jpg)
Has anyone used the ALS algorithm inside Spark Streaming and managed to update the model in real time? I keep getting ERROR [dag-scheduler-event-loop] scheduler.DAGSchedulerEventProcessLoop (Logging.scala:logError(96)) - DAGSchedulerEventProcessLoop failed; shutting down SparkContext. What does this exception mean? I have searched online for a long time without finding a fix and am going crazy. The situation is roughly what the screenshots above show; any guidance would be appreciated!
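Without the code the cause of the DAGScheduler shutdown is guesswork, but the usual pattern for refreshing an ALS model from a stream is to retrain on the driver inside `foreachRDD` and to skip empty micro-batches, because calling `ALS.train` on an empty RDD is a classic way to bring the whole StreamingContext down. A sketch of that pattern, with the input source, the "user,item,rating" line format and all parameters assumed:

```
# sketch: retrain ALS from each non-empty micro-batch; source, format and
# parameters are assumptions
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="als-stream")
ssc = StreamingContext(sc, 60)                      # retrain once per 60s batch
lines = ssc.socketTextStream("localhost", 9999)     # stand-in input source

latest_model = None                                 # driver-side handle to the newest model

def retrain(time, rdd):
    global latest_model
    if rdd.isEmpty():                               # an empty batch would make ALS.train throw
        return
    ratings = (rdd.map(lambda line: line.split(","))
                  .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2]))))
    latest_model = ALS.train(ratings, rank=10, iterations=5)

lines.foreachRDD(retrain)
ssc.start()
ssc.awaitTermination()
```

Whether this addresses the DAGSchedulerEventProcessLoop error depends on what the screenshots actually show; the full stack trace under that ERROR line would normally name the real cause.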
Hive insert statement fails when running on YARN, but works after enabling local mode; the error is below:
``` hive> insert into test values('B',2); Query ID = root_20191114105642_8cc05952-0497-4eff-893e-af6de8f05c6e Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator 19/11/14 10:56:43 INFO client.RMProxy: Connecting to ResourceManager at cloudera/37.64.0.71:8032 19/11/14 10:56:43 INFO client.RMProxy: Connecting to ResourceManager at cloudera/37.64.0.71:8032 java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[memory-mb], Requested resource=<memory:15360, vCores:8>, maximum allowed allocation=<memory:6557, vCores:8>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:6557, vCores:8> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:478) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:374) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:302) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:280) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:522) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:377) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:318) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:633) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:267) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:531) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:345) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:251) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1328) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:313) at org.apache.hadoop.util.RunJar.main(RunJar.java:227) Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. 
Requested resource type=[memory-mb], Requested resource=<memory:15360, vCores:8>, maximum allowed allocation=<memory:6557, vCores:8>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:6557, vCores:8> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:478) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:374) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:302) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:280) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:522) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:377) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:318) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:633) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:267) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:531) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:284) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy43.submitApplication(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:290) at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:297) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:330) ... 35 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[memory-mb], Requested resource=<memory:15360, vCores:8>, maximum allowed allocation=<memory:6557, vCores:8>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:6557, vCores:8> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:478) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:374) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:302) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:280) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:522) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:377) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:318) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:633) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:267) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:531) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) at org.apache.hadoop.ipc.Client.call(Client.java:1445) at org.apache.hadoop.ipc.Client.call(Client.java:1355) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy42.submitApplication(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:281) ... 
48 more Job Submission failed with exception 'java.io.IOException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[memory-mb], Requested resource=<memory:15360, vCores:8>, maximum allowed allocation=<memory:6557, vCores:8>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:6557, vCores:8> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:478) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:374) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:302) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:280) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:522) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:377) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:318) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:633) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:267) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:531) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) )' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. 
Requested resource type=[memory-mb], Requested resource=<memory:15360, vCores:8>, maximum allowed allocation=<memory:6557, vCores:8>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:6557, vCores:8> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:478) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:374) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:302) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:280) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:522) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:377) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:318) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:633) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:267) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:531) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675) ``` # 内存最大只有6G,他非要申请15G,这个问题该如何处理, # 求助各位大佬!!!
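The exception is self-describing: the Hive job asks YARN for a 15360 MB container while the scheduler's maximum allocation is only 6557 MB, so the request is rejected at submission time (local mode works because it bypasses YARN containers entirely). Either lower the container sizes the Hive MapReduce job requests, or raise `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb` on the cluster. A sketch of the first option, run in the same Hive session before the INSERT; the 15 GB figure most likely comes from the AM or map/reduce memory settings, and the values below are only examples:

```
set yarn.app.mapreduce.am.resource.mb=2048;
set yarn.app.mapreduce.am.command-opts=-Xmx1638m;
set mapreduce.map.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx1638m;
set mapreduce.reduce.memory.mb=2048;
set mapreduce.reduce.java.opts=-Xmx1638m;
```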
Running Spark code in client mode on a YARN cluster
spark的wordcount提交到yarn集群上运行时,出现以下报错:请问有大神知道如何解决吗? ``` [hadoop00@hadoop02 ~]$ ./spark-submit-wordcount-yarn-client.sh //下面是执行过程: 19/07/31 17:12:36 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.2.102:4040 19/07/31 17:12:36 INFO spark.SparkContext: Added JAR file:/home/hadoop00/spark-core-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://192.168.2.102:43723/jars/spark-core-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1564564356841 19/07/31 17:12:40 INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers 19/07/31 17:12:41 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 19/07/31 17:12:41 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 19/07/31 17:12:41 INFO yarn.Client: Setting up container launch context for our AM 19/07/31 17:12:41 INFO yarn.Client: Setting up the launch environment for our AM container 19/07/31 17:12:41 INFO yarn.Client: Preparing resources for our AM container 19/07/31 17:12:45 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 19/07/31 17:12:53 INFO yarn.Client: Uploading resource file:/tmp/spark-59635080-0711-4817-9e3b-b25f528cbbbe/__spark_libs__5797595590401639249.zip -> hdfs://myha01/user/hadoop00/.sparkStaging/application_1564523762236_0001/__spark_libs__5797595590401639249.zip 19/07/31 17:13:07 INFO yarn.Client: Uploading resource file:/tmp/spark-59635080-0711-4817-9e3b-b25f528cbbbe/__spark_conf__627970737981952935.zip -> hdfs://myha01/user/hadoop00/.sparkStaging/application_1564523762236_0001/__spark_conf__.zip 19/07/31 17:13:07 INFO spark.SecurityManager: Changing view acls to: hadoop00 19/07/31 17:13:07 INFO spark.SecurityManager: Changing modify acls to: hadoop00 19/07/31 17:13:07 INFO spark.SecurityManager: Changing view acls groups to: 19/07/31 17:13:07 INFO spark.SecurityManager: Changing modify acls groups to: 19/07/31 17:13:07 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop00); groups with view permissions: Set(); users with modify permissions: Set(hadoop00); groups with modify permissions: Set() 19/07/31 17:13:07 INFO yarn.Client: Submitting application application_1564523762236_0001 to ResourceManager 19/07/31 17:13:08 INFO impl.YarnClientImpl: Submitted application application_1564523762236_0001 19/07/31 17:13:08 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1564523762236_0001 and attemptId None 19/07/31 17:13:09 INFO yarn.Client: Application report for application_1564523762236_0001 (state: ACCEPTED) 19/07/31 17:13:09 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1564523805324 final status: UNDEFINED tracking URL: http://hadoop03:8088/proxy/application_1564523762236_0001/ user: hadoop00 19/07/31 17:13:10 INFO yarn.Client: Application report for application_1564523762236_0001 (state: FAILED) 19/07/31 17:13:10 INFO yarn.Client: client token: N/A diagnostics: Application application_1564523762236_0001 failed 2 times due to Error launching appattempt_1564523762236_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. 
current time is 1564564389887 found 1564524406596 Note: System times on machines may be out of sync. Check system time and time zones. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168) at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) . Failing the application. ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1564523805324 final status: FAILED tracking URL: http://hadoop03:8088/cluster/app/application_1564523762236_0001 user: hadoop00 19/07/31 17:13:10 INFO yarn.Client: Deleted staging directory hdfs://myha01/user/hadoop00/.sparkStaging/application_1564523762236_0001 19/07/31 17:13:10 ERROR spark.SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173) at org.apache.spark.SparkContext.<init>(SparkContext.scala:509) at p2._01ScalaWordCountRemoteOps$.main(_01ScalaWordCountRemoteOps.scala:21) at p2._01ScalaWordCountRemoteOps.main(_01ScalaWordCountRemoteOps.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 19/07/31 17:13:10 INFO server.AbstractConnector: Stopped Spark@6f2bafef{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 19/07/31 17:13:10 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.2.102:4040 19/07/31 17:13:10 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered! 
19/07/31 17:13:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors 19/07/31 17:13:10 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down 19/07/31 17:13:10 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 19/07/31 17:13:10 INFO cluster.YarnClientSchedulerBackend: Stopped 19/07/31 17:13:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/07/31 17:13:10 INFO memory.MemoryStore: MemoryStore cleared 19/07/31 17:13:10 INFO storage.BlockManager: BlockManager stopped 19/07/31 17:13:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 19/07/31 17:13:10 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running 19/07/31 17:13:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/07/31 17:13:10 INFO spark.SparkContext: Successfully stopped SparkContext Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173) at org.apache.spark.SparkContext.<init>(SparkContext.scala:509) at p2._01ScalaWordCountRemoteOps$.main(_01ScalaWordCountRemoteOps.scala:21) at p2._01ScalaWordCountRemoteOps.main(_01ScalaWordCountRemoteOps.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 19/07/31 17:13:10 INFO util.ShutdownHookManager: Shutdown hook called 19/07/31 17:13:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-59635080-0711-4817-9e3b-b25f528cbbbe ```
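The decisive line is buried in the diagnostics: `Unauthorized request to start container. This token is expired. current time is 1564564389887 found 1564524406596`. Those two timestamps are about 40,000 seconds (roughly 11 hours) apart, so the container token issued by the ResourceManager looks expired to the NodeManager purely because the machines' clocks disagree, exactly as the log's own note says. Syncing the system clocks on every node and on the submitting client fixes it; for example (the NTP server is only an example):

```
# run on every cluster node and on the client that submits the job
sudo ntpdate ntp.aliyun.com        # one-off sync against any reachable NTP server
sudo systemctl restart ntpd        # or chronyd, to keep the clocks in sync afterwards
```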
Spark write to elasticsearch fails with Could not write all entries
I get the following exception when using Spark to write an RDD into the elasticsearch cluster:
```
Could not write all entries [199/161664] (maybe ES was overloaded?). Bailing out... at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:250) at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:201) at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:163) at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:49) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
```
The RDD is roughly 50 million rows and the ES cluster has two nodes.
```
EsSpark.saveToEs(result, "userindex/users", Map("es.mapping.id" -> "uid"))
```
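`Could not write all entries ... (maybe ES was overloaded?)` means the connector's bulk requests kept getting rejected (typically a full bulk queue on the two data nodes) until its retries ran out. Smaller bulk batches, more patient retries and fewer concurrently writing tasks usually get a 50-million-row load through. A sketch of the conf-level settings (the values are starting points to tune, not recommendations; the same keys can also go into the options Map passed to saveToEs):

```
# sketch: gentler bulk writes for es-hadoop; all values are examples
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("es-bulk-write")
        .set("es.batch.size.entries", "500")       # docs per bulk request (default 1000)
        .set("es.batch.size.bytes", "1mb")         # bytes per bulk request
        .set("es.batch.write.retry.count", "10")   # retry rejected bulks more times (default 3)
        .set("es.batch.write.retry.wait", "60s"))  # wait longer between retries (default 10s)

sc = SparkContext(conf=conf)
```

Repartitioning the RDD to fewer partitions before saveToEs also lowers the number of tasks hammering the two nodes at once.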
Why can't Spark read a local file?
``` textFile=sc.textFile("file:///home/hduser/pythonwork/ipynotebook/data/test.txt") stringRDD=textFile.flatMap(lambda line:line.split(' ')) stringRDD.collect() ``` 我此路径下是有test文件的: ![图片说明](https://img-ask.csdn.net/upload/201805/18/1526634813_44673.png) 错误是: ``` Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 8.0 failed 4 times, most recent failure: Lost task 1.3 in stage 8.0 (TID 58, 192.168.56.103, executor 1): java.io.FileNotFoundException: File file:/home/hduser/pythonwork/ipynotebook/data/test.txt does not exist 。 。 。 Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599) 。 。 。 Caused by: java.io.FileNotFoundException: File file:/home/hduser/pythonwork/ipynotebook/data/test.txt does not exist ``` 而且发现若我把代码中test.txt随便改一个名字,比如ttest.txt(肯定是没有的文件) 错误竟然发生了变化: ``` Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/hduser/pythonwork/ipynotebook/data/tesst.txt at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.collect(RDD.scala:938) at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:153) at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745)
```
Note: at this point I am running on the cluster: 'spark://emaster:7077'. When running locally the local test.txt is found without problems, and files on HDFS can also be read (while running on the cluster). The 。。。 marks are where I left out some unimportant error output because of the length limit; reply if you want to see it. Any help would be hugely appreciated!!!
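Both errors are consistent with one cause: with the master set to `spark://emaster:7077`, the `file:///...` path is opened on the executor machines, not on the driver, and the file only exists on the driver host. That is exactly why local mode works and why HDFS paths work. Either copy test.txt to the same path on every worker node, or put it on HDFS; a sketch of the HDFS route (the namenode URL is an assumption):

```
# sketch: put the file where every executor can see it, then read it from HDFS
# upload first, e.g.:  hdfs dfs -put /home/hduser/pythonwork/ipynotebook/data/test.txt /data/
# `sc` is the existing SparkContext from the original snippet
textFile = sc.textFile("hdfs://emaster:9000/data/test.txt")   # namenode URL is an assumption
stringRDD = textFile.flatMap(lambda line: line.split(' '))
print(stringRDD.collect())
```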
Flink in local mode keeps failing when running the example jar that ships with Flink
Flink local模式下,运行Flink自带的jar包一直报错 启动Flink: ![图片说明](https://img-ask.csdn.net/upload/201911/22/1574391817_772547.png) 执行: bin/flink run examples/streaming/SocketWindowWordCount.jar --port 8888 利用nc -lk 8888模拟socket 输入,然后会报错,并且页面也进不去了 前台页面显示: ![图片说明](https://img-ask.csdn.net/upload/201911/22/1574392093_133203.png) 后台报错内容: ``` org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result. (JobID: f2ee89e0ed991f22ed9eaec00edfd789) at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:261) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:426) at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:816) at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:290) at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:216) at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1053) at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1129) at org.apache.flink.client.cli.CliFrontend$$Lambda$1/1977310713.call(Unknown Source) at org.apache.flink.runtime.security.HadoopSecurityContext$$Lambda$2/1169474473.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1129) Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph. 
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:380) at org.apache.flink.client.program.rest.RestClusterClient$$Lambda$11/256346753.apply(Unknown Source) at java.util.concurrent.CompletableFuture$ExceptionCompletion.run(CompletableFuture.java:1246) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210) at java.util.concurrent.CompletableFuture$ThenApply.run(CompletableFuture.java:723) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210) at java.util.concurrent.CompletableFuture$ThenCopy.run(CompletableFuture.java:1333) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2361) at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:203) at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$21/100819684.accept(Unknown Source) at java.util.concurrent.CompletableFuture$WhenCompleteCompletion.run(CompletableFuture.java:1298) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210) at java.util.concurrent.CompletableFuture$ThenCopy.run(CompletableFuture.java:1333) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:193) at java.util.concurrent.CompletableFuture.internalComplete(CompletableFuture.java:210) at java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:626) at java.util.concurrent.CompletableFuture$Async.run(CompletableFuture.java:428) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1680779493]] after [12000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage". 
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:745)

End of exception on server side>]
    at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:350)
    at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:334)
    at org.apache.flink.runtime.rest.RestClient$$Lambda$31/1522310222.apply(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncCompose.exec(CompletableFuture.java:604)
    ... 4 more
```
I searched online and added the following parameters, but the problem persists:
```
akka.ask.timeout: 60 s
web.timeout: 12000
taskmanager.host: localhost
```
Has anyone run into this?
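For readers who have not looked inside the example jar, the sketch below shows roughly what the bundled SocketWindowWordCount job does: read lines from the socket opened by `nc -lk 8888`, split them into words, and count them in sliding 5-second windows. This is a minimal reconstruction against the Flink 1.x DataStream API, not the exact bundled source; the class name and the `WordWithCount` POJO are illustrative.
```
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class SocketWindowWordCountSketch {

    public static void main(String[] args) throws Exception {
        // read the --port argument (the bundled examples typically parse args with ParameterTool)
        final int port = ParameterTool.fromArgs(args).getInt("port");

        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // connect to the socket opened by `nc -lk <port>` on the same machine
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        DataStream<WordWithCount> windowCounts = text
                // split each incoming line into (word, 1) records
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                })
                .keyBy("word")                                 // key by the POJO field
                .timeWindow(Time.seconds(5), Time.seconds(1))  // 5 s sliding window, 1 s slide
                // sum the counts per word inside each window
                .reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count);
                    }
                });

        // print results with parallelism 1 so the output is not interleaved
        windowCounts.print().setParallelism(1);

        // submits the job graph; this is the submission step that times out in the trace above
        env.execute("Socket Window WordCount");
    }

    // simple POJO holding one word and its running count
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}
```
Once the submission goes through, anything typed into the `nc -lk 8888` session should show up as windowed counts on the TaskManager's stdout; in the trace above, the failure happens at the submission step itself, before the job ever reaches that point.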