海边微风落
2021-08-17 11:37
采纳率: 100%
浏览 109
已结题

hive使用Tez引擎失败

相关环境描述:
1.java版本:jdk1.8_212
2.hadoop版本:3.1.3
3.hive版本:3.1.2
4.tez版本:0.9.2

问题描述:
在hive中使用tez引擎报错,就连更换回mr也使用不了了,具体原因如下:

0: jdbc:hive2://node01:10000> select count(1) from emp;
INFO  : Compiling command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0): select count(1) from emp
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0); Time taken: 0.146 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0): select count(1) from emp
INFO  : Query ID = root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
WARN  : The session: sessionId=c7f2b41a-0426-40ae-993b-b3f17640520b, queueName=null, user=root, doAs=true, isOpen=false, isDefault=false has not been opened
INFO  : Subscribed to counters: [] for queryId: root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0
INFO  : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1629168275110_0004 failed 2 times due to AM Container for appattempt_1629168275110_0004_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-08-17 11:34:23.036]Exception from container-launch.
Container id: container_1629168275110_0004_02_000001
Exit code: 1

[2021-08-17 11:34:23.038]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :


[2021-08-17 11:34:23.039]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :


For more detailed output, check the application tracking page: http://node02:8088/cluster/app/application_1629168275110_0004 Then click on links to logs of each attempt.
. Failing the application.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:1013) ~[tez-api-0.9.2.jar:0.9.2]
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:982) ~[tez-api-0.9.2.jar:0.9.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:453) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:368) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(TezSessionPoolSession.java:124) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:245) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:368) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:195) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) ~[hive-service-3.1.2.jar:3.1.2]
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.2.jar:3.1.2]
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) ~[hive-service-3.1.2.jar:3.1.2]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_212]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_212]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.3.jar:?]
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) ~[hive-service-3.1.2.jar:3.1.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_212]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
INFO  : Completed executing command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0); Time taken: 4.709 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)


相关配置信息:
1.hadoop core-site

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
    </property>
    <!-- 文件存储目录 -->
    <property>
        <name>hadoop.data.dir</name>
        <value>/opt/servers/hadoop-3.1.3/datas</value>
    </property>
    <!-- 临时文件存储目录 -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/servers/hadoop-3.1.3/datas/tmp</value>
    </property>
    <property>
            <name>hadoop.proxyuser.root.hosts</name>
            <value>*</value>
     </property>
    <property>
            <name>hadoop.proxyuser.root.groups</name>
            <value>*</value>
    </property>
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>root</value>
        </property>

</configuration>

2.hadoop capacity-schedular

<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,MyLine</value>
    <description>
      The queues at the this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>
      The ACL of who can submit applications with configured priority.
      For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
  </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Maximum lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        This will be a hard time limit for all applications in this
        queue. If positive value is configured then any application submitted
        to this queue will be killed after exceeds the configured lifetime.
        User can also specify lifetime per application basis in
        application submission context. But user lifetime will be
        overridden if it exceeds queue maximum lifetime. It is point-in-time
        configuration.
        Note : Configuring too low value will result in killing application
        sooner. This feature is applicable only for leaf queue.
     </description>
   </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.default-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Default lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        If the user has not submitted application with lifetime value then this
        value will be taken. It is point-in-time configuration.
        Note : Default lifetime can't exceed maximum lifetime. This feature is
        applicable only for leaf queue.
     </description>
   </property>

    <property>
    <name>yarn.scheduler.capacity.root.MyLine.capacity</name>
    <value>60</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.user-limit-factor</name>
    <value>1</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.maximum-capacity</name>
    <value>80</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.state</name>
    <value>RUNNING</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.acl_submit_applications</name>
    <value>*</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.acl_administer_queue</name>
    <value>*</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.acl_application_max_priority</name>
    <value>*</value>
  </property>

   <property>
     <name>yarn.scheduler.capacity.root.MyLine.maximum-application-lifetime
     </name>
     <value>-1</value>
   </property>

   <property>
     <name>yarn.scheduler.capacity.root.MyLine.default-application-lifetime
     </name>
     <value>-1</value>
   </property>
    
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers.
      When setting this parameter, the size of the cluster should be taken into account.
      We use 40 as the default value, which is approximately the number of nodes in one rack.
      Note, if this value is -1, the locality constraint in the container request
      will be ignored, which disables the delay scheduling.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>
      Number of additional missed scheduling opportunities over the node-locality-delay
      ones, after which the CapacityScheduler attempts to schedule off-switch containers,
      instead of rack-local ones.
      Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
      attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
      after 40+20=60 missed opportunities.
      When setting this parameter, the size of the cluster should be taken into account.
      We use -1 as the default value, which disables this feature. In this case, the number
      of missed opportunities for assigning off-switch containers is calculated based on
      the number of containers and unique locations specified in the resource request,
      as well as the size of the cluster.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>
      Controls the number of OFF_SWITCH assignments allowed
      during a node's heartbeat. Increasing this value can improve
      scheduling rate for OFF_SWITCH containers. Lower values reduce
      "clumping" of applications on particular nodes. The default is 1.
      Legal values are 1-MAX_INT. This config is refreshable.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
    <description>
      Whether RM should fail during recovery if previous applications'
      queue is no longer valid.
    </description>
  </property>

</configuration>


  1. hadoop hdfs-site
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.data.dir}/namenode</value>
    </property>
    <!-- datanode数据的存放路径 -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.data.dir}/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file://${hadoop.data.dir}/namesecondry</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node02:9868</value>
    </property>
    <property>
        <name>dfs.client.datanode-restart.timeout</name>
        <value>30</value>
    </property>

</configuration>


4.hadoop mapred-site


```html
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
    </property>

    <!-- 历史服务器端地址 -->
    <property>
            <name>mapreduce.jobhistory.address</name>
            <value>node01:10020</value>
    </property>
    <!-- 历史服务器web端地址 -->
    <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>node01:19888</value>
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx2048M</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx4096M</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*, 
        $HADOOP_COMMON_HOME/lib/*, 
        $HADOOP_HDFS_HOME/*, 
        $HADOOP_HDFS_HOME/lib/*, 
        $HADOOP_MAPRED_HOME/*, 
        $HADOOP_MAPRED_HOME/lib/*, 
        $HADOOP_YARN_HOME/*, 
        $HADOOP_YARN_HOME/lib/*</value>
    </property>
</configuration>

  1. hadoop yarn -site
<configuration>
    <!-- 设置不检查虚拟内存的值,不然内存不够会报错 -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property> 
    <!--检查每个任务正使用的物理内存量,如果任务超出分配值,则直接将其杀掉 -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property> 
    
    <!-- yarn上面运行一个任务,最少需要1.5G内存,虚拟机没有这么大的内存就调小这个值,不然会报错 -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node02</value>
    </property>
    <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
    <!-- 开启日志聚集 -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>  
        <name>yarn.log.server.url</name>  
        <value>http:/node01:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.user-as-default-queue</name>
        <value>MyLine</value>
    </property>


</configuration>

  1. hive-site
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node01:3306/metastore?useSSL=false</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value> 
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node01:9083</value>
    </property>

    <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node01</value>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>

</configuration>

  1. tez-site
<configuration>
<property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez.tar.gz</value>
</property>
<property>
     <name>tez.use.cluster.hadoop-libs</name>
     <value>false</value>
</property>
<property>
     <name>tez.history.logging.service.class</name>
     <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>

<property>
    <name>tez.am.resource.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.4</value>
</property>
<property>
    <name>tez.task.resource.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
</property>
</configuration>


  1. hadoop hadoop-env
export TEZ_CONF_DIR=$HADOOP_HOME/etc/hadoop
export TEZ_JARS=/opt/servers/tez
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

9.环境变量相关设置

#JAVA_HOME
export JAVA_HOME=/opt/servers/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

##HADOOP_HOME
export HADOOP_HOME=/opt/servers/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

#ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/servers/zookeeper-3.5.7
export PATH=$PATH:$ZOOKEEPER_HOME/bin

#HIVE_HOME
export HIVE_HOME=/opt/servers/hive
export PATH=$PATH:$HIVE_HOME/bin
救救孩子吧,要被折磨疯了
  • 写回答
  • 好问题 提建议
  • 追加酬金
  • 关注问题
  • 邀请回答

2条回答 默认 最新

  • CSDN专家-微编程 2021-08-17 23:11
    最佳回答

    关闭所有进程,将配置文件里的所有的中文注释全部删除,记得全部删除,然后保存,完成后从新启动,再次尝试你的操作

    评论
    解决 1 无用
    打赏 举报
查看更多回答(1条)

相关推荐 更多相似问题