Description
We have a fairly basic installation of Hadoop, Hive and ZooKeeper and need to run LLAP on YARN services, because as far as I can tell Slider is dead and YARN services is the generic mechanism for jobs such as LLAP.
In accordance with https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html I added
hadoop.registry.zk.quorum: <ACTUAL QUORUM>
to core-site.xml and
yarn.webapp.api-service.enable: True
to yarn-site.xml.
This enabled me to run the simple example from that page.
Next, as the hive user, I ran the following (it fails without --auxhbase=false):
/usr/lib/hive/bin/hive --service llap --name llaptest --instances 2 --size 2g --auxhbase=false
which gave me:
-bash-4.2$ ls -l llap-yarn-18Nov2019/
total 116932
-rw-rw-r--. 1 hive hive 119725946 Nov 18 10:00 llap-18Nov2019.tar.gz
-rwx------. 1 hive hive 249 Nov 18 10:00 run.sh
drwx------. 5 hive hive 88 Nov 18 10:00 test
-rw-rw-r--. 1 hive hive 1777 Nov 18 11:51 Yarnfile
and run.sh started the YARN service.
The problem is that the AM for LLAP starts, but the application's containers fail perpetually. I can see this in the logs and in the RM UI: 1-2 containers are spawned every second.
Logs showed this (hostname is replaced):
cat /var/log/hadoop-yarn/userlogs/application_1574064939102_0006/container_1574064939102_0006_01_000002/llap-daemon-hive-hostname.out
...
+ exec /usr/lib/jvm/jre-openjdk/bin/java -Dproc_llapdaemon -Xms4096m -Xmx4096m -Dhttp.maxConnections=5 -server -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+PrintGCDetails -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=100M -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-yarn/userlogs/application_1574064939102_0006/container_1574064939102_0006_01_000002/gc_2019-11-18-15.log -XX:+UseParallelGC -Djava.io.tmpdir=/srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/tmp/ -Dlog4j.configurationFile=llap-daemon-log4j2.properties -Dllap.daemon.log.dir=/var/log/hadoop-yarn/userlogs/application_1574064939102_0006/container_1574064939102_0006_01_000002 -Dllap.daemon.log.file=llap-daemon-hive-hostname.log -Dllap.daemon.root.logger=query-routing -Dllap.daemon.log.level=INFO -classpath '/srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/lib/conf/:/srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/lib//lib/*:/srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/lib//lib/tez/*:/srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/lib//lib/udfs/*:.:/srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/lib/lib/*.jar' org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
Error: Could not find or load main class org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
...
Analyzing this, I concluded that /srv/hadoop-yarn/nm-local/usercache/hive/appcache/application_1574064939102_0006/container_1574064939102_0006_01_000002/lib//lib/* actually contains /usr/lib/hive/lib/hive-llap-server-3.1.1.jar, which holds the org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon class.
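A check like the one behind that conclusion can be sketched in shell as follows. This is only a minimal sketch: the helper names class_to_entry and find_class_in_jars are hypothetical (they are not part of Hive or Hadoop), and the second one assumes the unzip tool is installed on the node.

```shell
# Hypothetical helper: turn a fully qualified class name into the
# entry path it would have inside a jar (dots become slashes).
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: print every *.jar directly inside directory $1
# whose listing contains the entry for class $2. Assumes `unzip` exists.
find_class_in_jars() {
  dir=$1
  entry=$(class_to_entry "$2")
  for jar in "$dir"/*.jar; do
    # Skip the unexpanded glob when the directory holds no jars.
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, find_class_in_jars /usr/lib/hive/lib org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon would print the path of each jar that carries the class. As far as I know, on Java 8 the launcher prints the same "Could not find or load main class" message when the class is present but a class it depends on cannot be loaded, so a hit here does not by itself rule out a classpath problem.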
Then I tried to do this:
# java -Dproc_llapdaemon -classpath '/usr/lib/hive/lib/*:/etc/hive/conf/*' 'org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon'
Error: Could not find or load main class org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
# java -Dproc_llapdaemon -classpath /usr/lib/hive/lib/hive-llap-server-3.1.1.jar 'org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon'
Error: Could not find or load main class org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon
Which puzzled me even more.
Please help me to start LLAP with YARN services correctly.