Details
Bug
Status: Resolved
Major
Resolution: Fixed
2.7.5
None
Description
By default, the "mapreduce.admin.user.env" property for MapReduce2 and the "tez.am.launch.env"/"tez.task.launch.env" properties for Tez contain a path with a specific HDP version, for example:
LD_LIBRARY_PATH=/usr/hdp/3.1.5.xx-x/hadoop/lib/native:/usr/hdp/3.1.5.xx-x/hadoop/lib/native/Linux-{{architecture}}-64
As a result, after a patch upgrade (for Tez, MR2, YARN), Tez/MapReduce jobs still point to paths with the old version. And if we add new hosts with NodeManager only (without any client), these libraries do not exist at the old paths on those hosts. So I am going to change the default value to the following:
/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-{{architecture}}-64
This path always exists (because NodeManager installs the hadoop-client package as a dependency).
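To illustrate the intended change, here is a minimal sketch that rewrites a version-pinned value into the version-independent one. The helper name `unpin_native_path` and the regular expression are hypothetical and exist only for this illustration; they are not taken from the actual patch.

```python
import re

# Hypothetical pattern: matches a version-pinned prefix such as
# /usr/hdp/3.1.5.xx-x/hadoop/lib/native, but skips /usr/hdp/current/...
_PINNED = re.compile(r"/usr/hdp/(?!current/)[^/:]+/hadoop/lib/native")
_STABLE = "/usr/hdp/current/hadoop-client/lib/native"


def unpin_native_path(env_value):
    """Replace every version-pinned native-library path with the stable one."""
    return _PINNED.sub(_STABLE, env_value)


old = ("LD_LIBRARY_PATH=/usr/hdp/3.1.5.xx-x/hadoop/lib/native:"
       "/usr/hdp/3.1.5.xx-x/hadoop/lib/native/Linux-{{architecture}}-64")
new = unpin_native_path(old)
print(new)
```

Applying the helper a second time leaves the value unchanged, because the pattern does not match paths under /usr/hdp/current.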
There is also an issue with yarn-shuffle.jar path selection. By default, we choose the Spark client's version if the cluster has at least one Spark client on any host:
/usr/hdp/<sparkClientHDPVersion>/spark2/aux/spark-*-yarn-shuffle.jar
But this approach stops working when there are hosts without a Spark client. I am going to use YARN's yarn-shuffle.jar path for those hosts:
/usr/hdp/<yarnClientHDPVersion>/spark2/aux/spark-*-yarn-shuffle.jar
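The selection described above might look roughly like the following sketch. It assumes that hosts with a Spark client keep the Spark client's version and only hosts without one fall back to YARN's version, which is one reading of the description. The function name, its parameters, and the second version string are hypothetical.

```python
SHUFFLE_JAR_TEMPLATE = "/usr/hdp/%s/spark2/aux/spark-*-yarn-shuffle.jar"


def shuffle_jar_pattern(spark_client_version, yarn_client_version,
                        host_has_spark_client):
    """Return the yarn-shuffle.jar glob pattern for one host.

    Assumption: fall back to the YARN client's HDP version only when
    the host has no Spark client installed.
    """
    if host_has_spark_client:
        version = spark_client_version
    else:
        version = yarn_client_version
    return SHUFFLE_JAR_TEMPLATE % version


# Hypothetical version strings, for illustration only.
with_spark = shuffle_jar_pattern("3.1.5.xx-x", "3.1.4.yy-y", True)
without_spark = shuffle_jar_pattern("3.1.5.xx-x", "3.1.4.yy-y", False)
print(with_spark)
print(without_spark)
```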