Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
Following the wiki, I ran the same queries with Hive on Spark 1.6 (HOS16) and Hive on Spark 2.0 (HOS20) in YARN mode.
The table below shows the difference in query time between HOS16 and HOS20 (all times in seconds).
Version | Total time (s) | Time for jobs (s) | Time for preparing jobs (s)
---|---|---|---
Spark 1.6 | 51 | 39 | 12
Spark 2.0 | 54 | 40 | 14
HOS20 spends 2 seconds more on preparing jobs than HOS16. After reviewing the Spark source code, I found the cause in Client#distribute:

In Spark 1.6, the client searches for spark-assembly*.jar and uploads that single jar to the distributed cache.

In Spark 2.0, if neither "spark.yarn.archive" nor "spark.yarn.jars" is set in the Spark configuration, the client first copies every jar in $SPARK_HOME/jars to a temporary directory and then uploads that directory to the distributed cache. This extra copy accounts for the 2 seconds.
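The fallback order described above can be sketched as follows. This is a minimal Python illustration of the logic, not Spark's actual Scala API; the function and parameter names are invented for clarity.

```python
import os
import shutil
import tempfile

def resolve_spark_jars(conf, spark_home):
    """Illustrative sketch of the upload decision in Spark 2.0's
    Client#distribute (simplified; not Spark's real code)."""
    if "spark.yarn.archive" in conf:
        # One pre-built archive: a single file is uploaded, no copying.
        return [conf["spark.yarn.archive"]]
    if "spark.yarn.jars" in conf:
        # Explicit jar list: exactly what is configured is uploaded.
        return conf["spark.yarn.jars"].split(",")
    # Fallback: copy every jar under $SPARK_HOME/jars to a temp
    # directory before uploading -- this copy is the measured overhead.
    tmp = tempfile.mkdtemp(prefix="spark-jars-")
    jars_dir = os.path.join(spark_home, "jars")
    for name in os.listdir(jars_dir):
        if name.endswith(".jar"):
            shutil.copy(os.path.join(jars_dir, name), tmp)
    return [os.path.join(tmp, n) for n in sorted(os.listdir(tmp))]
```

Setting either property short-circuits the function before the copy loop, which is exactly why the workaround below helps.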
We can speed up the startup of Hive on Spark 2.0 by setting "spark.yarn.archive" or "spark.yarn.jars":
Set "spark.yarn.archive":

```
$ cd $SPARK_HOME/jars
$ zip spark-archive.zip ./*.jar   # important: enter the jars folder, then zip
$ hadoop fs -copyFromLocal spark-archive.zip
$ echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf
```
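The "enter the jars folder then zip" note matters because the jars must sit at the root of the archive; zipping from the parent directory gives every entry a "jars/" prefix. A quick way to sanity-check an archive before uploading it (the helper name is my own, for illustration):

```python
import zipfile

def jars_at_top_level(archive_path):
    """Return True if every .jar entry sits at the archive root.
    Entries like 'jars/a.jar' mean the zip was created from the
    parent directory, which defeats the workaround."""
    with zipfile.ZipFile(archive_path) as zf:
        return all("/" not in name
                   for name in zf.namelist()
                   if name.endswith(".jar"))
```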
Set "spark.yarn.jars":

```
$ hadoop fs -mkdir spark-2.0.0-bin-hadoop
$ hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop
$ echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
```
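Either property is sufficient; Spark only falls back to the slow copy-and-upload path when both are absent. A small sketch that checks a spark-defaults.conf body (it parses the key=value form used in the commands above; the function name is illustrative):

```python
def yarn_jars_configured(conf_text):
    """Return True if spark-defaults.conf sets either property,
    i.e. the Spark 2.0 client will skip the slow fallback."""
    props = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return "spark.yarn.archive" in props or "spark.yarn.jars" in props
```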
I suggest adding this part to the wiki.
The attachment performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed performance improvement for small queries after setting spark.yarn.archive.