Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Component/s: None
Description
HiveClientImpl.loadTable loads files one by one, so this step can take a long time when a job generates many files. The Hive.moveFile API can move the whole directory at once and so speed up this step for create table tableName as select ... and insert overwrite table tableName select ...
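The difference can be illustrated with a small filesystem sketch (Python, with made-up local paths standing in for the staging and warehouse directories; this is not the actual Hive or Spark code): N rename calls for N output files versus a single rename of the staging directory.

```python
import os
import tempfile

def move_files_one_by_one(src_dir, dest_dir):
    # Per-file move, as HiveClientImpl.loadTable effectively does:
    # one filesystem rename per output file, so cost grows with file count.
    for name in os.listdir(src_dir):
        os.rename(os.path.join(src_dir, name), os.path.join(dest_dir, name))

def move_whole_directory(src_dir, dest_dir):
    # Directory-level move, the effect Hive.moveFile can achieve for a
    # fresh target: one rename, independent of the number of files inside.
    os.rename(src_dir, dest_dir)

# Demo with temporary directories (hypothetical stand-ins for the
# .hive-staging directory and the warehouse table directory).
base = tempfile.mkdtemp()
staging = os.path.join(base, "staging")
target = os.path.join(base, "warehouse_table")
os.makedirs(staging)
for i in range(5):
    open(os.path.join(staging, f"part-{i:05d}"), "w").close()

move_whole_directory(staging, target)
print(sorted(os.listdir(target)))  # the five part files, moved in one rename
```

On HDFS-like filesystems a directory rename is a single metadata operation, which is why the directory-level move scales far better than a per-file loop.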
Here is a comparison of the two APIs:
loadTable API: it took about 26 minutes (10:50:14 - 11:16:18) to load the table
17/04/01 10:50:04 INFO TaskSetManager: Finished task 207165.0 in stage 0.0 (TID 216796) in 5952 ms on jqhadoop-test28-8.int.yihaodian.com (executor 54) (216869/216869)
17/04/01 10:50:04 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/04/01 10:50:04 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 541.797 s
17/04/01 10:50:04 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 551.208919 s
17/04/01 10:50:04 INFO FileFormatWriter: Job null committed.
17/04/01 10:50:14 INFO Hive: Replacing src:viewfs://cluster4/user/hive/warehouse/staging/.hive-staging_hive_2017-04-01_10-40-02_349_8047899863313770218-1/-ext-10000/part-00000-9335c5f3-60fa-418b-a466-2d76a5e84537-c000, dest: viewfs://cluster4/user/hive/warehouse/tmp.db/spark_load_slow/part-00000-9335c5f3-60fa-418b-a466-2d76a5e84537-c000, Status:true
17/04/01 10:50:14 INFO Hive: Replacing src:viewfs://cluster4/user/hive/warehouse/staging/.hive-staging_hive_2017-04-01_10-40-02_349_8047899863313770218-1/-ext-10000/part-00001-9335c5f3-60fa-418b-a466-2d76a5e84537-c000, dest: viewfs://cluster4/user/hive/warehouse/tmp.db/spark_load_slow/part-00001-9335c5f3-60fa-418b-a466-2d76a5e84537-c000, Status:true
...
17/04/01 11:16:11 INFO Hive: Replacing src:viewfs://cluster4/user/hive/warehouse/staging/.hive-staging_hive_2017-04-01_10-40-02_349_8047899863313770218-1/-ext-10000/part-99999-9335c5f3-60fa-418b-a466-2d76a5e84537-c000, dest: viewfs://cluster4/user/hive/warehouse/tmp.db/spark_load_slow/part-99999-9335c5f3-60fa-418b-a466-2d76a5e84537-c000, Status:true
17/04/01 11:16:18 INFO SparkSqlParser: Parsing command: `tmp`.`spark_load_slow`
17/04/01 11:16:18 INFO CatalystSqlParser: Parsing command: string
17/04/01 11:16:18 INFO CatalystSqlParser: Parsing command: string
17/04/01 11:16:18 INFO CatalystSqlParser: Parsing command: string
17/04/01 11:16:18 INFO CatalystSqlParser: Parsing command: string
17/04/01 11:16:18 INFO CatalystSqlParser: Parsing command: string
Time taken: 2178.736 seconds
17/04/01 11:16:18 INFO CliDriver: Time taken: 2178.736 seconds
moveFile API: it took about 9 minutes (13:24:39 - 13:33:46) to load the table
17/04/01 13:24:38 INFO TaskSetManager: Finished task 210610.0 in stage 0.0 (TID 216829) in 5888 ms on jqhadoop-test28-28.int.yihaodian.com (executor 59) (216869/216869)
17/04/01 13:24:38 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/04/01 13:24:38 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 532.409 s
17/04/01 13:24:38 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 539.337610 s
17/04/01 13:24:39 INFO FileFormatWriter: Job null committed.
17/04/01 13:24:39 INFO Hive: Replacing src:viewfs://cluster4/user/hive/warehouse/staging/.hive-staging_hive_2017-04-01_13-14-46_099_8962745596360417817-1/-ext-10000, dest: viewfs://cluster4/user/hive/warehouse/tmp.db/spark_load_slow_movefile, Status:true
17/04/01 13:33:46 INFO SparkSqlParser: Parsing command: `tmp`.`spark_load_slow_movefile`
17/04/01 13:33:46 INFO CatalystSqlParser: Parsing command: string
17/04/01 13:33:46 INFO CatalystSqlParser: Parsing command: string
17/04/01 13:33:46 INFO CatalystSqlParser: Parsing command: string
17/04/01 13:33:46 INFO CatalystSqlParser: Parsing command: string
17/04/01 13:33:46 INFO CatalystSqlParser: Parsing command: string
Time taken: 1142.671 seconds
17/04/01 13:33:46 INFO CliDriver: Time taken: 1142.671 seconds
More logs can be found in the attachments.
Attachments
Issue Links
- is duplicated by: HIVE-12908 Improve dynamic partition loading III (Closed)
- links to