Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
When loading a large number of partitions in cloud storage, the notification log takes much longer to list the newly added files.
It would be good to explore whether the FileStatus objects can be reused from Hive::listFilesCreatedByQuery or from copyFiles, instead of being fetched again for each file as in the stack trace below:
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031)
at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4171)
at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3566)
at org.apache.hadoop.hive.ql.metadata.Hive.addWriteNotificationLog(Hive.java:3519)
at org.apache.hadoop.hive.ql.metadata.Hive.addWriteNotificationLog(Hive.java:3504)
at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2984)
at org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:730)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:490)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:484)
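To illustrate the idea, here is a minimal, self-contained sketch of why reusing already-fetched status objects could help. It is not Hive or Hadoop code: Status, CountingStore, filesByLookup and filesByReuse are hypothetical stand-ins, and the store only counts metadata calls in memory. The assumption is that a listing has already returned status objects (as Hive::listFilesCreatedByQuery or copyFiles might), so asking the file system again for each path, as the isDirectory frame in the trace does, is redundant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FileStatusReuseSketch {

    /** Hypothetical stand-in for org.apache.hadoop.fs.FileStatus. */
    static final class Status {
        final String path;
        final boolean directory;

        Status(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }
    }

    /** Hypothetical in-memory store that counts metadata calls. */
    static final class CountingStore {
        private final Map<String, Status> entries = new TreeMap<>();
        int statusCalls = 0; // one per getFileStatus (a remote round trip on S3)
        int listCalls = 0;   // one per listStatus

        void add(String path, boolean directory) {
            entries.put(path, new Status(path, directory));
        }

        Status getFileStatus(String path) {
            statusCalls++;
            return entries.get(path);
        }

        List<Status> listStatus(String prefix) {
            listCalls++;
            List<Status> result = new ArrayList<>();
            for (Status s : entries.values()) {
                if (s.path.startsWith(prefix)) {
                    result.add(s);
                }
            }
            return result;
        }
    }

    /** Pattern seen in the trace: one status lookup per path. */
    static List<String> filesByLookup(CountingStore store, List<Status> listed) {
        List<String> files = new ArrayList<>();
        for (Status s : listed) {
            if (!store.getFileStatus(s.path).directory) {
                files.add(s.path);
            }
        }
        return files;
    }

    /** Proposed pattern: reuse the status objects the listing already returned. */
    static List<String> filesByReuse(List<Status> listed) {
        List<String> files = new ArrayList<>();
        for (Status s : listed) {
            if (!s.directory) {
                files.add(s.path);
            }
        }
        return files;
    }

    /** Three partition directories with two files each (made-up paths). */
    static CountingStore sampleStore() {
        CountingStore store = new CountingStore();
        for (int p = 0; p < 3; p++) {
            String dir = "/t/p=" + p;
            store.add(dir, true);
            store.add(dir + "/f0", false);
            store.add(dir + "/f1", false);
        }
        return store;
    }

    public static void main(String[] args) {
        CountingStore store = sampleStore();
        List<Status> listed = store.listStatus("/t/");

        List<String> reused = filesByReuse(listed);
        System.out.println("reuse:  files=" + reused.size() + " statusCalls=" + store.statusCalls);

        List<String> looked = filesByLookup(store, listed);
        System.out.println("lookup: files=" + looked.size() + " statusCalls=" + store.statusCalls);
    }
}
```

In this toy setup both approaches find the same six files, but the per-path variant issues nine extra status calls while the reuse variant issues none. Whether the real code paths can hand their FileStatus objects through to addInsertFileInformation is exactly what this issue proposes to explore.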
Issue Links
- is related to: HIVE-24669 Improve Filesystem usage in Hive::loadPartitionInternal (Closed)