Hive
  1. Hive
  2. HIVE-1852

Reduce unnecessary DFSClient.rename() calls

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      In Hive client side (MoveTask etc), DFSCleint.rename() is called for every file inside a directory. It is very expensive for a large directory in a busy DFS namenode. We should replace it with a single rename() call on the whole directory.

      1. HIVE-1852.patch
        0.9 kB
        Ning Zhang
      2. HIVE-1852.6.patch
        29 kB
        Ning Zhang
      3. HIVE-1852.5.patch
        28 kB
        Ning Zhang
      4. HIVE-1852.4.patch
        31 kB
        Ning Zhang
      5. HIVE-1852.3.patch
        20 kB
        Ning Zhang
      6. HIVE-1852.2.patch
        8 kB
        Ning Zhang

        Activity

        Hide
        Joydeep Sen Sarma added a comment -

        are u sure this is ok? it seems we have changed the semantics - the old code takes each file from underneath the dir and moves into final location. the new code moves the directory underneath the final location. there's one extra level of directory in the new code that's not there in the old code. also - the semantics in terms of collisions changes because of this. if we create a subdir - then there may not be collisions in the new code (because of rename) that may occur in the old code.

        Show
        Joydeep Sen Sarma added a comment - are u sure this is ok? it seems we have changed the semantics - the old code takes each file from underneath the dir and moves into final location. the new code moves the directory underneath the final location. there's one extra level of directory in the new code that's not there in the old code. also - the semantics in terms of collisions changes because of this. if we create a subdir - then there may not be collisions in the new code (because of rename) that may occur in the old code.
        Hide
        Hairong Kuang added a comment -

        Joydeep is right. I did not realize that you try to merge multiple source directories into one.

        Show
        Hairong Kuang added a comment - Joydeep is right. I did not realize that you try to merge multiple source directories into one.
        Hide
        Ning Zhang added a comment -

        The directory structures should not be changed by the patch (I tested with a simple insert overwrite and dynamic partition inserts on production and verified the results). The full unit tests are still running.

        The rationale is that the old code actually go over each file (item_i) in src and rename it to tmpPath/item_i. The new code simply rename src to tmpPath. About the name collision, the tmpPath need to be created first because it's parent may not exists (in case of multiple levels of dynamic partitions), in which case rename will fail. If the directory tmpPath is created first and then rename src to tmpPath, it won't fail and in reality it move all subdirectories/files in src to tmpPath. There is no documentation about the collision semantics but it is not guaranteed I can change the code.

        Show
        Ning Zhang added a comment - The directory structures should not be changed by the patch (I tested with a simple insert overwrite and dynamic partition inserts on production and verified the results). The full unit tests are still running. The rationale is that the old code actually go over each file (item_i) in src and rename it to tmpPath/item_i. The new code simply rename src to tmpPath. About the name collision, the tmpPath need to be created first because it's parent may not exists (in case of multiple levels of dynamic partitions), in which case rename will fail. If the directory tmpPath is created first and then rename src to tmpPath, it won't fail and in reality it move all subdirectories/files in src to tmpPath. There is no documentation about the collision semantics but it is not guaranteed I can change the code.
        Hide
        Joydeep Sen Sarma added a comment -

        old code rename src/item_i to tmpPath/item_i
        new code renames src to tmpPath/src. item_i's final position is tmpPath/src/item_i

        what am i missing?

        Show
        Joydeep Sen Sarma added a comment - old code rename src/item_i to tmpPath/item_i new code renames src to tmpPath/src. item_i's final position is tmpPath/src/item_i what am i missing?
        Hide
        Joydeep Sen Sarma added a comment -

        u need a test where src is a directory. please try different variants of load commands.

        Show
        Joydeep Sen Sarma added a comment - u need a test where src is a directory. please try different variants of load commands.
        Hide
        Joydeep Sen Sarma added a comment -

        oh - sorry - my previous comment was wrong. if u rename src to tmpPath - u lose all previous contents of tmpPath. which is not what the function does - it retains contents of tmpPath that don't collide with contents in src (the load command uses this i think and merges data in).

        maybe we need a different call for the specific case u are trying to optimize. if this is not being picked up by the tests - that's pretty bad ..

        Show
        Joydeep Sen Sarma added a comment - oh - sorry - my previous comment was wrong. if u rename src to tmpPath - u lose all previous contents of tmpPath. which is not what the function does - it retains contents of tmpPath that don't collide with contents in src (the load command uses this i think and merges data in). maybe we need a different call for the specific case u are trying to optimize. if this is not being picked up by the tests - that's pretty bad ..
        Hide
        Ning Zhang added a comment -

        @Joydeep, replaceFiles() is called by loadTable with the "replace" flag turned on, which mean it should overwrite the destination directory. Also tmpPath is a temporary path that should not exist before this call. Later in the function tmpPath is rename again to the destination path (where the existing files in destf will be removed).

        The non-overwriting version is implemented in copyFiles(). So I think we don't need another function.

        I also added a test case to test the load data (overwrite or not) works as expected to the new patch (no code changes from the first one).

        Show
        Ning Zhang added a comment - @Joydeep, replaceFiles() is called by loadTable with the "replace" flag turned on, which mean it should overwrite the destination directory. Also tmpPath is a temporary path that should not exist before this call. Later in the function tmpPath is rename again to the destination path (where the existing files in destf will be removed). The non-overwriting version is implemented in copyFiles(). So I think we don't need another function. I also added a test case to test the load data (overwrite or not) works as expected to the new patch (no code changes from the first one).
        Hide
        Joydeep Sen Sarma added a comment -

        what happens when there are multiple srcs:

        srcs = fs.listStatus(srcf);
        for (FileStatus src : srcs) {
        if (!fs.rename(src.getPath(), tmppath)) {

        we can't rename multiple sources to the same target. this happens when the input path is a glob (which i think the load command allows).

        can we special case this optimization (apply only when srcs.length == 1)?

        Show
        Joydeep Sen Sarma added a comment - what happens when there are multiple srcs: srcs = fs.listStatus(srcf); for (FileStatus src : srcs) { if (!fs.rename(src.getPath(), tmppath)) { we can't rename multiple sources to the same target. this happens when the input path is a glob (which i think the load command allows). can we special case this optimization (apply only when srcs.length == 1)?
        Hide
        Ning Zhang added a comment -

        Taking Hairong and Joydeep's comments.

        The original implementation assumes srcf be a path potentially containing wildcards, but in the current code path wildcards in the 'LOAD DATA' commands are handled differently (first a copy task handles the wildcards and then followed by a move task which calls the replaceFiles() function). So srcf should be a single leaf directory (although we don't prevent subdirectories inside srcf).

        Because of this I simplified the function by renaming srcf to destf and eliminated tmppath. The .3 patch passed all unit tests.

        Show
        Ning Zhang added a comment - Taking Hairong and Joydeep's comments. The original implementation assumes srcf be a path potentially containing wildcards, but in the current code path wildcards in the 'LOAD DATA' commands are handled differently (first a copy task handles the wildcards and then followed by a move task which calls the replaceFiles() function). So srcf should be a single leaf directory (although we don't prevent subdirectories inside srcf). Because of this I simplified the function by renaming srcf to destf and eliminated tmppath. The .3 patch passed all unit tests.
        Hide
        Joydeep Sen Sarma added a comment -

        Ning - do you know why we had the FsShell and -rmr code (in Hive.replaceFiles) ? i don't think it was there originally and there must have been a reason why it got put in. this patch is taking it out and i wanted to be sure this is ok. (I am wondering if the -rmr is there to handle non hdfs file systems)

        Show
        Joydeep Sen Sarma added a comment - Ning - do you know why we had the FsShell and -rmr code (in Hive.replaceFiles) ? i don't think it was there originally and there must have been a reason why it got put in. this patch is taking it out and i wanted to be sure this is ok. (I am wondering if the -rmr is there to handle non hdfs file systems)
        Show
        He Yongqiang added a comment - https://issues.apache.org/jira/browse/HIVE-1707
        Hide
        Ning Zhang added a comment -

        Yes, it's because the oldPath could be from a different file system. In this patch, a new file system is got from oldPath and it is used to delete oldPath. It should be equivalent to FsShell. Hairong mentioned that FsShell is relatively slow and more expensive that calling FileSystem.delete.

        Yongqiang, is there a test case for HIVE-1707?

        Show
        Ning Zhang added a comment - Yes, it's because the oldPath could be from a different file system. In this patch, a new file system is got from oldPath and it is used to delete oldPath. It should be equivalent to FsShell. Hairong mentioned that FsShell is relatively slow and more expensive that calling FileSystem.delete. Yongqiang, is there a test case for HIVE-1707 ?
        Hide
        Joydeep Sen Sarma added a comment -

        cool - the fsshell removal sounds good unless Yongqiang says something otherwise.

        i am pretty sure this patch breaks load command with a wildcard though. it seems to me that the load command is simply passing the input path (with the wildcard pattern) to the the loadTable/loadPartition methods (via LoadTableDesc). these commands were previously capable of handling wildcards that matched a set of files. now they will not be able to do that. Ning - can u confirm this? (maybe add a test trying to load a wildcard pattern?)

        on a more minor note - the checkPaths call that got taken out was checking for the presence of nested subdirectories inside the path being loaded. is this no longer necessary? (do we support directories within partitions/tables automatically at query time?)

        Show
        Joydeep Sen Sarma added a comment - cool - the fsshell removal sounds good unless Yongqiang says something otherwise. i am pretty sure this patch breaks load command with a wildcard though. it seems to me that the load command is simply passing the input path (with the wildcard pattern) to the the loadTable/loadPartition methods (via LoadTableDesc). these commands were previously capable of handling wildcards that matched a set of files. now they will not be able to do that. Ning - can u confirm this? (maybe add a test trying to load a wildcard pattern?) on a more minor note - the checkPaths call that got taken out was checking for the presence of nested subdirectories inside the path being loaded. is this no longer necessary? (do we support directories within partitions/tables automatically at query time?)
        Hide
        He Yongqiang added a comment -

        I remember i tried to use FileSystem.delete. but it did not work, so changed to use FsShell. So possibly you need to revert the change.
        There is no testcase in Hive covering it. I did the test manually in our cluster.

        Show
        He Yongqiang added a comment - I remember i tried to use FileSystem.delete. but it did not work, so changed to use FsShell. So possibly you need to revert the change. There is no testcase in Hive covering it. I did the test manually in our cluster.
        Hide
        Ning Zhang added a comment -

        Yongqiang, do you have the command/script that can reproduce the use case there oldPath is from a different FS? Theoretically using FsShell and FileSystem.delete should be the same (FsShell should internally call FileSystem.delete). Did you get a new file system from oldPath or use the same from other paths?

        Show
        Ning Zhang added a comment - Yongqiang, do you have the command/script that can reproduce the use case there oldPath is from a different FS? Theoretically using FsShell and FileSystem.delete should be the same (FsShell should internally call FileSystem.delete). Did you get a new file system from oldPath or use the same from other paths?
        Hide
        Ning Zhang added a comment -

        @joydeep, I've tested the case like 'LOAD DATA LOCAL INPATH '/dir/.txt' INTO TABLE blah and it worked. Is this what you meant by wildcards? If so I'll add a test case if it doesn't not exist already. The plan for that is to generate a CopyTask first from /dir/.txt to a temp dir say /dir/tmp-1000 and the MoveTask is moving /dir/tmp-1000 to the table's destination location. replaceFiles is only called in the MoveTask so that worked.

        Also the only reason checkPath is there for replaceFiles is to check nested subdirectories. For INSERT OVERWRITE (whether be dynamic or static partitions) the temporary directory should not contain any sub directories. For LOAD DATA commands, Namit said we don't have any use cases for loading a directory containing subdirectories, but it should be enabled. That's why I removed it. But if we want to ensure no subdirectories, I can bring it back.

        Show
        Ning Zhang added a comment - @joydeep, I've tested the case like 'LOAD DATA LOCAL INPATH '/dir/ .txt' INTO TABLE blah and it worked. Is this what you meant by wildcards? If so I'll add a test case if it doesn't not exist already. The plan for that is to generate a CopyTask first from /dir/ .txt to a temp dir say /dir/tmp-1000 and the MoveTask is moving /dir/tmp-1000 to the table's destination location. replaceFiles is only called in the MoveTask so that worked. Also the only reason checkPath is there for replaceFiles is to check nested subdirectories. For INSERT OVERWRITE (whether be dynamic or static partitions) the temporary directory should not contain any sub directories. For LOAD DATA commands, Namit said we don't have any use cases for loading a directory containing subdirectories, but it should be enabled. That's why I removed it. But if we want to ensure no subdirectories, I can bring it back.
        Hide
        Joydeep Sen Sarma added a comment -

        regarding wildcards - load data inpath /x/*.txt - does that work? the copy task should not happen if the source and destination file systems are the same (in the load command). i think u may have observed the copy command kick in because in the hive unit test environment - the default file system is pfile:// and local file system is not same as pfile:// (from a java class standpoint). so copytask kicks in. you might want to try a load command without the 'local' keyword.

        also - regarding the question about filesystem.delete versus fsshell - testing should be easy because we have two working file systems in our test environment (file:// and pfile://). we can create a partition pointing to one filesystem and then try to change it's location to the other filesystem and make sure things work.

        Show
        Joydeep Sen Sarma added a comment - regarding wildcards - load data inpath /x/*.txt - does that work? the copy task should not happen if the source and destination file systems are the same (in the load command). i think u may have observed the copy command kick in because in the hive unit test environment - the default file system is pfile:// and local file system is not same as pfile:// (from a java class standpoint). so copytask kicks in. you might want to try a load command without the 'local' keyword. also - regarding the question about filesystem.delete versus fsshell - testing should be easy because we have two working file systems in our test environment ( file:// and pfile://). we can create a partition pointing to one filesystem and then try to change it's location to the other filesystem and make sure things work.
        Hide
        He Yongqiang added a comment -

        @Joy, the FsShell delete is there because FB's hadoop is doing a slightly different implementation for the FileSystem.delete.

        Show
        He Yongqiang added a comment - @Joy, the FsShell delete is there because FB's hadoop is doing a slightly different implementation for the FileSystem.delete.
        Hide
        Ning Zhang added a comment -

        @joydeep, you are right. if there is no 'local' keyword there is no CopyTask and MoveTask takes wildcards. I'm uploading a new patch address wildcards. Also reverted to FsShell to use oldPath for the reason Yongqiang mentioned.

        I'm running unit tests.

        Show
        Ning Zhang added a comment - @joydeep, you are right. if there is no 'local' keyword there is no CopyTask and MoveTask takes wildcards. I'm uploading a new patch address wildcards. Also reverted to FsShell to use oldPath for the reason Yongqiang mentioned. I'm running unit tests.
        Hide
        Ning Zhang added a comment -

        Uploading HIVE-1852.5.patch containing a simple log fix in load_overwrite.q.

        Show
        Ning Zhang added a comment - Uploading HIVE-1852 .5.patch containing a simple log fix in load_overwrite.q.
        Hide
        Joydeep Sen Sarma added a comment -

        Hive..java:1564 - this should read fs.rename(srcs[0]) (since srcf may have been a wildcard that matched a single dir)
        Hive.java:1574 - we can optimize this loop i think. if the wildcard does not match a single directory - then it has to match a set of files. the loadsemantic analyzer already enforces this. so we don't need a second listStatus and loop over the entries here. can directly move each of the srcs into destf/srcs.getName

        we have lost the atomic move for the wildcard case. i think it's ok (it's not used much i would imagine) - at least leave a note/todo saying that this would be nice to have atomic.

        new tests look pretty good to me - the load/move case with wildcards is getting covered. we could add one where the load path is a wildcard that matches a single dir to cover the first comment here.

        Show
        Joydeep Sen Sarma added a comment - Hive..java:1564 - this should read fs.rename(srcs [0] ) (since srcf may have been a wildcard that matched a single dir) Hive.java:1574 - we can optimize this loop i think. if the wildcard does not match a single directory - then it has to match a set of files. the loadsemantic analyzer already enforces this. so we don't need a second listStatus and loop over the entries here. can directly move each of the srcs into destf/srcs.getName we have lost the atomic move for the wildcard case. i think it's ok (it's not used much i would imagine) - at least leave a note/todo saying that this would be nice to have atomic. new tests look pretty good to me - the load/move case with wildcards is getting covered. we could add one where the load path is a wildcard that matches a single dir to cover the first comment here.
        Hide
        Ning Zhang added a comment -

        Take Joy's comments. Also added one more test case for wildcard matching a single directory name (load_fs.q).

        Show
        Ning Zhang added a comment - Take Joy's comments. Also added one more test case for wildcard matching a single directory name (load_fs.q).
        Hide
        Joydeep Sen Sarma added a comment -

        committed. thanks Ning.

        Show
        Joydeep Sen Sarma added a comment - committed. thanks Ning.

          People

          • Assignee:
            Ning Zhang
            Reporter:
            Ning Zhang
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development