Hadoop Common / HADOOP-6240

Rename operation is not consistent between different implementations of FileSystem

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The rename operation has many scenarios that are not consistently implemented across file systems.

      1. hadoop-6240-5.patch
        29 kB
        Suresh Srinivas
      2. hadoop-6240-4.patch
        29 kB
        Suresh Srinivas
      3. hadoop-6240-3.patch
        29 kB
        Suresh Srinivas
      4. hadoop-6240-2.patch
        26 kB
        Suresh Srinivas
      5. hadoop-6240-1.patch
        26 kB
        Suresh Srinivas
      6. hadoop-6240.patch
        26 kB
        Suresh Srinivas

        Issue Links

          Activity

          Suresh Srinivas added a comment -

          There are two problems to address:

          1. Define clearly what the behavior of the rename functionality should be
          2. Ensure all the FileSystem implementations adhere to it

          Before looking at how different implementations of FileSystem in Hadoop behave, let's decide on the expected behavior under different boundary conditions. Posix (http://www.opengroup.org/onlinepubs/009695399/functions/rename.html) defines the following:

          1. Renaming a file to a directory or directory to a file should fail.
          2. Renaming old file to a new file that exists should succeed. Existing new file is removed before rename.
          3. Renaming old dir to a new dir that exists and is not empty should fail.
          4. Renaming old dir to a new dir that exists and is empty should succeed. Existing new dir is removed before rename.
          5. The new pathname that contains the old path as prefix should fail.
          6. Renaming to a new path with a non-existent parent directory (rename /a/b to /c/d where c does not exist) should fail.

          To indicate failure in the above cases, an IOException will be thrown instead of returning false.
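For reference, rules 2, 3, and 6 above can be observed with Java NIO against a POSIX local filesystem. This is a minimal sketch, not Hadoop code; class and temp-directory names are illustrative:

```java
import java.io.IOException;
import java.nio.file.*;

public class PosixRenameDemo {
    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("rename-demo");

        // Rule 2: renaming onto an existing file replaces it.
        Path f1 = Files.writeString(base.resolve("f1"), "old");
        Path f2 = Files.writeString(base.resolve("f2"), "new");
        Files.move(f1, f2, StandardCopyOption.REPLACE_EXISTING);
        System.out.println("rule2: " + Files.readString(f2)); // prints f1's contents

        // Rule 3: renaming onto a non-empty directory fails.
        Path d1 = Files.createDirectory(base.resolve("d1"));
        Path d2 = Files.createDirectory(base.resolve("d2"));
        Files.writeString(d2.resolve("child"), "x");
        try {
            Files.move(d1, d2, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            System.out.println("rule3: failed as expected");
        }

        // Rule 6: renaming into a non-existent parent directory fails.
        try {
            Files.move(f2, base.resolve("missing/child"));
        } catch (IOException e) {
            System.out.println("rule6: failed as expected");
        }
    }
}
```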

          Sanjay Radia added a comment -

          Rename (unlike mkdir and delete) should never return false.
          Each return of false should really throw an exception.

          Suresh Srinivas added a comment -

          Based on discussion with Sanjay and Owen, here are the next steps:

          1. The semantics of the existing FileSystem.rename() operation will not be changed, to ensure backward compatibility. The only change to the rename implementation is that the local and HDFS file systems will throw an exception instead of returning false. In most cases this should work for existing applications, or the change required to handle it is relatively minor.
          2. FileSystem will have a new protected method rename2 to be used in FileContext. This method will implement the rename operation as described in the previous comment. In this jira the new rename2 will be implemented in the local and HDFS file systems. For other file systems, the implementation will indicate that rename2 is not implemented. Jiras will be created to track this for the other file systems.
          3. The new FileContext interface will use rename2 instead of rename.
          Doug Cutting added a comment -

          I generally dislike putting version numbers in method names.

          A better option might be to define the semantics of HADOOP-6223's AbstractFileSystem#rename() to never return false but instead throw an exception, no?

          Suresh Srinivas added a comment -

          The main issue is not just throwing IOException instead of returning false; the proposal does change rename to throw an exception instead of returning false, and applications should be able to handle that change. The other issues:

          1. When different filesystems have different behaviors, the semantics of HDFS will be retained. All the other filesystems will be changed to comply with HDFS.
          2. rename also has the following behavior that is not right:
            • rename(dir1, dir2) has two different behaviors. If dir2 exists then it works like move, dir1 is moved into dir2. If dir2 does not exist, dir1 is renamed as dir2. Posix has consistent behavior in both cases. If dir2 does not exist, dir1 is renamed as dir2. If dir2 is empty, dir2 is removed and dir1 is renamed as dir2. If dir2 is not empty, the rename operation fails.
            • rename(file1, dir2) has two different behaviors. If dir2 exists then it works like move, file1 is moved to dir2. If dir2 does not exist, rename fails. Posix does not allow src as file and dst as directory.
            • rename(file1, file2) fails if file2 exists. Posix rename removes file2 and renames file1 to file2.

          For backward compatibility, the rename semantics will be retained. A new rename2 with POSIX-compliant behavior will be added to FileSystem to be used by FileContext. I am not a fan of the method name rename2 either. Let me know if there are better ways of naming the method.

          Doug Cutting added a comment -

          > I am not a fan of method name rename2 either. Let me know if there are better ways of naming the method.

          I am suggesting that we add no new method to FileSystem but instead the semantics you intend for FileSystem#rename2() should be implemented by AbstractFileSystem#rename(). Does that make sense?

          Sanjay Radia added a comment -

          Doug, if you look at HADOOP-6223's AbstractFileSystem there were two options identified. I have gone for option 1 rather than 2, although
          I commented that I have now realized that option 2 is better (you even asked why I was reconsidering going with option 2).
          Option 2 allows the solution you have proposed above.
          But option 1 does not allow that, since FileContext calls FileSystem, not AFS; FileContext will be changed to call AFS later (say 22?).
          Given the deadline for 21, I am stuck with option 1 and hence stuck with using a name different from "rename" (rename2 or newRename, etc.).
          In FileSystem this method will be marked as protected (i.e. to be used only by FileContext). When we change FileContext to call AFS (most likely
          in 22) it will call a method called AFS#rename().

          Alternatively I can try to switch to option 2, but doubt if I will make the deadline. But even with option 2 I believe one may in a few cases need to add a special method or two in FileSystem to support the transition phase (e.g. FileContext passes absolute permission (already masked) to FileSystem during a create).

          Sanjay Radia added a comment -

          I might have been a little hasty in my comment above. Even with using AFS#rename() (as Doug suggests => option 2 of HADOOP-6223), AFS#rename() will have to be delegated to FileSystem during our transition from FileSystem to AFS (see the jira for details). This means we will need a FileSystem#rename2 anyway.
          The only way to escape that is to have an impl for each filesystem that extends AFS right now.

          Doug Cutting added a comment -

          A protected method cannot be called by other classes, yet this must be callable by FileContext. So I'm not sure what visibility you actually intend for rename2.

          Adding temporary public methods to FileSystem seems like a non-ideal plan. Instead you might add the logic into FileContext directly, then, when AbstractFileSystem is added, move it there. This may require an extra getFileStatus() in the interim, but that doesn't seem unbearable.

          Suresh Srinivas added a comment -

          Sorry, my bad; I had written it as protected in previous comments. Actually it is package-private.

          Doug Cutting added a comment -

          > Actually it is package private.

          If it's package-private then DistributedFileSystem cannot override it, since it's in a separate package. If the only implementation will be in FileSystem, and the only caller will be in FileContext, then why not just include this logic in FileContext? Have I missed something?

          Suresh Srinivas added a comment -

          Owen actually educated me that protected will also work here.

          rename2 functionality cannot be implemented in FileContext. Given the existing implementation of rename, for rename2 we would have to delete the existing rename destination (directories or files) before proceeding to call rename, and that would not be atomic. Additionally, rename2 is not exposed to applications, since it is not a public method.
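The atomicity concern above can be seen in a sketch of the client-side emulation being ruled out. This uses java.nio.file on a local filesystem as a stand-in for the Hadoop API; renameOverwrite is a hypothetical name:

```java
import java.io.IOException;
import java.nio.file.*;

public class NonAtomicRename {
    // Hypothetical client-side emulation of overwrite-rename:
    // delete dst first, then rename. The window between the two
    // steps is visible to other clients, so this is NOT atomic.
    static void renameOverwrite(Path src, Path dst) throws IOException {
        if (Files.exists(dst)) {
            Files.delete(dst);   // step 1: dst is briefly absent
        }
        Files.move(src, dst);    // step 2: the actual rename
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("demo");
        Path src = Files.writeString(base.resolve("src"), "new");
        Path dst = Files.writeString(base.resolve("dst"), "old");
        renameOverwrite(src, dst);
        System.out.println(Files.readString(dst)); // prints "new"
    }
}
```

A rename2 implemented inside the filesystem (e.g. in the name-node for HDFS) can perform the replacement as one operation, which is exactly why it cannot live in FileContext.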

          Doug Cutting added a comment -

          So why not make this issue depend on HADOOP-6223? Then we fix it just once, rather than first implementing a temporary fix. Until that's committed we could either (a) implement FileContext#rename() non-atomically or (b) leave this issue open.

          Suresh Srinivas added a comment -

          I still do not understand why a protected method is not an option. BTW, option (b), assuming it means FileContext uses the current FileSystem.rename(), will result in the same issue: when we want to fix the semantics of FileContext.rename(), we will have the same backward-compatibility problems. Personally I am not in favor of doing rename non-atomically.

          Doug Cutting added a comment -

          > Personally I am not in favor of doing rename non-atomically.

          Neither am I. If this issue is made dependent on HADOOP-6223, then it will be non-atomic until each AbstractFileSystem implementation implements rename. Otherwise it will be non-atomic until each FileSystem implementation implements rename2. So the work for fixing this issue is the same in either case, and we can avoid adding oddly named methods to a deprecated class.

          Suresh Srinivas added a comment -

          I committed this change. Thank you, Boris.

          Suresh Srinivas added a comment -

          Wrong jira. Reopening the issue. Ignore my earlier comments.

          Konstantin Shvachko added a comment -

          I just want to comment on the atomicity of rename() and whether we should be POSIX compliant with it. See also my earlier comment in HDFS-303.

          1. We should not provide atomic renames, mostly because it will be extremely hard to support if we ever build a distributed namespace service. Atomic rename is relatively easy within a single name-node, but supporting it down the road for renames across two or more name-servers is really hard.

          2. We should not support posix definition for renames. Mostly because of [1] above. But also because there are other (local) file systems which don't support it either, which means that we cannot guarantee posix behavior for hadoop LocalFileSystem. E.g., if you run LocalFileSystem on a local WinNT drive then renaming of a file to an existing file and a directory to an existing empty directory (posix 2 and 4) will fail, but if you run it on an ext3 volume it will succeed.

          To support the legal basis for these inconsistencies between different file systems I cited Java definition of rename, which says:
          "Many aspects of the behavior of this method are inherently platform-dependent: The rename operation might not be able to move a file from one filesystem to another, it might not be atomic, and it might not succeed if a file with the destination abstract pathname already exists."

          IMO, rename should always be treated as a convenience method. POSIX semantics for rename have been problematic for traditional file systems; for distributed ones they should be avoided by all means.
          For example, rename is the only operation in HDFS, which makes edits logs non-idempotent, and we could have done so many things so much better with idempotent journal records.
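The platform-dependence quoted above shows up in the shape of the Java API itself: File.renameTo() returns a bare boolean with OS-specific overwrite behavior, while the later java.nio Files.move() makes the choice explicit and throws on failure. A small local-filesystem sketch (the outcome of the renameTo call is platform-dependent by design, per the quoted javadoc):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.*;

public class RenameToDemo {
    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("demo").toFile();
        File src = new File(base, "src");
        File dst = new File(base, "dst");
        try (var out = new java.io.FileWriter(src)) { out.write("a"); }
        try (var out = new java.io.FileWriter(dst)) { out.write("b"); }

        // File.renameTo: a bare boolean; whether it replaces an existing
        // destination is platform-dependent (fails on Windows NT drives,
        // succeeds on ext3/POSIX, per the discussion above).
        boolean ok = src.renameTo(dst);
        System.out.println("renameTo over existing file: " + ok);

        // Files.move: the overwrite choice is explicit, and failures throw.
        Path p = Files.writeString(base.toPath().resolve("src2"), "c");
        Files.move(p, dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
        System.out.println("Files.move REPLACE_EXISTING: ok");
    }
}
```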

          Suresh Srinivas added a comment -

          LocalFileSystem can do the necessary things (such as deleting an existing dst) before calling Java's File.renameTo() to make it consistent across different operating systems.

          Doug Cutting added a comment -

          > We should not provide atomic renames [ ... ]

          A middle ground might be to be very clear about where they can be relied on and where they cannot, so that applications are forewarned. They might be only supported, e.g., within a single volume of unix local filesystems, or within a single hdfs filesystem.

          We might even add an atomicRename method that fails in other cases. Then applications that require atomic renames would use that method instead. However, if mapreduce job output promotion is required to be atomic, and S3 does not implement atomic rename, then one could not, e.g., use S3 as a mapreduce output directory.

          > LocalFileSystem can do necessary things (such as deleting an existing dst) before calling java File.renameTo() to make is consistent across different operating systems.

          But that's not atomic, right? If we're willing to tolerate that, then we might implement a non-atomic rename directly in FileContext, in terms of the existing FileSystem#rename(), and skip adding AbstractFileSystem for now. Things like job promotion, etc. will not be switched to use FileContext in 0.21, so, if atomicity were optional, making FileContext#rename() atomic might wait until 0.22. But, from what I've heard thus far, that's not an acceptable option.
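The hypothetical atomicRename mentioned above can be sketched against java.nio, where StandardCopyOption.ATOMIC_MOVE already has the proposed contract: rename atomically or throw (AtomicMoveNotSupportedException when the filesystem cannot do it, e.g. across volumes). The method name and the part-file names below are illustrative only, not an actual Hadoop API:

```java
import java.io.IOException;
import java.nio.file.*;

public class AtomicRenameSketch {
    // Hypothetical atomicRename: succeed atomically or throw.
    // java.nio models this with ATOMIC_MOVE, which throws
    // AtomicMoveNotSupportedException where atomicity is unavailable.
    static void atomicRename(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("demo");
        // Mimics job-output promotion: temp file renamed into place.
        Path src = Files.writeString(base.resolve("part-00000.tmp"), "output");
        Path dst = base.resolve("part-00000");
        atomicRename(src, dst); // same volume: atomic rename succeeds
        System.out.println(Files.readString(dst));
    }
}
```

Applications needing atomicity (like job promotion) would call this variant and fail fast on filesystems such as S3 that cannot provide it, which is exactly the trade-off described above.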

          Todd Lipcon added a comment -

          +1 for any solution that avoids sinking to the lowest common denominator - from an application perspective the really lax semantics make some operations quite difficult. One example is an application that wants to do the equivalent of mkdtemp(3) - I believe this to be impossible with the current FS semantics. See for example https://issues.apache.org/jira/browse/HIVE-718?focusedCommentId=12744143&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12744143

          Konstantin Shvachko added a comment -

          It was probably too general for me to say "We should not provide atomic renames".
          What I meant is not to extend implementation of atomic rename to the case when the destination exists.
          So I am advocating to keep current semantics of renames in hdfs.
          LocalFileSystem may do the same, this will make it consistent with hdfs.
          And this is the only thing this issue should do.
          Is there a use case where atomic rename replacing the existing destination is required?

          Todd Lipcon added a comment -

          Is there a use case, when atomic rename replacing the existing destination is required?

          I don't have an example of one where it's strictly required, but it certainly makes application developers' lives easier. Think about the case where a distributed application uses HDFS as the source of record for some metadata (e.g. there is a small text file on HDFS with some application information). As an application developer it would be nice to simply open the file, read it, and be sure you won't fail. Without atomic rename-replace, the reader can hit a race where the file has been deleted but the new version hasn't been written. Surely the application can know about this and continually retry until it gets a valid read, but it's a bit of an aggravation.

          I could certainly be convinced that this kind of behavior is out of scope for HDFS and applications should use a database or zookeeper if they need semantics like this.

          Doug Cutting added a comment -

          > it would be nice to simply open the file, read it, and be sure you won't fail.

          Even with atomic rename/replace I'm not sure this would be guaranteed to work. The namenode does not keep track of files open for read, so between the time that you get the file's block list from the namenode and then try to read the first block another process could replace it and that first block might no longer exist.

          Something that might work is atomic rewrite of a symlink. I have not looked at the symlink patch yet to see how it handles this, but it might be useful to support atomic replacement when both the old and the new files are symlinks. Konstantin, does this case seem supportable long-term?
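The symlink idea above can be sketched with java.nio on a POSIX system, where rename(2) atomically replaces an existing link: build the new link under a temporary name, then rename it over the old one. The file names are illustrative:

```java
import java.io.IOException;
import java.nio.file.*;

public class SymlinkSwapDemo {
    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("demo");
        Path v1 = Files.writeString(base.resolve("data.v1"), "version 1");
        Path v2 = Files.writeString(base.resolve("data.v2"), "version 2");

        // "current" starts out as a symlink to v1.
        Path current = base.resolve("current");
        Files.createSymbolicLink(current, v1.getFileName());

        // Atomic swap: create the new link under a temp name, then
        // rename it over the old one. Readers always resolve "current"
        // to v1 or v2, never to a missing link.
        Path tmp = base.resolve("current.tmp");
        Files.createSymbolicLink(tmp, v2.getFileName());
        Files.move(tmp, current, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);

        System.out.println(Files.readString(current)); // follows the link
    }
}
```

Note this swaps which file a path resolves to; a reader that already opened the old target keeps reading it, which sidesteps the open-for-read race described above.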

          Konstantin Shvachko added a comment -

          > I could certainly be convinced that this kind of behavior is out of scope for HDFS and applications should use a database or zookeeper if they need semantics like this.

          Yes. Rename-with-replace is usually used as a locking mechanism, so why not use a locking service like ZK directly?

          Todd Lipcon added a comment -

          The namenode does not keep track of files open for read, so between the time that you get the file's block list from the namenode and then try to read the first block another process could replace it and that first block might no longer exist.

          Good point - without reader leases the atomicity of rename is unimportant since there's no consistency in the first place.

          Konstantin Shvachko added a comment -

          Yep, atomic rewrite of a symlink - is the way to do it. And it should be supportable long-term.

          Suresh Srinivas added a comment -

          The namenode does not keep track of files open for read, so between the time that you get the file's block list from the namenode and then try to read the first block another process could replace it and that first block might no longer exist.

          I am not sure this is rename-specific. Between open and read, someone can delete the file. Going further, between open and read someone could delete the file and create a new one with the same name.

          Suresh Srinivas added a comment -

          Based on discussion everyone seems to agree with the following behavior:

          1. rename2 will return void and throw IOException to indicate failures.
          2. rename2 will fail when renaming from file to directory or directory to file
          3. rename2 from /a/b to /c/d will fail if /c does not exist

          4. rename2 will not have consistent behavior when the dst directory exists:
            • if dst directory exists it will not behave like move as it does today

          We need to agree on one of the options for rename2 behavior:

          1. rename2 from /a/file1 to /b/file when /b/file2
            • option1: posix behavior - if /b/file2 exists remove it and rename2 it (atomic if possible)
            • option2: current behavior - if /b/file2 exists throw IOException that destination already exists
          1. rename2 from /a/dir1 to /b/file when /b/dir2
            • option1: posix behavior - if /b/dir2 exists remove it and rename2 it (atomic if possible)
            • option2: if /b/dir2 exists throw IOException that destination already exists

          Can you vote on which option to go with?

          I will post a comment about implementation details once we have an agreement on the functionality.

          Suresh Srinivas added a comment -

          Sorry for the typos in the previous confusing comment. Reposting it with correction. Also note that the method name may not be rename2. I am open for suggestions:
          Based on discussion everyone seems to agree with the following behavior:

          1. rename2 will return void and throw IOException to indicate failures.
          2. rename2 will fail when renaming from file to directory or directory to file
          3. rename2 from /a/b to /c/d will fail if /c does not exist

          4. rename2 will not have consistent behavior when the dst directory exists - if dst directory exists it will not behave like move as it does today

          We need to agree on one of the options for rename2 behavior:

          1. rename2 from /a/file1 to /b/file1
            • option1: posix behavior - if /b/file2 exists remove it and rename2 it (atomic if possible)
            • option2: current behavior - if /b/file2 exists throw IOException that destination already exists
          2. rename2 from /a/dir1 to /b/dir2
            • option1: posix behavior - if /b/dir2 exists remove it and rename2 it (atomic if possible)
            • option2: if /b/dir2 exists throw IOException that destination already exists

          Can you vote on which option to go with?

          I will post a comment about implementation details once we have an agreement on the functionality.
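          The two options can be illustrated with a local-filesystem analog using java.nio rather than Hadoop's FileSystem API (the class and file names below are invented for the example): a plain Files.move throws when the destination exists, which matches option 2, while passing REPLACE_EXISTING gives the POSIX overwrite behavior of option 1.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameOptionsDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path src = Files.writeString(dir.resolve("file1"), "old");
        Path dst = Files.writeString(dir.resolve("file2"), "new");

        // Option 2: a plain move fails when the destination exists.
        boolean option2Throws = false;
        try {
            Files.move(src, dst);
        } catch (FileAlreadyExistsException e) {
            option2Throws = true;
        }

        // Option 1 (POSIX): the existing destination is replaced.
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);

        System.out.println(option2Throws + " " + Files.readString(dst));
    }
}
```

This only demonstrates the file/file case; for directories, java.nio's REPLACE_EXISTING refuses to overwrite a non-empty destination directory, which lines up with the POSIX rule in the issue description.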

          Jakob Homan added a comment -

          Option 2 definitely seems the less dangerous and more reasonable behavior in both cases. It also tends to follow the agreed-upon approach of being more conservative and explicit about the user's intentions. I vote for that.

          Konstantin Shvachko added a comment -

          > rename2 from /a/file1 to /b/file1

          This should read: rename2 from /a/file1 to /b/file2 (not /b/file1)

          +1 on option2

          Sanjay Radia added a comment -

          Any time one is moving from current state to new state one uses an atomic rename.
          A decent filesystem has to support the above use case.
          Is the proposal that we will use the "replace of a symlink" to point from current target to new target as Hadoop's atomic file operation?
          I think I could be persuaded to live with that.


          > .. without reader leases the atomicity of rename is unimportant since there's no consistency in the first place.
          We have this problem everywhere. A delete followed by a create can confuse a reader. It does not invalidate the case for an atomic rename.
          At some point we may fix the problem of reader leases.

          Doug Cutting added a comment -

          Which behaviors are different from the current behavior? My goal is to determine which mandate an HDFS-specific implementation for 0.21 (before we've added AbstractFileSystem) and cannot be correctly implemented in FileContext.

          My guesses are:

          1. rename will return void and throw IOException to indicate failures. This is new, but would be easy to fix generically in FileContext, right?
          2. rename will fail when renaming from file to directory or directory to file. Do we permit this currently? If so, then, in FileContext, we could first stat the files and throw an exception. That would potentially be incorrect if another process removed and/or replaced the files between the stat and the rename, since an illegal rename might then succeed. Is that sort of atomicity critical?
          3. rename from /a/b to /c/d will fail if /c does not exist. I assume this currently succeeds. A generic implementation in FileContext would stat the parent directory, and, if it does not exist, throw an exception. That would potentially be incorrect if another process created the parent directory between the stat and the rename, since the rename would then succeed. Is that sort of atomicity critical?

          4. rename will not have consistent behavior when the dst directory exists - if dst directory exists it will not behave like move as it does today. Again, this could be handled by first stat'ing the file, and again, it has the same atomicity concerns.

          So the question is, would applications that depend on atomic rename have troubles with a generic implementation of these? Mostly what folks depend on atomic rename for is to know that something has indeed completed. So rename-by-copy is especially dangerous here, since a file or directory's existence can give the appearance of completion when the copy did not in fact complete. But I don't see that peril in the above cases.

          What are the actual risks of a generic implementation of the above? I don't see any to the common use case of promoting result files, but perhaps I'm missing something or there are other important use cases.
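          The generic stat-then-rename approach described above can be sketched as follows, again as a local-filesystem analog in java.nio rather than an actual FileContext implementation (all names here are invented). The comments call out the race window explicitly:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CheckedRename {
    // Non-atomic, generic enforcement of the proposed rules: stat first,
    // then delegate to the underlying rename. Another process can change
    // the namespace between the checks and the move, which is exactly the
    // race discussed above.
    public static void rename(Path src, Path dst) throws IOException {
        if (Files.exists(dst)
                && Files.isDirectory(src) != Files.isDirectory(dst)) {
            throw new IOException("cannot rename a file onto a directory "
                    + "or a directory onto a file");
        }
        if (dst.getParent() != null && !Files.isDirectory(dst.getParent())) {
            throw new IOException("destination parent does not exist");
        }
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checked-rename");
        Path src = Files.writeString(dir.resolve("a"), "x");
        boolean failed = false;
        try {
            // the parent /c must exist, per rule 3
            rename(src, dir.resolve("missing").resolve("b"));
        } catch (IOException e) {
            failed = true;
        }
        System.out.println(failed);
    }
}
```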

          Suresh Srinivas added a comment -

          Which behaviors are different from the current behavior?

          All the behaviors I documented in my previous comment are either different from the current behavior or not consistently implemented across the FileSystem implementations.

          Suresh Srinivas added a comment -

          Doug, Sanjay and others can you please indicate your preference for option 1 or 2 (see my previous comment). I would like to get a closure on the semantics of rename operation before moving onto other details.

          dhruba borthakur added a comment -

          One of my initial concerns is that most applications use the atomic rename (for files) extensively. Replacing the atomic rename by a create-and-copy will not work for these applications. Can the semantics state that rename2 is atomic (for files), but that this could change in a future release (when we have distributed namenodes and symlinks are implemented)? Otherwise, applications cannot be migrated to the new API till symlink is supported.

          Sanjay Radia added a comment -

          How about rewording Suresh's option 1s as follows:

          1. rename2 from /a/file1 to /b/file1
            • option1: posix behavior - if /b/file2 exists remove it and rename it (atomic if fs supports it and is within the same partition)
          2. rename2 from /a/dir1 to /b/dir2
            • option1: posix behavior - if /b/dir2 exists remove it and rename it (atomic if fs supports it and is within the same partition)

          This implies that

          • some file systems may not support an atomic rename at all (say s3)
          • a non-partitioned hdfs does the rename atomically (one could also use symlinks here)
          • a partitioned scalable hdfs does not do the rename atomically (symlink is the only option here)
          Doug Cutting added a comment -

          Suresh> all the behaviors I documented in my previous comment are different

          Thanks. I mostly meant to ask which of the new behaviors will cause problems for applications if we implement them generically, thus mandating FileSystem-specific implementations. All of them? None of them?

          Sanjay> atomic if fs supports it and is within the same partition

          And if an FS does not implement it atomically, should it be an error, or just not atomic? If not atomic, that may create some confusion, no?

          Sanjay> a partitioned scalable hdfs does not do the rename atomically (symlink is the only option here)

          Then perhaps we should encourage all applications to use symlinks as the fundamental, reliable, atomic overwriting rename?

          We'd like to have a single, preferred idiom for atomic updates that we suggest applications use. If an application wishes to make atomic updates, it should use a single mechanism, regardless of the FileSystem it's writing to, no?
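          The "atomic rewrite of a symlink" idiom looks roughly like this on a local POSIX filesystem (a java.nio sketch with invented names; HDFS symlink support did not exist at the time of this discussion). On POSIX, renaming one symlink over another maps to rename(2) and is atomic, so readers always see either the old target or the new one:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SymlinkSwap {
    // Repoint "link" at newTarget by creating a temporary symlink and
    // renaming it over the old one. Files.move moves the link itself,
    // not its target.
    public static void swap(Path link, Path newTarget) throws IOException {
        Path tmp = link.resolveSibling(link.getFileName() + ".tmp");
        Files.deleteIfExists(tmp);
        Files.createSymbolicLink(tmp, newTarget);
        Files.move(tmp, link, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("swap-demo");
        Path v1 = Files.writeString(dir.resolve("v1"), "one");
        Path v2 = Files.writeString(dir.resolve("v2"), "two");
        Path current = dir.resolve("current");
        Files.createSymbolicLink(current, v1);
        swap(current, v2);
        System.out.println(Files.readString(current)); // reads through the link
    }
}
```

Note that creating symlinks may require elevated privileges on some platforms (e.g. Windows), and the atomicity of the final move depends on the underlying filesystem.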

          Sanjay Radia added a comment -

          Sanjay> atomic if fs supports it and is within the same partition
          Doug> And if an FS does not implement it atomically, should it be an error, or just not atomic? If not atomic, that may create some confusion, no?
          Turns out LocalFS is implemented using Java I/O and hence does not guarantee the atomicity. We haven't offered such atomicity in the past (e.g. mkdirs, create, or rename) consistently across all FSs, and I don't think we will be able to in the future.

          Doug> Then perhaps we should encourage all applications to use symlinks as the fundamental, reliable, atomic overwriting rename?
          But even symlinks or the atomic symlink replacement will not be offered consistently across all FSs.

          Hence I believe we should be guided by what spec we can and want to support on hdfs.

          The issue that Konstantin has raised is that if rename is worded too strictly then it prevents someone from building the super-duper dynamically positionable NN in the future.

          Many apps don't care about atomicity of rename.
          For those that do we can provide an API: FileContext#isWithinSamePartition(path1, path2).
          (Owen's idea).

          Summary: leave the wording loose as I have suggested ("atomic if fs supports it and is within the same partition")
          and add the method to check in the same partition.

          Doug Cutting added a comment -

          > For those that do we can provide an API: FileContext#isWithinSamePartition(path1, path2).

          I don't see how that helps applications.

          Apps should be able to write things in such a way that will work atomically if the filesystem supports atomic renames, and that won't fail on filesystems that don't support atomic renames, they just won't be atomic. We need to tell them how to do that. For example, we might state that renames within a directory will always be atomic if a filesystem supports atomic renames at all. Then apps that want atomic renames would always perform such renames within a directory. Does that make sense?
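          The within-a-directory idiom for promoting result files reduces to "write under a temporary name, then rename into place". A java.nio sketch of that idiom (names invented; a Hadoop application would do the same through FileSystem or FileContext):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PromoteResult {
    // Write the data under a temporary name in the destination's own
    // directory, then rename it into place. If renames within a directory
    // are atomic, readers never observe a partially written file.
    public static void writeThenPromote(Path dst, String data) throws IOException {
        Path tmp = dst.resolveSibling(dst.getFileName() + "._tmp");
        Files.writeString(tmp, data);
        Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("promote-demo");
        Path out = dir.resolve("part-00000");
        writeThenPromote(out, "results");
        System.out.println(Files.readString(out));
    }
}
```

Because the temporary file lives in the destination directory, the final move never crosses a directory (or partition) boundary, which is exactly the property being discussed.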

          Sanjay Radia added a comment -

          So proposal is:

          • renames within a directory are atomic for ALL FSs. Renames across dirs may be atomic depending on the FS but will always work.
            I like this!
            However, LocalFS uses Java rename and may not be able to support this.

          Are there use cases for renames across dirs?
          If so can I add a statement to the rename spec which says:

          • Some FSs like hdfs support atomic rename across dirs within the same volume and partition.
            Owen's method (FileContext#isWithinSamePartition(path1, path2)) lets an app find out if the FS volumes are correctly configured to support the atomic rename across dirs.

          In many FSs, renames across volumes throw an exception (i.e. they don't do a copy+delete).
          What shall we do here?

          Can the symlink spec be:

          • Symlinks are supported on some FSs. On those that do, the symlinkReplace(..) operation is guaranteed to be atomic.
          Doug Cutting added a comment -

          > Renames across dirs may be be atomic depending on the FS but will always work.

          We can't guarantee they'll always work so I don't see the point of saying more than that this may not be atomic. Beyond that, every filesystem should of course do its best to make something reasonable happen, even if that requires copy+delete.

          > However, LocalFS uses Java rename and may not be able to support this.

          In general we cannot always guarantee that any replacing rename will be atomic even within a directory, but we can tell folks that their best chance is to only expect it to work within a directory, that we'll try to make that work everywhere we can.

          Sanjay Radia added a comment -

          Proposal for rename spec

          1. Rename may be atomic; please see the fs impl for further details.
          2. Most file systems are likely to support atomic renames within a dir.
            Is item 2 even needed?

          If we all agree on the above then lets move forward.
          There is a part (see below the line) that still puzzles me, but we can resolve that later.

          ------

          It appears that you are being inconsistent in dealing with the lack of guarantee in atomicity for rename across dirs and within dirs.
          Neither is guaranteed to be atomic across all file systems, but your wording for the two seems to be different.

          You are objecting to me adding

          • Some FSs may also support atomic rename across dirs within the same volume and partition.
            How is this different from item 2 above?

          Further you are implicitly saying that we do NOT support use cases for doing a rename across dirs.
          I am not disagreeing with you, but I am worried that since most filesystems support atomic renames across dirs, we are indeed dropping an important use case.

          Sanjay Radia added a comment -

          Q. what should FileContext#rename do for renames across different file systems?
          e.g. fc.rename(hdfs://nn1/foo, hdfs://nn2/bar);

          I assume we will do a copy+delete.
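          A copy+delete fallback is inherently non-atomic; there is a window in which both paths exist. A java.nio sketch on a local filesystem (names invented; in Hadoop, FileUtil.copy with deleteSource=true plays this role across FileSystem instances):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CrossFsMove {
    // Non-atomic "rename" across filesystem boundaries: copy the bytes,
    // then delete the source. A crash between the two calls leaves both
    // files in place; a reader can briefly observe both.
    public static void copyThenDelete(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(src);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crossfs-demo");
        Path src = Files.writeString(dir.resolve("foo"), "data");
        Path dst = dir.resolve("bar");
        copyThenDelete(src, dst);
        System.out.println(Files.readString(dst) + " " + Files.exists(src));
    }
}
```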

          dhruba borthakur added a comment -

          > Rename may be atomic; please see the fs impl for further details.

          I like this statement. The javadoc for the DistributedClient.rename() should specifically state that it is atomic if the file is in the same directory.

          dhruba borthakur added a comment -

          I like this statement. The javadoc for the DistributedFileSystem.rename() should specifically state that it is atomic if the file is in the same directory.

          Todd Lipcon added a comment -

          The javadoc for the DistributedClient.rename() should specifically state that it is atomic if the file is in the same directory.

          This seems a little strange - we're saying that we don't want to claim support for atomic renames in general within a filesystem because the filesystem may be partitioned. But we say we can do it within a directory. Isn't that assuming that the (as-yet-unborn) partitioned filesystem is going to partition its namespace by directory? It seems equally likely that this future implementation would partition by hash.

          Sanjay Radia added a comment -

          > we're saying that we don't want to claim support for atomic renames in general within a filesystem because the filesystem may be partitioned.
          We are divided on this issue.
          I am willing to accept such statements because a FS impl defines what partition means.
          On the other hand, Doug is saying that such a statement does not help an app since it does not know which fs will be atomic;
          his point, however, is that your best chance (not 100%, but very high) is to do a rename within a directory - most fs will support it.

          You are right in that such a spec does put a restriction on how one builds a partitioned naming system. Blind hashing will
          make atomic renames hard.

          dhruba borthakur added a comment -

          There are many apps I have built that depend on atomic renames of files in HDFS as well as other filesystems. This is a primitive that many, many applications depend upon. In the future we can state that rename of a symlink is atomic, but given the fact that symlinks are not yet there, what primitive will I use for my application now to ensure atomicity?

          For example, suppose one renames /file1 to /file2 and file2 already existed before the rename. In the absence of atomicity, the above call can result in the following scenarios:

          1. file1 is completely lost. file2 remains the same as it was before the rename.
          2. file2 is deleted but file1 remains as it is
          3. file1 remains as it is. file2's content is replaced by the contents of file1.

          All the above scenarios are bad, especially the first one. An application has to develop plenty of tricky things to recover from the above scenarios. Maybe we can write this tricky code (only once vs every app doing it by themselves) inside HDFS even if the namenode is distributed. If we do not make renames atomic, it feels like we are punting a hard problem that needs to be solved by many applications by themselves. If atomic rename is a performance concern in the distributed namenode scenario, we can introduce a parameter to the rename call to allow applications that do not need atomic renames to avoid the performance penalty.
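
          The commit idiom these apps rely on can be sketched against the local filesystem with java.nio.file (an analogy for illustration, not the HDFS API; the AtomicCommit class and commit helper are hypothetical names): write to a temporary name, then rename into place, so readers of the target see either the old contents or the new, never a partial mix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicCommit {
    // Write content aside, then atomically rename it over the target.
    // On POSIX filesystems, rename(2) replaces an existing target atomically,
    // which is exactly the guarantee the failure scenarios above depend on.
    public static void commit(Path target, byte[] content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content);                       // step 1: write aside
        Files.move(tmp, target,                          // step 2: atomic swap
                   StandardCopyOption.ATOMIC_MOVE,
                   StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit");
        Path file2 = dir.resolve("file2");
        Files.write(file2, "old".getBytes());
        commit(file2, "new".getBytes());
        System.out.println(new String(Files.readAllBytes(file2)));
    }
}
```

          If the process dies between step 1 and step 2, the target is untouched and only a stray .tmp file remains - none of the three bad scenarios above can occur.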

          Eli Collins added a comment -

          +1 for the value of atomic rename for apps.

          I would qualify that with:

          1. The FileSystem implementation must support it, ie atomic rename is delegated to the particular file system implementation and

          2. The source and target must be on the same file system. They may not be in the future, eg rename(/a/b, /x/y) may span file systems if a and x are symbolic links.

          It would be nice to ensure all FileSystem implementations adhere to the same interface (a stated goal of the jira) but I fear that would lead to duplicate implementations (eg of atomic rename both in the FileSystem and above it for file systems that didn't support it) and the FileSystem API having the lowest-common-denominator across current (and future?) file systems.

          Doug Cutting added a comment -

          > It appears that you are being inconsistent in dealing with the lack of guarantee in atomicity for rename across dirs and within dirs.

          I'm just trying to figure out a reasonable solution here, to determine what promises we can reasonably make and what advice we can best give. I've been hoping we can identify a single idiom that all filesystems will try to support and that all applications will use when atomicity is desired. In this regard I have suggested both symlink rename and within-directory-rename as possibilities. I have never claimed that either is the final right answer. I'm tossing out ideas, so some inconsistency is to be expected.

          It's probably safest to encourage folks not to rely on overwriting-rename unless they really need it. For example, the trash feature relies on non-copying rename to be efficient, but it does not require atomic rename, since when a file with the same name is already in the trash we generate a new unique name.

          Perhaps we could have options, e.g., enum RenameOptions {OVERWRITE, ATOMIC}, that are passed to rename. If you don't specify OVERWRITE and the destination exists, an exception is thrown. If you specify ATOMIC and it cannot be done atomically then an exception is thrown. Then applications can state their expectations/requirements. Might that help?

          Sanjay Radia added a comment -

          I like the Overwrite flag. It resolves the option 1 vs option 2 of Suresh's proposal.

          I dislike the atomic flag: I routinely write applications that are to be deployed on HDFS. But I often test these apps on
          the local file system. The atomic flag prevents me from doing such tests (or I am forced to change the code in order to perform the test).
          Let atomicity be a property of the file system. If you care about it then please deploy on a platform that supports it.
          (So I take back my support for Owen's proposed API of checking if two paths are in the same partition. Doug convinced me that
          such an API is not useful to the app writer.).

          I also like Doug's suggestion that if your use case can be addressed by an atomic rename within a directory then by all means use it
          since it offers the best chance (not 100% but close) of portability.

          Suresh Srinivas added a comment -

          Here is the summary of the discussion so far. The rename method will look like this:

          enum RenameOption {
            NONE,
            OVERWRITE
          }
          public void rename(Path src, Path dst, RenameOption options) throws IOException {
          ...
          }
          

          Here is the spec for new rename:

          1. Common functionality:
            • If src is a file, then dst must be a file. If src is a directory, then dst must be a directory.
            • If src does not exist, FileNotFoundException is thrown
            • If dst parent is a file (rename from /a/b to /c/d where c is a file), then IOException is thrown to indicate the failure. Does this merit a specific exception?
          2. Without OVERWRITE option:
            • If dst exists then rename throws FileAlreadyExistsException
          3. With OVERWRITE option set the behavior is:
            • If dst exists and is a file, it will be overwritten during rename
            • If dst exists and is a directory
              • if dst is an empty directory, it will be overwritten during rename
              • if dst is not an empty directory, IOException will be thrown to indicate an error. Does this merit a specific exception?
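
          As a local-filesystem sketch of the spec above (using java.nio.file rather than the Hadoop FileSystem API; the RenameSpec class name is illustrative), the proposed semantics translate roughly to:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameSpec {
    public enum RenameOption { NONE, OVERWRITE }

    // Sketch of the proposed checks; not atomic, for illustration only.
    public static void rename(Path src, Path dst, RenameOption opt) throws IOException {
        if (!Files.exists(src))
            throw new FileNotFoundException(src.toString());   // src must exist
        Path parent = dst.getParent();
        if (parent != null && Files.isRegularFile(parent))
            throw new IOException("dst parent is a file: " + parent);
        if (Files.exists(dst)) {
            if (opt != RenameOption.OVERWRITE)
                throw new FileAlreadyExistsException(dst.toString());
            if (Files.isDirectory(dst)) {
                if (!Files.isDirectory(src))
                    throw new IOException("cannot rename a file onto a directory");
                try (DirectoryStream<Path> s = Files.newDirectoryStream(dst)) {
                    if (s.iterator().hasNext())
                        throw new IOException("dst is a non-empty directory: " + dst);
                }
                Files.delete(dst);   // empty dir: removed before rename
            } else if (Files.isDirectory(src)) {
                throw new IOException("cannot rename a directory onto a file");
            } else {
                Files.delete(dst);   // OVERWRITE: existing file removed
            }
        }
        Files.move(src, dst);        // also fails if dst's parent does not exist
    }
}
```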
          Owen O'Malley added a comment -

          I like the Overwrite flag. Currently, in the FileOutputFormats we need to do an explicit delete. Getting better support from the FileSystem/FileContext would be great.

          In terms of atomicity, I'm leaning toward a best effort. When we have federated namespaces, we will likely need a way to test whether a given directory structure exists on a single file system. This is equivalent to Berkeley DB not working across NFS...

          Todd Lipcon added a comment -

          In terms of atomicity, I'm leaning toward a best effort.

          +1. I think "best effort by default" with the addition of an explicit flag like "ATOMIC" that will fail when unavailable is the best course of action.

          I routinely write applications that are to be deployed on HDFS. But I often test these apps on the local file system. The atomic flag prevents me from doing such tests

          Playing devil's advocate: if the tests pass without the ATOMIC flag on your local filesystem, then why would you be passing ATOMIC? It seems to me that you either need ATOMIC for correct operation or you don't.

          Not devil's advocate: I guess there are plenty of cases where you need ATOMIC to be strictly correct, but in single-threaded test cases there's no worry of a race. In that case it might be useful to have overrides for LocalFileSystem so you can say (only in the context of a test case, and only with some effort!) "Pretend to be atomic even though you aren't." Or, if the global override seems too dirty, it would be up to the application writer to get hooks in their tests to not pass ATOMIC when running tests.

          Doug Cutting added a comment -

          > I often test these apps on the local file system. The atomic flag prevents me from doing such tests [ ... ]

          Good point. I agree that if the local file system cannot perform atomic renames then the atomic flag would probably be counterproductive. When atomic is specified, we could exec 'mv', which is atomic when used within a filesystem. We have DF#getFilesystem() that we can use to determine if two files are on a common filesystem. So I think we could probably implement the ATOMIC option correctly for the local filesystem if we wanted.

          It looks to me like S3 implements atomic copy. So you still need to remove the source as a second step, but one can presumably tell by dates that the copy succeeded, so the failure cases are not as bad, but I don't know if that's good enough.

          More generally, would throwing an exception for filesystems where the rename cannot be done atomically ever be useful? Let's say that we don't feel that S3 implements atomic rename sufficiently well. Would it be better, when an application wants an atomic rename, to perform the rename non-atomically or to throw an exception? If the application's okay with non-atomic, then they shouldn't specify ATOMIC. So then the question becomes, should any applications ever specify ATOMIC? Is it ever so important that you'd rather fail than have it non-atomic? My guess is probably not, so perhaps we should, as you suggest, skip the ATOMIC option. What do others think?

          Even if we only have a single option initially, OVERWRITE, we should still probably make the method accept multiple options, to future-proof it. Also note that, if the signature is new, we may not need a different name!

          Sanjay Radia added a comment -

          +1 to suresh's proposal.
          I suspect he has used enum rather than boolean precisely because we can extend it.

          Doug Cutting added a comment -

          Suresh, I was thinking we'd use varargs, e.g.:

          enum RenameOption { OVERWRITE }
          public void rename(Path src, Path dst, RenameOption... options);
          
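
          A standalone sketch of why varargs is cleaner here (class and helper names are illustrative, not the actual FileContext API): callers pass zero or more options, so no NONE placeholder constant is needed.

```java
import java.util.Arrays;

public class VarargsDemo {
    public enum Rename { OVERWRITE, ATOMIC }

    // With varargs, "no options" is simply rename(src, dst);
    // implementations scan the array for the flags they honor.
    public static boolean hasOption(Rename wanted, Rename... options) {
        return Arrays.asList(options).contains(wanted);
    }

    public static void main(String[] args) {
        System.out.println(hasOption(Rename.OVERWRITE));                    // false
        System.out.println(hasOption(Rename.OVERWRITE, Rename.OVERWRITE)); // true
    }
}
```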
          Suresh Srinivas added a comment -

          +1 for varargs. It is much cleaner.

          Suresh Srinivas added a comment -

          This patch implements new rename operation as defined in the jira with a default implementation. Patch defines rename options in Options class and moves createOpts from FileContext to Options.

          Suresh Srinivas added a comment -

          New patch fixes some test-patch warnings

          Suresh Srinivas added a comment -

          New patch with SuppressWarnings for deprecated method use.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419967/hadoop-6240-1.patch
          against trunk revision 816409.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 175 javac compiler warnings (more than the trunk's current 174 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/47/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/47/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/47/console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          Submitting again with javadoc warning fixes.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419970/hadoop-6240-2.patch
          against trunk revision 816409.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/49/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/49/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/49/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/49/console

          This message is automatically generated.

          Sanjay Radia added a comment -

          Change comment in the default implementation in FileSystem#rename
          // default implementation is non atomic

          Typo
          FileSystem - line 753 - destination is misspelled.

          testRenameFileToDestinationWithParentFile() - you tested with none, but not with overwrite

          testRenameDirectoryAsNonExistentDirectory() - you tested with none, but not with overwrite

          Shall we add an exception ParentNotDir rather than use IOException?

          Otherwise looks good.
          +1.

          Tom White added a comment -

          Does the new rename method need a comment to explain why it is deprecated? The javadoc for primitiveCreate() says "This a temporary method added to support the transition from FileSystem to FileContext for user applications." which would be appropriate here too.

          Suresh Srinivas added a comment -

          Patch with comments from Sanjay and Tom incorporated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12420046/hadoop-6240-3.patch
          against trunk revision 816703.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/9/console

          This message is automatically generated.

          Sanjay Radia added a comment -

          +1 looks good.

          Suresh Srinivas added a comment -

          Minor change: added methods to help serialize the Options.Rename enum.

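The comment above mentions helper methods for serializing the Options.Rename enum. A minimal sketch of what such helpers might look like follows; the enum here is a hypothetical stand-in for org.apache.hadoop.fs.Options.Rename, and the byte codes and method names are illustrative assumptions, not the actual Hadoop implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RenameOptionDemo {
    // Hypothetical stand-in for org.apache.hadoop.fs.Options.Rename:
    // each option carries a byte code so it can travel over the wire.
    public enum Rename {
        NONE((byte) 0), OVERWRITE((byte) 1);

        private final byte code;
        Rename(byte code) { this.code = code; }

        // Serialization helper: write the enum as its byte code.
        public void write(DataOutputStream out) throws IOException {
            out.writeByte(code);
        }

        // Deserialization helper: read a byte back and match it to an option.
        public static Rename read(DataInputStream in) throws IOException {
            byte b = in.readByte();
            for (Rename r : values()) {
                if (r.code == b) {
                    return r;
                }
            }
            throw new IOException("Unknown rename option code: " + b);
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip OVERWRITE through a byte buffer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Rename.OVERWRITE.write(new DataOutputStream(buf));
        Rename roundTripped = Rename.read(
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(roundTripped);  // OVERWRITE
    }
}
```

Encoding the option as a single byte (rather than, say, the enum name string) keeps the on-the-wire representation compact and stable even if enum constants are later reordered.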
          Sanjay Radia added a comment -

          Checked your latest minor changes.
          +1

          Suresh Srinivas added a comment -

          Refreshing the patch against the latest trunk.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12420083/hadoop-6240-5.patch
          against trunk revision 816752.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/10/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/10/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/10/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/10/console

          This message is automatically generated.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #42 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/42/)
          . Add new FileContext rename operation that is POSIX compliant and allows overwriting an existing destination. Contributed by Suresh Srinivas.

          Suresh Srinivas added a comment -

          I committed the change.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #102 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/102/)
          . Add new FileContext rename operation that is POSIX compliant and allows overwriting an existing destination. Contributed by Suresh Srinivas.

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21. Subtask.


            People

            • Assignee:
              Suresh Srinivas
            • Reporter:
              Suresh Srinivas
            • Votes:
              0
            • Watchers:
              11
