Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 3.0.0
    • Fix Version/s: 2.5.0
    • Component/s: fs
    • Labels:
      None

      Description

      The semantics of FileSystem and FileContext are not completely defined in terms of

      1. core expectations of a filesystem
      2. consistency requirements.
      3. concurrency requirements.
      4. minimum scale limits

      Furthermore, methods are not defined strictly enough in terms of their outcomes and failure modes.

      The requirements and method semantics should be defined more strictly.

      1. HADOOP-9371-003.patch
        64 kB
        Steve Loughran
      2. HADOOP-9361.2.patch
        15 kB
        Mike Liddell
      3. HADOOP-9361.patch
        15 kB
        Steve Loughran
      4. HadoopFilesystemContract.pdf
        171 kB
        Steve Loughran

        Issue Links

          Activity

          Steve Loughran added a comment -

          This is my initial draft of an FS contract, written while implementing and testing HADOOP-8545. It is a google docs document: a full patch would

          1. define the core requirements in a document (package javadocs or site file)
          2. add the requirements of each method in the javadocs of that method.
          Matthew Farrellee added a comment -

          Page 2, Concurrency, you mention "mkdir/mkdirs is atomic"

          It seems reasonable that mkdir is atomic.

          I've been researching mkdirs(), with a focus on idempotence and atomicity.

          ClientProtocol.java:mkdirs() clearly labels it as @Idempotent, and the documentation and various implementations support that claim. It's also a property that is relatively straight-forward to implement on many back-end filesystems.

           I'm having more difficulty tracking down the atomicity of mkdirs(). The LocalFS implementations are not themselves atomic. I tracked the HDFS implementation back to FSNamesystem.java:mkdirsInt(), which appears to provide an atomic implementation. However, the atomic nature of mkdirsInt() appears to come from HDFS-988, which looks to fix a bug by making mkdirs() atomic rather than having an explicit purpose of making mkdirs() atomic by design.

          How are you getting to mkdirs() as atomic?

          A mild concern of mine is that even if mkdirs() isn't atomic by design, for HDFS it has been implemented as atomic and who knows who may silently be relying on the not-by-design atomic property. That said, given mkdirs() is idempotent it isn't suitable for use as a locking mechanism.
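
           The client-side construction at issue can be sketched in plain Java (an illustrative stand-in using java.io.File, not Hadoop's actual implementation): each single mkdir() step is atomic on its own, but the chain is not, so a concurrent reader can observe any prefix of the tree — which is also why the operation is naturally idempotent.

```java
import java.io.File;

public class MkdirsSketch {
    // Hypothetical client-side mkdirs(): one mkdir() call per missing
    // path component. Each call is atomic on its own, but between calls
    // a concurrent lister can see a partially built tree, so the chain
    // as a whole is not atomic. It is, however, naturally idempotent.
    public static boolean mkdirs(File dir) {
        if (dir.isDirectory()) {
            return true;                              // already exists: idempotent
        }
        File parent = dir.getParentFile();
        if (parent != null && !mkdirs(parent)) {
            return false;
        }
        return dir.mkdir() || dir.isDirectory();      // tolerate a concurrent creator
    }
}
```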

          Arun C Murthy added a comment -

          +1 for this effort - thanks for taking this on Steve!

          Steve Loughran added a comment -

           Matthew Farrellee -I think I just pulled the "mkdirs() is atomic" fact from HDFS, knowing that it's something blobstores dramatically break (mkdirs() taking the time for a chain of PUT operations from the potentially remote caller).

          You are right, though, there's no guarantee that it has to be atomic, and a quick look at the Posix docs imply that while mkdir() is required to be (it's one of the API calls that must be atomic), mkdirs() can be done client side. When you start to consider cross-volume and NFS mounts, it would have to be non-atomic.

          I'll change that, and we'd better hope that nobody relies on mkdirs being atomic. I wonder if there is a way to check this other than turning it off and seeing what breaks?

          Steve Loughran added a comment -

           This is a patch to add the spec as a markdown file under common/site and then into the site. Maven isn't picking it up (yet), though.

          Mike Liddell added a comment -

          A few items for consideration:

          Possible additions to 'implicit assumption':

          • paths are represented as Unicode strings
          • equality/comparison of paths is based on binary content. this implies case-sensitivity and no locale-specific comparison rules.

          >>The data added to a file during a write or append MAY be visible during while the write operation is in progress.

          • Allowing read(s) during write seems to break the subsequent rule that "readers always see consistent data".

          >> Deleting the root path, /, MUST fail iff recursive==false.

          • If the root path is empty, it seems reasonable for delete("/",false) to succeed but to have no effect.

          >> After a file is created, all ls operations on the file and parent directory MUST not find the file

          • copy-paste error -> "after a file is deleted ..."

          >> Security: if a caller has the rights to list a directory, it has the rights to list directories all the way up the tree.

          • This point raises lots of interesting questions and requirements for individual methods. A section on security assumptions/rules would be great.
          Steve Loughran added a comment -

          Mike Liddell all good points.

           How about you submit a patch to the md file for the implicit assumptions, the copy-paste error, and the root dir -that one being easy to test on all but localfs.

          That "what happens to read during a write or append" is a tough one. HDFS silently serves up new data when the read crosses a block, which I'm not convinced is what anyone expects to have happen.

          We could rephrase consistency "after any update operation has completed, read operations initiated afterwards see a consistent view of the latest data"?

          Even there, the ambiguity of what happens of read-during-write is something we should pull out, as it may be where user expectations != hdfs operation

          Mike Liddell added a comment -

          Added HADOOP-9361.2.patch with minor edits.

          • additional assumptions
          • changed detail for fs.delete("/")

           This patch was created via svn diff and is not a delta over the original patch.

          Please let me know if the patch format is incorrect.

          Matthew Farrellee added a comment -

          Steve Loughran Does delete(path, true) need to be atomic?

          My research suggests that only the HDFS implementation is atomic.

          (Note: current = r2.0.3-alpha on 2013-02-15 19:41)

          http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html

          FilterFileSystem - delegates w/o locking
          ChecksumFileSystem - delegates w/o locking
          LocalFileSystem - inherits, delegates to RawLocalFileSystem
          HarFileSystem - not implemented
          FTPFileSystem - FTPClient.removeDirectory w/o locking
          KosmosFileSystem - (not on trunk) no locking
          NativeS3FileSystem - no locking (even createParent()s to avoid errors, weird)
          RawLocalFileSystem - uses File.delete (if isFile) and FileUtil.fullyDelete w/o locking
          S3FileSystem - no locking
          ViewFileSystem - partial eval, no locking on top level

          • ChRootFileSystem uses RawLocalFileSystem

          http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/AbstractFileSystem.html

          AbstractFileSystem (uses FileContext.delete)
          FilterFs - delegates w/o locking
          ChecksumFs - delegates w/o locking
          LocalFs - inherits, delegates to RawLocalFs
          DelegateToFileSystem - delegates w/o locking
          RawLocalFs - inherits, delegates to RawLocalFileSystem
          FtpFs - inherits, delegates to FTPFileSystem
          ViewFs - partial eval, no locking at top level
          FileContext.delete - no hint of atomic requirement, delegates to AbstractFileSystem

          Side note - it's interesting to see how many FS implementations make their way back to RawLocalFileSystem, sometimes through 3+ layers of indirection.
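
           The pattern described above — children first, then the directory itself — can be sketched in plain Java (an illustrative analogue of what most non-HDFS implementations effectively do, not the real code); the loop makes the non-atomic window explicit:

```java
import java.io.File;

public class RecursiveDeleteSketch {
    // Client-side recursive delete: remove children, then the directory.
    // Between any two delete() calls a concurrent lister can observe a
    // partially deleted tree, so without namenode-style locking the
    // operation cannot be atomic.
    public static boolean fullyDelete(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (!fullyDelete(child)) {
                    return false;          // non-atomic window between iterations
                }
            }
        }
        return dir.delete();               // finally remove the (now empty) dir
    }
}
```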

          Steve Loughran added a comment -

          I've just published a copy of my branch of hadoop-trunk with this patch to github

          This has auto rendering of the MD file

          I've merged in Mike's and Matt's comments already.

          Matt: that mkdirs() point is significant. Have you found code that expects atomic directory creation? If so, we'd better fix it.

          (this makes me think of something else: a front end client to DFSClient that downconverts some of the ops to non-atomic. In the case of mkdirs, simply doing the mkdir chain client-side would suffice. I don't see an easy way to do the equivalent of mv without creating the dest dir then moving the entries below the original.)

          bradley childs added a comment -

          Great work here guys. I've been researching the semantics around write locking and have a couple comments. First around this line regarding write atomicity:

          "Only one writer can write to a file (ISSUE: does anything in MR/HBase use this for locks?)", which implies fully atomic write transactions.

          If this line is a MUST (slightly unclear) then the file lock/release would have to be explicit around create(), append(), and open(). Any writer would have to go through a lock/release state for the file during the output stream instantiation (not desirable).

           If you look at the create/open/append methods of HDFS's DistributedFileSystem.java (linked below), an FSDataOutputStream is returned with no locking or lifecycle. Further investigation shows no explicit locking inside the FSDataOutputStream class.

           Instead, the FSDataOutputStream does implement the o.a.h.fs.Syncable interface, which provides a sync() method. Per the interface, a call to sync() "Synchronize[s] all buffer with the underlying devices."

           To me this says that there are no exclusive writers. Instead, a writer's file consistency is only guaranteed at the instant the sync(...) method is called on the underlying OutputStream, after which it only MAY be consistent until sync(...) is called again.

           Summary: I believe "Only one writer can write to a file (ISSUE: does anything in MR/HBase use this for locks?)" should be changed to something like "A file may have multiple writers; each writer's only consistency guarantee is at a sync(...) call."

          Ref:
          https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
          https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/FSDataOutputStream.java
          https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/Syncable.java
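
           The sync-then-visible behaviour described above has a close analogue in plain Java buffered streams (a sketch, not Hadoop's Syncable machinery): bytes written to the stream are not visible in the file until the buffer is pushed down.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncVisibilitySketch {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sync-sketch", ".dat");
        f.deleteOnExit();
        try (FileOutputStream raw = new FileOutputStream(f);
             BufferedOutputStream out = new BufferedOutputStream(raw, 8192)) {
            out.write("hello".getBytes());
            // Analogue of an unsynced writer: the data sits in the client
            // buffer, so a concurrent reader of f sees an empty file.
            System.out.println(f.length());   // 0
            out.flush();                      // analogue of Syncable.sync()
            System.out.println(f.length());   // 5
        }
    }
}
```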

          Steve Loughran added a comment -

          bradley -thanks for your research.

          I wonder if we should just say, in the concurrency section:

          • Multiple writers MAY open a file for writing. If this occurs, the outcome is undefined

          I guess we have to make sure that Syncable is defined here too

          Steve Loughran added a comment -

           We also need to specify Seekable, as the FSDataInputStream that must be returned from open() calls implements it, and the specifics of seek(long pos) are not completely defined, consistently implemented, or explicitly tested.

          • some implementation classes validate the range of a seek in the call; it can also be postponed until the next read() (which is how Posix expects it).
          • Not everything rejects negative seek offsets
           • While EOFException would be the appropriate exception to raise on going past the end of the file, it is rarely seen in the source.

          Delayed seeks can deliver tangible performance benefits and it would be unwise to demand stricter validation than ::lseek() or ::SetFilePointerEx(). We ought to say "you can if you want", and write tests that verify either the seek fails, or the read straight afterwards fails.

          == Seekable ==

          • When a file is opened, getPos() MUST equal 0
           • Implementations MAY choose not to implement seek(), and instead MAY throw an IOException
          • A seek(L) on a closed input stream MUST fail with an IOException.
           • After a successful seek(L), getPos()==L for all L: 0 <= L < length(file)
           • On a seek(L) with L<0, an exception MUST be thrown. It SHOULD be an IOException. It MAY be an IllegalArgumentException or other RuntimeException
           • On a seek(L) with L>length(file), an IOException MAY be thrown. It SHOULD be an EOFException
           • If an IOException is not thrown there, an IOException MUST be thrown on the next read() operation. It SHOULD be an EOFException

          This is actually a relaxation of the Seekable.seek() definition, which states "Can't seek past the end of the file.". The RawLocalFileSystem on which everything ultimately depends does support seeking past the end of the file -it is only on the read operation where an exception is raised.

          • After a seek(L) with L<length(file), read() returns the byte at position L in the file.
          • After a seek(L) with L==length(file), read() returns -1
          • After a seek(L) with L==length(file), read(byte[1],0,1) returns the byte at position L in the file.

          Tests to verify offset validation

          1. open a file of length file_len > 0, verify getPos()==0
          2. seek(file_len), verify getPos()==file_len
             If an exception is not raised, read() and expect an IOException
          3. seek(file_len+1), expect an EOFException
             If an exception is not raised, read() and expect the exception there
          4. seek(-1), expect an IOException immediately.

          open a file of length file_len == 0

          1. verify getPos()==0
          2. Verify that seek(0) succeeds.
          3. verify that read() returns -1.

          Test to verify seek() actually changes the location for future reads.

          • verify that after a seek(), read() returns the data at the seek location. This must work for forward and backwards seeks.
           • verify that after a seek(), a read(byte[]) returns the bytes of data at the seek location. This must work for forward and backwards seeks.
            Repeat for very large offsets (e.g. 128KB file), to ensure that filesystems with local caches/buffers handle longer range seeks correctly.
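
           The offset-validation steps above happen to match the behaviour of java.io.RandomAccessFile, which can serve as a quick cross-check of the proposed rules (a sketch against the local filesystem, not the Hadoop contract tests):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekContractSketch {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seek-sketch", ".dat");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[]{10, 20, 30});            // file_len == 3
        }
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            System.out.println(in.getFilePointer());      // 0: getPos()==0 on open
            in.seek(3);                                   // seek(file_len) succeeds...
            System.out.println(in.read());                // -1: ...but read() hits EOF
            in.seek(1);                                   // backwards seek
            System.out.println(in.read());                // 20: byte at offset 1
            try {
                in.seek(-1);                              // negative offset MUST fail
            } catch (IOException expected) {
                System.out.println("negative seek rejected");
            }
        }
    }
}
```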
          Steve Loughran added a comment -

          note that BufferedFSInputStream doesn't meet this spec as it treats a negative seek as a no-op:

            public void seek(long pos) throws IOException {
              if( pos<0 ) {
                return;
              }
          
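           A spec-compliant seek would reject the negative offset instead of silently returning; a minimal sketch of that behaviour (plain Java, not the actual BufferedFSInputStream class):

```java
import java.io.IOException;

public class StrictSeekSketch {
    private long pos;

    // Spec-compliant variant: a negative offset raises an exception
    // rather than being silently ignored. The range check against EOF
    // may still be deferred to the next read().
    public void seek(long newPos) throws IOException {
        if (newPos < 0) {
            throw new IOException("Cannot seek to negative offset " + newPos);
        }
        pos = newPos;
    }

    public long getPos() {
        return pos;
    }
}
```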
          Steve Loughran added a comment -

           also note that Apple's HFS hasn't offered atomic renames until recently: http://www.weirdnet.nl/apple/rename.html

          Konstantin Shvachko added a comment -

          Steve, you might want to link the document from github to the jira. Add Link has an option to add a web link.

          Not requiring atomicity for mkdirs() and recursive deletes makes sense to me.
           For renames I think we should also restrict atomicity to one special case: when a file or directory name changes, that is, the file is not moving from one directory to another. I call it an in-place rename, which with inode numbers in place is a trivial operation. Atomic moves are hard if you build a distributed namespace service (like Giraffa). Moving a file between directories that are located on different nodes requires distributed coordination, which can be complex.

          Steve Loughran added a comment -

          Konstantin -good points

           -I'm going to redo it as .apt so the linking isn't going to be useful (soon). MD may be well tooled, but as there isn't a consistent format for handling tables, it's not that much better than APT (though it does make it easier to use angle brackets in inline code, and doesn't tie you to a single build tool forever).

           1. Atomic recursive deletes? It sort of happens today in every real FS as the top-level inode goes away. I don't know how that spans filesystems -can I actually do an rm -rf above a mounted FS in Unix?

          That said: saying "no guarantees about atomicity" is one thing -it gives us flexibility in future - but as all "normal" filesystems appear to provide this, code will tend to assume it anyway. I think we should do it -but call out blobstores for breaking some of these rules.

           2. Atomic rename where the parent dir stays the same does seem a good compromise on atomicity; it means that more distributed filesystems don't have to do it.

          In fact, we could say "there are no guarantees that rename() across filesystems work at all". And then add an explicit exception RenameAcrossFileSystemsUnsupported for this. I'm confident that you can't rename file:///c:/something.txt to file:///d:/something.txt on windows.

          Suresh Srinivas added a comment -

          Steve,

          What are you trying to do in this JIRA? I ask because some of the comments in this JIRA suggest changing the semantics.

          Is your intent to document the semantics rigorously and add tests so that any other file system implementation can be tested (I do not know how you can test atomicity easily) and certified based on these tests? Or are you also planning to change the semantics?

          As regards deciding the semantics where the documentation is either sparse or unclear, the semantics as implemented by HDFS are the gold standard, because that is what the majority of applications depend upon. I would discourage others from second-guessing what applications need, because we do not know all the applications that are out there.

          Steve Loughran added a comment -

          Patch with the document in .apt format.

          I'm trying to move away from must/may/should to a more formal definition, essentially using set theory. I'd like feedback on this approach:

          1. We need a good syntax; I've used Standard ML as the rough basis for this, but not perfectly.
          2. I'm not handling concurrency in the formal bits at all; that's a different level of formal logic that I don't want to go near, even if I were confident I could use it.
          3. I'm working on the core operations first: create, open, & delete.
          4. Rename is more complex than this; I need to go through all the relevant JIRAs as well as the code.
          5. mkdir is surprising too - there are some inconsistencies between local & HDFS that I need to understand better. It looks like HDFS always returns true ("there is a directory"), while local returns true iff the directory was created.
          6. Permissions need to be defined as well, because of things like "what should the permissions be up the dir tree when I call create(path) and its parents don't exist?"

          Comments welcome


            People

            • Assignee:
              Steve Loughran
            • Reporter:
              Steve Loughran

              Time Tracking

              Original Estimate: 48h
              Remaining Estimate: 48h
              Time Spent: Not Specified