Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: fs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. 4185_v1.patch
      3 kB
      Sharad Agarwal
    2. 4185_v2.patch
      5 kB
      Sharad Agarwal
    3. 4185_v3.patch
      7 kB
      Sharad Agarwal

      Activity

      Hide
      Devaraj Das added a comment -

      I just committed this. Thanks, Sharad!

      Show
      Devaraj Das added a comment - I just committed this. Thanks, Sharad!
      Hide
      Sharad Agarwal added a comment -

      all core tests passed on my machine. Also ant test :

      [exec] +1 overall.
      
           [exec]     +1 @author.  The patch does not contain any @author tags.
      
           [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
      
           [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
      
           [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
      
           [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
      
           [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
      
      Show
      Sharad Agarwal added a comment - all core tests passed on my machine. Also ant test : [exec] +1 overall. [exec] +1 @author. The patch does not contain any @author tags. [exec] +1 tests included. The patch appears to include 4 new or modified tests. [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
      Hide
      Hairong Kuang added a comment -

      +1. The patch looks good to me.

      Show
      Hairong Kuang added a comment - +1. The patch looks good to me.
      Hide
      Sharad Agarwal added a comment -

      included test case

      Show
      Sharad Agarwal added a comment - included test case
      Hide
      Hairong Kuang added a comment -

      The patch looks good to me. Could we get a junit testcase?

      Show
      Hairong Kuang added a comment - The patch looks good to me. Could we get a junit testcase?
      Hide
      Hairong Kuang added a comment -

      > What do you think of changing (in a separate issue) open and create to an options-style API?
      This sounds like a great idea to me!

      Show
      Hairong Kuang added a comment - > What do you think of changing (in a separate issue) open and create to an options-style API? This sounds like a great idea to me!
      Hide
      Doug Cutting added a comment -

      > Does it make more sense if we make "verifyChecksum" as a a parameter to open [ ...]

      It does make sense to be able to set this on a file-by-file basis, but I'm hesitant to add parameters, since the number of parameters could become unmanageable. Ideally open and create might take an 'options' parameter to pass things like blockSize, verifyData, bufferSize, replication, etc. What do you think of changing (in a separate issue) open and create to an options-style API?

      Show
      Doug Cutting added a comment - > Does it make more sense if we make "verifyChecksum" as a a parameter to open [ ...] It does make sense to be able to set this on a file-by-file basis, but I'm hesitant to add parameters, since the number of parameters could become unmanageable. Ideally open and create might take an 'options' parameter to pass things like blockSize, verifyData, bufferSize, replication, etc. What do you think of changing (in a separate issue) open and create to an options-style API?
      Hide
      Sharad Agarwal added a comment -

      Incorporated Raghu's comment. Passing verifyChecksum to FSInputChecker

      Show
      Sharad Agarwal added a comment - Incorporated Raghu's comment. Passing verifyChecksum to FSInputChecker
      Hide
      Hairong Kuang added a comment -

      I meant adding an open method that alllow the user to disable checksum verification to FileSystem.
      public FSDataInputStream open(Path f, boolean verifyChecksum) throws IOException;

      How does it sound?

      Show
      Hairong Kuang added a comment - I meant adding an open method that alllow the user to disable checksum verification to FileSystem. public FSDataInputStream open(Path f, boolean verifyChecksum) throws IOException; How does it sound?
      Hide
      Raghu Angadi added a comment -

      Regd the patch : Rather than overriding needChecksum(), verifyChecksum should be passed to FSInputChecker in ChecksumFSInputChecker. This avoids various issues ('sums' is checked for null in the implementation, for e.g.)

      Show
      Raghu Angadi added a comment - Regd the patch : Rather than overriding needChecksum(), verifyChecksum should be passed to FSInputChecker in ChecksumFSInputChecker. This avoids various issues ('sums' is checked for null in the implementation, for e.g.)
      Hide
      Hairong Kuang added a comment -

      Does it make more sense if we make "verifyChecksum" as a a parameter to open instead of setting it as an attribute to a file system? File system handles are cached. Setting "verifyChecksum" to be false may bring subtle side effect to later user of the file system handle.

      Show
      Hairong Kuang added a comment - Does it make more sense if we make "verifyChecksum" as a a parameter to open instead of setting it as an attribute to a file system? File system handles are cached. Setting "verifyChecksum" to be false may bring subtle side effect to later user of the file system handle.
      Hide
      Doug Cutting added a comment -

      > clients cannot tell whether the fs supports checksum

      We could have the method return a boolean.

      /** @returns true if data verification is enabled. */
      public boolean setVerifyData(boolean)

      { return false; }

      But do clients actually care whether the fs supports verification? At this point the FsShell does not care. All that it requires is that one can say, "if you support checksums, disable them", which is what the current patch implements. If we someday in the future need to support such inquiries, then we can add support for that then. But right now, I don't see that that is needed.

      I'm +1 for the patch as-is.

      Show
      Doug Cutting added a comment - > clients cannot tell whether the fs supports checksum We could have the method return a boolean. /** @returns true if data verification is enabled. */ public boolean setVerifyData(boolean) { return false; } But do clients actually care whether the fs supports verification? At this point the FsShell does not care. All that it requires is that one can say, "if you support checksums, disable them", which is what the current patch implements. If we someday in the future need to support such inquiries, then we can add support for that then. But right now, I don't see that that is needed. I'm +1 for the patch as-is.
      Hide
      Sharad Agarwal added a comment -

      @Hairong Your solution will work with respect to ChecksumFileSystem as it exists in core. But the problem is with DistributedFileSystem. As Tom said that to break dependency of core on hdfs, we are promoting the setVerifyChecksum method from DistributedFileSystem to FileSystem. And once we choose to promote then ChecksumFileSystem should honor it.

      Does the current patch make sense with respect to making ChecksumFileSystem honor setVerifyChecksum api ?

      Show
      Sharad Agarwal added a comment - @Hairong Your solution will work with respect to ChecksumFileSystem as it exists in core. But the problem is with DistributedFileSystem. As Tom said that to break dependency of core on hdfs, we are promoting the setVerifyChecksum method from DistributedFileSystem to FileSystem. And once we choose to promote then ChecksumFileSystem should honor it. Does the current patch make sense with respect to making ChecksumFileSystem honor setVerifyChecksum api ?
      Hide
      Hairong Kuang added a comment -

      Hadoop only supports two kinds of file systems that support checksum:
      1. HDFS. We already have a way to bypass checksum verification in place.
      2. ChecksumFileSystem: it wraps around a raw file system which has no idea of checksum. To bypass checksum verification, we could get a handle of the raw FS by calling ChecksumFileSystem.getRawFileSystem() and reading a file using the raw file system handle.

      So we could add the following line to FsShell.getSrcFileSystem:

      if (!verifyChecksum && srcFs instanceof ChecksumFileSystem) {
        srcFs = ((ChecksumFileSystem)srcFs).getRawFileSystem();
      }
      
      Show
      Hairong Kuang added a comment - Hadoop only supports two kinds of file systems that support checksum: 1. HDFS. We already have a way to bypass checksum verification in place. 2. ChecksumFileSystem: it wraps around a raw file system which has no idea of checksum. To bypass checksum verification, we could get a handle of the raw FS by calling ChecksumFileSystem.getRawFileSystem() and reading a file using the raw file system handle. So we could add the following line to FsShell.getSrcFileSystem: if (!verifyChecksum && srcFs instanceof ChecksumFileSystem) { srcFs = ((ChecksumFileSystem)srcFs).getRawFileSystem(); }
      Hide
      Tsz Wo Nicholas Sze added a comment -

      I agree that we should avoid casting,

      However, the current codes introduce a new empty method setVerifyChecksum(...) in FileSystem. Clients cannot tell whether calling this method is effective or not, i.e. clients cannot tell whether the fs supports checksum. In this design, we probably should add another method for telling whether the file system supports setVerifyChecksum(..).

      BTW, the name setVerifyChecksum may be confusing. I suggest changing it to setVerifyData. File system may have mechanism other checksum for verifying data.

      Show
      Tsz Wo Nicholas Sze added a comment - I agree that we should avoid casting, However, the current codes introduce a new empty method setVerifyChecksum(...) in FileSystem. Clients cannot tell whether calling this method is effective or not, i.e. clients cannot tell whether the fs supports checksum. In this design, we probably should add another method for telling whether the file system supports setVerifyChecksum(..). BTW, the name setVerifyChecksum may be confusing. I suggest changing it to setVerifyData. File system may have mechanism other checksum for verifying data.
      Hide
      Doug Cutting added a comment -

      > how about putting it to a new interface, say Checksumable

      If any FileSystem might implement this interface then I don't see how this is functionally different than adding this to FileSystem.java. It just adds ugly code whenever you call setVerifyChecksum. Casting and 'instanceof' are indications of poor object modeling. Sometimes they're required, but we shouldn't use them except as a last resort.

      Show
      Doug Cutting added a comment - > how about putting it to a new interface, say Checksumable If any FileSystem might implement this interface then I don't see how this is functionally different than adding this to FileSystem.java. It just adds ugly code whenever you call setVerifyChecksum. Casting and 'instanceof' are indications of poor object modeling. Sometimes they're required, but we shouldn't use them except as a last resort.
      Hide
      Tsz Wo Nicholas Sze added a comment -

      Instead of adding setVerifyChecksum(...) to FileSystem, how about putting it to a new interface, say Checksumable, so that DFS and other FS subclasses may implement it. Then, the codes in FsShell should be

          if (srcFs instanceof Checksumable) {
            ((Checksumable)srcFs).setVerifyChecksum(verifyChecksum);
          }
      
      Show
      Tsz Wo Nicholas Sze added a comment - Instead of adding setVerifyChecksum(...) to FileSystem, how about putting it to a new interface, say Checksumable, so that DFS and other FS subclasses may implement it. Then, the codes in FsShell should be if (srcFs instanceof Checksumable) { ((Checksumable)srcFs).setVerifyChecksum(verifyChecksum); }
      Hide
      Tom White added a comment -

      The motivation to add setVerifyChecksum to FileSystem is to break core's dependency on HDFS from FsShell by promoting the method from DistributedFileSystem to FileSystem. This change allows users using the shell to disable checksum verification for any filesystem, not just HDFS.

      Show
      Tom White added a comment - The motivation to add setVerifyChecksum to FileSystem is to break core's dependency on HDFS from FsShell by promoting the method from DistributedFileSystem to FileSystem. This change allows users using the shell to disable checksum verification for any filesystem, not just HDFS.
      Hide
      Hairong Kuang added a comment -

      Why do you need the method setVerifyChecksum? A ChecksumFileSystem wraps around a raw file system. If a user needs to read a file without verifying its checksum, the user could use the raw file system handle instead.

      Show
      Hairong Kuang added a comment - Why do you need the method setVerifyChecksum? A ChecksumFileSystem wraps around a raw file system. If a user needs to read a file without verifying its checksum, the user could use the raw file system handle instead.
      Hide
      Sharad Agarwal added a comment -

      this patch:
      adds setVerifyChecksum method to FileSystem, FilterFileSystem and ChecksumFileSystem.

      Will appreciate if someone familiar with ChecksumFileSystem can review the changes.

      Show
      Sharad Agarwal added a comment - this patch: adds setVerifyChecksum method to FileSystem, FilterFileSystem and ChecksumFileSystem. Will appreciate if someone familiar with ChecksumFileSystem can review the changes.

        People

        • Assignee:
          Sharad Agarwal
          Reporter:
          Tom White
        • Votes:
          0 Vote for this issue
          Watchers:
          4 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development