Hadoop Common
  1. Hadoop Common
  2. HADOOP-7087

SequenceFile.createWriter ignores FileSystem parameter

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The SequenceFile.createWriter methods that take a FileSystem ignore this parameter after HADOOP-6856. This is causing some MR tests to fail and is a breaking change when users pass unqualified paths to these calls.

      1. hadoop-7087.txt
        8 kB
        Todd Lipcon
      2. hadoop-7087.txt
        8 kB
        Todd Lipcon

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Attached patch fixes MR trunk build

          Show
          Todd Lipcon added a comment - Attached patch fixes MR trunk build
          Hide
          Owen O'Malley added a comment -

          Rather than add a FileSystem option and then using it, create a method that makes a fully qualified path from a filesystem and a path. It seems like there is already one out there and that won't introduce an option that isn't forward compatible.

          Show
          Owen O'Malley added a comment - Rather than add a FileSystem option and then using it, create a method that makes a fully qualified path from a filesystem and a path. It seems like there is already one out there and that won't introduce an option that isn't forward compatible.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467603/hadoop-7087.txt
          against trunk revision 1055206.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 1049 javac compiler warnings (more than the trunk's current 1048 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/160//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/160//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/160//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467603/hadoop-7087.txt against trunk revision 1055206. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 1049 javac compiler warnings (more than the trunk's current 1048 warnings). +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/160//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/160//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/160//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Hi Owen. That's the original direction I went with this patch, but then I thought more about it, and decided that it is not quite correct - for example, RawLocalFileSystem and LocalFileSystem (checksummed) both use file:/// URLs. But path.getFileSystem(conf) will always return the checksummed version. So using fs.makeQualified(path) in these wrapper functions fails to preserve the user's choice of file system when they explicitly want the non-checksummed version.

          What do you mean by "an option that isn't forward compatible"? Are you referring to the FileSystem API going out of style in favor of FileContext?

          Show
          Todd Lipcon added a comment - Hi Owen. That's the original direction I went with this patch, but then I thought more about it, and decided that it is not quite correct - for example, RawLocalFileSystem and LocalFileSystem (checksummed) both use file:/// URLs. But path.getFileSystem(conf) will always return the checksummed version. So using fs.makeQualified(path) in these wrapper functions fails to preserve the user's choice of file system when they explicitly want the non-checksummed version. What do you mean by "an option that isn't forward compatible"? Are you referring to the FileSystem API going out of style in favor of FileContext?
          Hide
          Owen O'Malley added a comment -

          But the point is that we are moving away from FileSystem and it particular are pulling it out of interfaces. As we move to FileContext, we don't want to have to change all of the public interfaces. Even before FileContext, we have been ignoring FileSystem parameters in most of the MapReduce APIs.

          Moving forward, we should just use paths. Using FileSystem and path is a very confusing pattern.

          You do have a valid point that we can't distinguish between checksum and raw filesystems. I suspect the fix for that is to define a new prefix for one of them.

          Show
          Owen O'Malley added a comment - But the point is that we are moving away from FileSystem and it particular are pulling it out of interfaces. As we move to FileContext, we don't want to have to change all of the public interfaces. Even before FileContext, we have been ignoring FileSystem parameters in most of the MapReduce APIs. Moving forward, we should just use paths. Using FileSystem and path is a very confusing pattern. You do have a valid point that we can't distinguish between checksum and raw filesystems. I suspect the fix for that is to define a new prefix for one of them.
          Hide
          Todd Lipcon added a comment -

          So would you be OK with this patch if I change the filesystem Option to be private, marked for removal when the createWriter methods are gone?

          Show
          Todd Lipcon added a comment - So would you be OK with this patch if I change the filesystem Option to be private, marked for removal when the createWriter methods are gone?
          Hide
          Owen O'Malley added a comment -

          Why not just do:

          return createWriter(conf, Writer.file(name.makeQualified(fs)), Writer.keyClass(keyClass),
          

          then all of the code will be removed when the deprecated constructors are removed. We won't need to worry about the private FileSystem option being forgotten.

          Show
          Owen O'Malley added a comment - Why not just do: return createWriter(conf, Writer.file(name.makeQualified(fs)), Writer.keyClass(keyClass), then all of the code will be removed when the deprecated constructors are removed. We won't need to worry about the private FileSystem option being forgotten.
          Hide
          Todd Lipcon added a comment -

          Like I said, this breaks the RawLocalFileSystem vs ChecksumLocalFileSystem distinction which is a very big performance difference. Sure, some day, we want to fix that, but this patch is currently a regression from previous behavior and also breaking the MR trunk build, so I don't think we should block it on fixing the URI schemes of Raw vs Checksum FS.

          Show
          Todd Lipcon added a comment - Like I said, this breaks the RawLocalFileSystem vs ChecksumLocalFileSystem distinction which is a very big performance difference. Sure, some day, we want to fix that, but this patch is currently a regression from previous behavior and also breaking the MR trunk build, so I don't think we should block it on fixing the URI schemes of Raw vs Checksum FS.
          Hide
          Owen O'Malley added a comment -

          I've never seen user code actually write a SequenceFile to the RawLocalFileSystem, so the performance loss would only effect unit tests. smile

          It seems much better to leave the compatibility code in the deprecated methods rather than pushing it into mainline code. To make it worse, it would be good to remove the internal use of FileSystem all together to ensure that FileContext is working well. I think it would be wise to minimize the amount of new code using it.

          Show
          Owen O'Malley added a comment - I've never seen user code actually write a SequenceFile to the RawLocalFileSystem, so the performance loss would only effect unit tests. smile It seems much better to leave the compatibility code in the deprecated methods rather than pushing it into mainline code. To make it worse, it would be good to remove the internal use of FileSystem all together to ensure that FileContext is working well. I think it would be wise to minimize the amount of new code using it.
          Hide
          Todd Lipcon added a comment -

          What about also marking the FileSystem Option both private and @deprecated then?

          I'm 100% in agreement we should move our internal usage to FileContext, but that's way outside the scope of this bug.

          Show
          Todd Lipcon added a comment - What about also marking the FileSystem Option both private and @deprecated then? I'm 100% in agreement we should move our internal usage to FileContext, but that's way outside the scope of this bug.
          Hide
          Nigel Daley added a comment -

          Given it looks like HADOOP-6856 changed the semantics of a public API (no matter how insignificant you think that incompatibility might be) then either that incompatibility should be reverted or this should be a blocker. Marking it as such for now.

          Show
          Nigel Daley added a comment - Given it looks like HADOOP-6856 changed the semantics of a public API (no matter how insignificant you think that incompatibility might be) then either that incompatibility should be reverted or this should be a blocker. Marking it as such for now.
          Hide
          Owen O'Malley added a comment -

          What about also marking the FileSystem Option both private and @deprecated then?

          Please also include documentation in the Javadoc about it only being used for backwards compatibility. I still think the other solution is better, less code, and introduces less noise, but it does introduce a slight incompatibility.

          I'm 100% in agreement we should move our internal usage to FileContext, but that's way outside the scope of this bug.

          I wasn't asking you too, just pointing out that eventual migration away from FileSystem means introducing new APIs with FileSystem parameters is increasing the technical debt.

          Show
          Owen O'Malley added a comment - What about also marking the FileSystem Option both private and @deprecated then? Please also include documentation in the Javadoc about it only being used for backwards compatibility. I still think the other solution is better, less code, and introduces less noise, but it does introduce a slight incompatibility. I'm 100% in agreement we should move our internal usage to FileContext, but that's way outside the scope of this bug. I wasn't asking you too, just pointing out that eventual migration away from FileSystem means introducing new APIs with FileSystem parameters is increasing the technical debt.
          Hide
          Todd Lipcon added a comment -

          New version of the patch makes the filesystem Option private, removes it from the public options in the util package, and marks it deprecated both by javadoc and annotations.

          Show
          Todd Lipcon added a comment - New version of the patch makes the filesystem Option private, removes it from the public options in the util package, and marks it deprecated both by javadoc and annotations.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467945/hadoop-7087.txt
          against trunk revision 1056006.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 1049 javac compiler warnings (more than the trunk's current 1048 warnings).

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/167//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/167//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/167//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467945/hadoop-7087.txt against trunk revision 1056006. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 1049 javac compiler warnings (more than the trunk's current 1048 warnings). +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/167//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/167//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/167//console This message is automatically generated.
          Hide
          Owen O'Malley added a comment -

          +1, although I don't see what the javadoc warning is coming from.

          Show
          Owen O'Malley added a comment - +1, although I don't see what the javadoc warning is coming from.
          Hide
          Todd Lipcon added a comment -

          I think the new 1 warning is the test case calling the deprecated createWriter API. Will commit shortly.

          Show
          Todd Lipcon added a comment - I think the new 1 warning is the test case calling the deprecated createWriter API. Will commit shortly.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-22-branch #13 (See https://hudson.apache.org/hudson/job/Hadoop-Common-22-branch/13/)

          Show
          Hudson added a comment - Integrated in Hadoop-Common-22-branch #13 (See https://hudson.apache.org/hudson/job/Hadoop-Common-22-branch/13/ )
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #574 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/574/)
          HADOOP-7087. SequenceFile.createWriter ignores FileSystem parameter. Contributed by Todd Lipcon

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk #574 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/574/ ) HADOOP-7087 . SequenceFile.createWriter ignores FileSystem parameter. Contributed by Todd Lipcon
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #479 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/479/)

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #479 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/479/ )

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development