Hadoop Common / HADOOP-8449

hadoop fs -text fails with compressed sequence files with the codec file extension

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.3, 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the -text command is run on a file whose name ends in the default extension for a codec (e.g. .snappy or .gz) but which is actually a compressed sequence file, the command fails.

      The issue is that the command assumes that a file whose name matches a codec extension is a plain compressed file. It would be more helpful to check whether the file is a sequence file first, and only then check its extension.
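      The ordering proposed above can be sketched in plain Java, without any Hadoop classes. The sketch assumes the caller has already read the first few bytes of the file; DetectionOrder, Kind and detect() are hypothetical names, and the two extensions checked are illustrative rather than Hadoop's full codec list. It relies only on the fact that SequenceFiles start with the ASCII bytes "SEQ".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: content check first, extension check second.
// DetectionOrder, Kind and detect() are hypothetical, not Hadoop APIs.
final class DetectionOrder {

  enum Kind { SEQUENCE_FILE, CODEC_BY_EXTENSION, PLAIN }

  // SequenceFiles begin with the ASCII bytes "SEQ", followed by a version byte.
  private static final byte[] SEQ_MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

  static Kind detect(byte[] head, String fileName) {
    // 1. A SequenceFile is recognised by its header, whatever its extension.
    if (head.length >= SEQ_MAGIC.length
        && Arrays.equals(Arrays.copyOf(head, SEQ_MAGIC.length), SEQ_MAGIC)) {
      return Kind.SEQUENCE_FILE;
    }
    // 2. Only now treat a codec extension as a plain compressed stream.
    //    (This extension list is an illustrative assumption.)
    if (fileName.endsWith(".gz") || fileName.endsWith(".snappy")) {
      return Kind.CODEC_BY_EXTENSION;
    }
    return Kind.PLAIN;
  }

  public static void main(String[] args) {
    System.out.println(detect(new byte[]{'S', 'E', 'Q', 6}, "part-00000.snappy")); // SEQUENCE_FILE
    System.out.println(detect(new byte[]{(byte) 0x1f, (byte) 0x8b}, "data.gz"));    // CODEC_BY_EXTENSION
    System.out.println(detect(new byte[]{'h', 'i'}, "notes.txt"));                  // PLAIN
  }
}
```

      With the checks in the opposite order, the first call would report CODEC_BY_EXTENSION, which is the failure this issue describes.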

      1. HADOOP-8449.patch
        5 kB
        Harsh J
      2. HADOOP-8449.patch
        1 kB
        Harsh J

        Issue Links

          Activity

          Transition                        Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Patch Available -> Open           7h 11m                  1                 Harsh J         31/May/12 03:35
          Open -> Patch Available           26d 21h 38m             2                 Harsh J         26/Jun/12 20:26
          Patch Available -> Resolved       3d 9h 39m               1                 Harsh J         30/Jun/12 06:06
          Resolved -> Closed                103d 12h 38m            1                 Arun C Murthy   11/Oct/12 18:45
          Akira AJISAKA made changes -
          Link This issue supersedes HDFS-2444 [ HDFS-2444 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Harsh J made changes -
          Link This issue breaks HADOOP-8833 [ HADOOP-8833 ]
          Harsh J added a comment -

          Thanks Muddy, silly mistake of mine. I filed https://issues.apache.org/jira/browse/HADOOP-8833

          Muddy Dixon added a comment -

          Hi,

          We found that the order of the switch and the codec guard block was changed in

          private InputStream forMagic(Path p, FileSystem srcFs) throws IOException

          Because of this change, the result of

          codec.createInputStream(i)

          is different when a codec exists: by then i.readShort() has already advanced the stream past its first two bytes.

          cdh3u3

          private InputStream forMagic(Path p, FileSystem srcFs) throws IOException {
              FSDataInputStream i = srcFs.open(p);
          
              // check codecs
              CompressionCodecFactory cf = new CompressionCodecFactory(getConf());
              CompressionCodec codec = cf.getCodec(p);
              if (codec != null) {
                return codec.createInputStream(i);
              }
          
              switch(i.readShort()) {
                 // cases
              }
              // ... (rest of the method not shown)
            }
          

          cdh3u5

          private InputStream forMagic(Path p, FileSystem srcFs) throws IOException {
              FSDataInputStream i = srcFs.open(p);
          
              switch(i.readShort()) { // <=== readShort() advances the stream pointer by 2 bytes!!
                // cases
                default: {
                  // Check the type of compression instead, depending on Codec class's
                  // own detection methods, based on the provided path.
                  CompressionCodecFactory cf = new CompressionCodecFactory(getConf());
                  CompressionCodec codec = cf.getCodec(p);
                  if (codec != null) {
                    return codec.createInputStream(i);
                  }
                  break;
                }
              }
          
              // File is non-compressed, or not a file container we know.
              i.seek(0);
              return i;
            }
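          The problem described here (later tracked as HADOOP-8833) can be reproduced with plain java.io streams. In this sketch ByteArrayInputStream's mark()/reset() stands in for FSDataInputStream.seek(0), and RewindBeforeCodec and bytesVisibleToCodec() are hypothetical names. It shows that after readShort() a codec wrapping the same stream starts at offset 2 unless the stream is rewound first.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch only: mark()/reset() stands in for Hadoop's seek(0);
// the class and method names are hypothetical.
final class RewindBeforeCodec {

  /**
   * Reads the 2-byte magic number the way forMagic() does, optionally rewinds,
   * and returns how many bytes a codec wrapping the same stream would still see.
   */
  static int bytesVisibleToCodec(byte[] file, boolean rewind) {
    ByteArrayInputStream raw = new ByteArrayInputStream(file);
    raw.mark(file.length);                  // remember offset 0
    try {
      new DataInputStream(raw).readShort(); // consumes 2 bytes, like i.readShort()
    } catch (IOException e) {
      throw new UncheckedIOException(e);    // e.g. EOFException on a 1-byte file
    }
    if (rewind) {
      raw.reset();                          // the equivalent of i.seek(0)
    }
    return raw.available();                 // bytes left for codec.createInputStream(...)
  }

  public static void main(String[] args) {
    byte[] header = {(byte) 0x1f, (byte) 0x8b, 8, 0};
    System.out.println(bytesVisibleToCodec(header, false)); // 2: the codec misses the magic bytes
    System.out.println(bytesVisibleToCodec(header, true));  // 4: the codec sees the whole file
  }
}
```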
          
          Arun C Murthy made changes -
          Fix Version/s 2.0.2-alpha [ 12322473 ]
          Fix Version/s 2.1.0-alpha [ 12321441 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1125 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1125/)
          HADOOP-8449. hadoop fs -text fails with compressed sequence files with the codec file extension. (harsh) (Revision 1355636)

          Result = FAILURE
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355636
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1092 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1092/)
          HADOOP-8449. hadoop fs -text fails with compressed sequence files with the codec file extension. (harsh) (Revision 1355636)

          Result = FAILURE
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355636
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2431 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2431/)
          HADOOP-8449. hadoop fs -text fails with compressed sequence files with the codec file extension. (harsh) (Revision 1355636)

          Result = FAILURE
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355636
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2414 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2414/)
          HADOOP-8449. hadoop fs -text fails with compressed sequence files with the codec file extension. (harsh) (Revision 1355636)

          Result = SUCCESS
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355636
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2482 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2482/)
          HADOOP-8449. hadoop fs -text fails with compressed sequence files with the codec file extension. (harsh) (Revision 1355636)

          Result = SUCCESS
          harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355636
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
          Harsh J made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Target Version/s 2.0.1-alpha, 3.0.0 [ 12321441, 12320357 ]
          Fix Version/s 2.0.1-alpha [ 12321441 ]
          Resolution Fixed [ 1 ]
          Harsh J added a comment -

          Thank you Daryn. Committed to branch-2 and trunk.

          Daryn Sharp added a comment -

          +1

          Harsh J made changes -
          Target Version/s 3.0.0 [ 12320357 ] 2.0.1-alpha, 3.0.0 [ 12321441, 12320357 ]
          Harsh J added a comment -

          The javadoc warnings come from MAPREDUCE-4355 and will be addressed in its follow-ups.

          Thanks Aaron, Daryn and Joey! Committing by EOD to trunk and branch-2 unless there's any other comment.

          Daryn Sharp added a comment -

          Looks good, please check if the javadoc warnings are related.

          Joey Echeverria added a comment -

          I think he means he ran the new tests without the fix to Display.java and they failed as expected, but they pass with the patched Display.java, as desired.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533528/HADOOP-8449.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.fs.viewfs.TestViewFsTrash
          org.apache.hadoop.hdfs.TestDatanodeBlockScanner

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1141//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1141//console

          This message is automatically generated.

          Daryn Sharp added a comment -

          Which Display changes are you referring to?

          Harsh J made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Harsh J made changes -
          Attachment HADOOP-8449.patch [ 12533528 ]
          Harsh J added a comment -

          The attached patch addresses Aaron's and Daryn's comments. The added tests fail without the Display.java changes, as expected, and pass with them.

          Aaron T. Myers added a comment -

          Quoting Daryn: "Please link the hdfs jira to this one."

          I don't think there's a need for a separate JIRA, now that test-patch.sh supports cross-project patches.

          Daryn Sharp added a comment -

          Please link the hdfs jira to this one. Pending tests, I think this looks good. One trivial suggestion would be to move the codec stuff into a default case for the switch.
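          Daryn's suggestion of moving the codec lookup into the switch's default case could look roughly like the sketch below. It only classifies an in-memory byte array; the Format enum, classify() and the single ".snappy" extension rule are hypothetical, and Hadoop's actual Display.java differs in detail. The gzip ID bytes 0x1f 0x8b and the SequenceFile "SEQ" header are the real magic values.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of "codec lookup in the default case". Format, classify() and the
// ".snappy" rule are hypothetical illustrations, not Hadoop's code.
final class MagicSwitch {

  enum Format { GZIP, SEQUENCE_FILE, CODEC_BY_EXTENSION, PLAIN }

  static Format classify(byte[] file, String name) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      switch (in.readShort()) {
        case 0x1f8b:                        // gzip ID bytes 1f 8b (RFC 1952)
          return Format.GZIP;
        case 0x5345:                        // 'S' 'E': might be a SequenceFile
          if (in.readByte() == 'Q') {
            return Format.SEQUENCE_FILE;
          }
          // not "SEQ": fall through to the extension check
        default:
          // No known header, so only now consult the file extension.
          // A real reader would also rewind to offset 0 before decoding.
          if (name.endsWith(".snappy")) {
            return Format.CODEC_BY_EXTENSION;
          }
          break;
      }
    } catch (IOException e) {
      // With an in-memory stream this is an EOFException: the file is too
      // short to carry a magic number, so treat it as plain text.
    }
    return Format.PLAIN;
  }

  public static void main(String[] args) {
    System.out.println(classify(new byte[]{'S', 'E', 'Q', 6}, "part-00000.snappy")); // SEQUENCE_FILE
    System.out.println(classify(new byte[]{1, 2, 3}, "part-00000.snappy"));          // CODEC_BY_EXTENSION
  }
}
```

          Keeping the extension check in the default branch means any recognised header wins, and the extension is only consulted when the content gives no answer.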

          Harsh J made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Harsh J added a comment -

          Thanks ATM. Totally missed that class. I'll improve the Text test in it to catch regressions in future, and provide a new patch. Cancelling current patch.

          Aaron T. Myers added a comment -

          Quoting Joey: "there are no tests for the FsShell commands."

          There are many tests for the shell commands, but they're unfortunately in the HDFS sub-project, even though FsShell is implemented in Common. See: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java

          Joey Echeverria added a comment -

          As Harsh pointed out, there are no tests for the FsShell commands. The two failing tests look unrelated, so I'm +1 on the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530258/HADOOP-8449.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.fs.viewfs.TestViewFsTrash
          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1059//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1059//console

          This message is automatically generated.

          Harsh J made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Target Version/s 3.0.0 [ 12320357 ]
          Harsh J made changes -
          Assignee Harsh J [ qwertymaniac ]
          Harsh J made changes -
          Field Original Value New Value
          Attachment HADOOP-8449.patch [ 12530258 ]
          Harsh J added a comment -

          This patch ought to take care of this. The reverse order is what Hue does as well, as I remember from my https://issues.cloudera.org/browse/HUE-1 patch.

          I could not find tests for this command (or others), so I haven't added any.

          Joey Echeverria added a comment -

          Yup. I'm cool with using extensions, but after we check for SEQ. Today, it's reversed. It checks the extension before checking for the magic header.

          Harsh J added a comment -

          Snappy is currently undetectable code-wise, at least AFAICT, since it lacks a container format. However, yes, we should check for the SEQ magic header (and the like) first, I think.
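          The difference between gzip and Snappy here comes down to container headers. The sketch below only demonstrates the gzip side with java.util.zip: every stream GZIPOutputStream writes begins with the ID bytes 0x1f 0x8b, so it can be recognised from content alone. It does not model Hadoop's Snappy output; it assumes, per the comment above, that Snappy data carries no comparable header, which leaves the extension as the only signal. GzipMagic, gzip() and looksLikeGzip() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Sketch: gzip output always starts with the ID bytes 0x1f 0x8b (RFC 1952).
// Class and method names are hypothetical.
final class GzipMagic {

  static byte[] gzip(byte[] data) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray(); // complete stream: close() wrote the trailer
  }

  static boolean looksLikeGzip(byte[] head) {
    // Mask with 0xff because Java bytes are signed (0x8b reads as -117).
    return head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b;
  }

  public static void main(String[] args) {
    byte[] compressed = gzip("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(looksLikeGzip(compressed));                                 // true
    System.out.println(looksLikeGzip("SEQ".getBytes(StandardCharsets.US_ASCII)));  // false
  }
}
```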

          Joey Echeverria created issue -

            People

            • Assignee:
              Harsh J
              Reporter:
              Joey Echeverria
            • Votes:
              0
              Watchers:
              12
