Hadoop Common / HADOOP-10112

har file listing doesn't work with wild card

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.10, 2.2.0
    • Fix Version/s: 0.23.11, 2.3.0
    • Component/s: tools
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      [test@test001 root]$ hdfs dfs -ls har:///tmp/filename.har/*
      -ls: Can not create a Path from an empty string
      Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

      It works without "*".

      1. HADOOP-10112.004.patch
        7 kB
        Brandon Li
      2. HADOOP-10112.004.branch-0.23.patch
        5 kB
        Jason Lowe

        Issue Links

          Activity

          Brandon Li created issue -
          Kousuke Saruta added a comment -

          Are you using branch-1?
          I tried to reproduce this on trunk but couldn't; I could reproduce it on branch-1.
          Glob code was changed between branch-1 and trunk (or branch-2).

          Brandon Li added a comment -

          Trunk works fine. I was using branch 2.2. Let me update the JIRA accordingly.

          Brandon Li made changes -
          Field Original Value New Value
          Affects Version/s 2.2.1 [ 12325254 ]
          Chris Nauroth made changes -
          Link This issue relates to HADOOP-9981 [ HADOOP-9981 ]
          Chris Nauroth added a comment -

          It looks like this was fixed by HADOOP-9981, which optimized the new Globber code. If I revert that patch from branch-2, then I see the bug. After restoring that patch, the bug goes away and I see the listing of the har contents as I would expect.

          HADOOP-9981 was committed to trunk and branch-2, but not branch-2.2, because it was a fix in the new Globber code, which doesn't exist in branch-2.2.

          Kousuke Saruta added a comment -

          So, should we modify the glob code for branch-2.2, or backport the new Globber?

          Chris Nauroth added a comment -

          I don't think it's feasible to do a full backport of Globber. This was part of the symlink changes, which were deferred out of the 2.2.x line to mitigate risk.

          I do think it would be valuable to provide a fix in branch-2.2 for FileSystem#globStatusInternal. The HADOOP-9981 patch added several special cases to halt processing early. This was motivated by performance reasons, not correctness, so it may have just been accidental that the patch fixed this bug. Fixing branch-2.2 is probably a matter of pulling out which specific special case (or special cases) fixed this particular bug and translating the code to something compatible with the current FileSystem#globStatusInternal code.

          Brandon Li made changes -
          Assignee Brandon Li [ brandonli ]
          Brandon Li added a comment -

          Basically, the problem is this: when the path contains a wildcard, the FileSystem object splits the whole path into components and then gets the matches/status of each component. However, the har file system's root path doesn't start with "/", so it can't process any prefix.

          For example, the .har file is created at /user/me/foo.har.
          When a user runs "hadoop dfs -ls har:///test/foo.har/*", the FileSystem object passes each component path to HarFileSystem: /, /test, /test/foo.har, /test/foo.har/*. HarFileSystem doesn't recognize the prefix paths "/" or "/test" since they are not in the har file system.

          The fix is to redirect requests for the prefix paths to the source file system (e.g., HDFS).

          Uploaded a patch for branch 2.3.
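
          The component-by-component walk described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the patch: the class HarGlobSketch and the methods prefixes, insideArchive, and resolver are hypothetical names, and plain strings stand in for Hadoop's Path and FileSystem types. It only shows which prefixes lie outside the archive and would therefore be answered by the underlying file system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; it does not use or reproduce any Hadoop class.
class HarGlobSketch {

    // Cumulative prefixes of an absolute path: "/", "/test", "/test/foo.har", ...
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        out.add("/");
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty token produced by the leading "/"
            }
            current.append('/').append(part);
            out.add(current.toString());
        }
        return out;
    }

    // Assumption: an archive file system only knows paths at or below the archive root.
    static boolean insideArchive(String path, String archiveRoot) {
        return path.equals(archiveRoot) || path.startsWith(archiveRoot + "/");
    }

    // Routing idea behind the fix: prefixes outside the archive go to the underlying file system.
    static String resolver(String path, String archiveRoot) {
        return insideArchive(path, archiveRoot) ? "har" : "underlying";
    }

    public static void main(String[] args) {
        String archiveRoot = "/test/foo.har";
        for (String prefix : prefixes("/test/foo.har/*")) {
            System.out.println(prefix + " -> " + resolver(prefix, archiveRoot));
        }
    }
}
```

          Run against the example path, the sketch routes "/" and "/test" to the underlying file system and "/test/foo.har" and "/test/foo.har/*" to the archive, which matches the prefix paths that HarFileSystem could not recognize on its own.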

          Brandon Li made changes -
          Attachment HADOOP-10112.004.patch [ 12623733 ]
          Chris Nauroth added a comment -

          +1 for the patch. Thanks for adding a unit test too. I ran it locally and confirmed that the patch works.

          Chris Nauroth made changes -
          Hadoop Flags Reviewed [ 10343 ]
          Affects Version/s 2.2.0 [ 12325048 ]
          Affects Version/s 2.3.0 [ 12325254 ]
          Target Version/s 2.3.0 [ 12325254 ]
          Brandon Li added a comment -

          Thank you, Chris. I've committed the patch to branch-2.3.

          Brandon Li made changes -
          Fix Version/s 2.3.0 [ 12325254 ]
          Brandon Li made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Brandon Li made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5023 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5023/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #456 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/456/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1648 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1648/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1673 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1673/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Andrew Wang added a comment -

          Hey guys. Since the branch-2.3 reswizzle, I believe the new Globber is now in 2.3. Is this JIRA still necessary for branch-2.3? If not, maybe it should be reverted to minimize code difference, since it's somehow in branch-2.3 and not branch-2 or trunk.

          Andrew Wang added a comment -

          My bad, I didn't realize the reswizzle just stuck a giant patch on top of branch-2.3. The commit logs are now a bit mucked up as a result, but I see that the change is no longer actually present in 2.3. I'll fix CHANGES.txt and this JIRA as appropriate.

          Andrew Wang made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Andrew Wang made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Invalid [ 6 ]
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #5060 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5060/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #466 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/466/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1683 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1683/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1658 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1658/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Jason Lowe added a comment -

          Reopening this as it's also a problem in 0.23 that our customers would like fixed. Posting a backported patch for branch-0.23 shortly.

          Jason Lowe made changes -
          Resolution Invalid [ 6 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Jason Lowe made changes -
          Fix Version/s 2.3.0 [ 12325254 ]
          Affects Version/s 0.23.10 [ 12324664 ]
          Target Version/s 2.3.0 [ 12325254 ] 0.23.11 [ 12324665 ]
          Jason Lowe added a comment -

          Fairly straightforward backport to branch-0.23 of Brandon's patch. Jenkins isn't going to like this since it only handles trunk patches.

          Jason Lowe made changes -
          Attachment HADOOP-10112.004.branch-0.23.patch [ 12627484 ]
          Jason Lowe made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12627484/HADOOP-10112.004.branch-0.23.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3546//console

          This message is automatically generated.

          Chris Nauroth added a comment -

          +1 for the branch-0.23 backport patch. I don't have an 0.23 environment ready at the moment for testing, but I assume you've already tested. Thanks, Jason!

          Jason Lowe added a comment -

          Thanks for the review, Chris! Yes, I manually tested this on 0.23 as well as running TestHarFilesystem and TestHadoopArchives to verify they passed. Committing this.

          Jason Lowe added a comment -

          Thanks, Brandon! I committed this to branch-0.23. Also marking this as fixed in 2.3.0 since it's covered by HADOOP-9981.

          Jason Lowe made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.23.11 [ 12324665 ]
          Fix Version/s 2.3.0 [ 12325254 ]
          Resolution Fixed [ 1 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition                    Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> In Progress           59d 22h 26m             1                 Brandon Li      17/Jan/14 23:29
          In Progress -> Resolved       8s                      1                 Brandon Li      17/Jan/14 23:29
          Reopened -> Resolved          21s                     1                 Andrew Wang     29/Jan/14 19:50
          Resolved -> Reopened          19d 22h 44m             2                 Jason Lowe      06/Feb/14 22:14
          Reopened -> Patch Available   5m 49s                  1                 Jason Lowe      06/Feb/14 22:19
          Patch Available -> Resolved   51m 20s                 1                 Jason Lowe      06/Feb/14 23:11
          Resolved -> Closed            17d 21h 45m             1                 Arun C Murthy   24/Feb/14 20:57

            People

            • Assignee: Brandon Li
            • Reporter: Brandon Li
            • Votes: 0
            • Watchers: 8
