Hadoop Common
  1. Hadoop Common
  2. HADOOP-10112

har file listing doesn't work with wild card

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.10, 2.2.0
    • Fix Version/s: 0.23.11, 2.3.0
    • Component/s: tools
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      [test@test001 root]$ hdfs dfs -ls har:///tmp/filename.har/*
      -ls: Can not create a Path from an empty string
      Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

      It works without "*".

      1. HADOOP-10112.004.patch
        7 kB
        Brandon Li
      2. HADOOP-10112.004.branch-0.23.patch
        5 kB
        Jason Lowe

        Issue Links

          Activity

          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Jason Lowe made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.23.11 [ 12324665 ]
          Fix Version/s 2.3.0 [ 12325254 ]
          Resolution Fixed [ 1 ]
          Hide
          Jason Lowe added a comment -

          Thanks, Brandon! I committed this to branch-0.23. Also marking this as fixed in 2.3.0 since it's covered by HADOOP-9881.

          Show
          Jason Lowe added a comment - Thanks, Brandon! I committed this to branch-0.23. Also marking this as fixed in 2.3.0 since it's covered by HADOOP-9881 .
          Hide
          Jason Lowe added a comment -

          Thanks for the review, Chris! Yes, I manually tested this on 0.23 as well as running TestHarFilesystem and TestHadoopArchives to verify they passed. Committing this.

          Show
          Jason Lowe added a comment - Thanks for the review, Chris! Yes, I manually tested this on 0.23 as well as running TestHarFilesystem and TestHadoopArchives to verify they passed. Committing this.
          Hide
          Chris Nauroth added a comment -

          +1 for the branch-0.23 backport patch. I don't have an 0.23 environment ready at the moment for testing, but I assume you've already tested. Thanks, Jason!

          Show
          Chris Nauroth added a comment - +1 for the branch-0.23 backport patch. I don't have an 0.23 environment ready at the moment for testing, but I assume you've already tested. Thanks, Jason!
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12627484/HADOOP-10112.004.branch-0.23.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3546//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627484/HADOOP-10112.004.branch-0.23.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3546//console This message is automatically generated.
          Jason Lowe made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Jason Lowe made changes -
          Attachment HADOOP-10112.004.branch-0.23.patch [ 12627484 ]
          Hide
          Jason Lowe added a comment -

          Fairly straightforward backport to branch-0.23 of Brandon's patch. Jenkins isn't going to like this since it only handles trunk patches.

          Show
          Jason Lowe added a comment - Fairly straightforward backport to branch-0.23 of Brandon's patch. Jenkins isn't going to like this since it only handles trunk patches.
          Jason Lowe made changes -
          Fix Version/s 2.3.0 [ 12325254 ]
          Affects Version/s 0.23.10 [ 12324664 ]
          Target Version/s 2.3.0 [ 12325254 ] 0.23.11 [ 12324665 ]
          Jason Lowe made changes -
          Resolution Invalid [ 6 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Jason Lowe added a comment -

          Reopening this as it's also problem in 0.23 that our customers would like fixed. Posting a backported patch for branch-0.23 shortly.

          Show
          Jason Lowe added a comment - Reopening this as it's also problem in 0.23 that our customers would like fixed. Posting a backported patch for branch-0.23 shortly.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1658 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1658/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1658 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1658/ ) Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1683 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1683/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1683 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1683/ ) Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #466 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/466/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #466 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/466/ ) Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #5060 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5060/)
          Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #5060 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5060/ ) Remove HADOOP-10112 from CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562566 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Andrew Wang made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Invalid [ 6 ]
          Andrew Wang made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Andrew Wang added a comment -

          My bad, I didn't realize the reswizzle just stuck a giant patch on top of branch-2.3. The commit logs are now a bit mucked up as a result, but I see that the change is no longer actually present in 2.3. I'll fix CHANGES.txt and this JIRA as appropriate.

          Show
          Andrew Wang added a comment - My bad, I didn't realize the reswizzle just stuck a giant patch on top of branch-2.3. The commit logs are now a bit mucked up as a result, but I see that the change is no longer actually present in 2.3. I'll fix CHANGES.txt and this JIRA as appropriate.
          Hide
          Andrew Wang added a comment -

          Hey guys. Since the branch-2.3 reswizzle, I believe the new Globber is now in 2.3. Is this JIRA still necessary for branch-2.3? If not, maybe it should be reverted to minimize code difference, since it's somehow in branch-2.3 and not branch-2 or trunk.

          Show
          Andrew Wang added a comment - Hey guys. Since the branch-2.3 reswizzle, I believe the new Globber is now in 2.3. Is this JIRA still necessary for branch-2.3? If not, maybe it should be reverted to minimize code difference, since it's somehow in branch-2.3 and not branch-2 or trunk.
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1673 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1673/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1673 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1673/ ) Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1648 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1648/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1648 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1648/ ) Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #456 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/456/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #456 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/456/ ) Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5023 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5023/)
          Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5023 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5023/ ) Update hadoop-common/CHANGES.txt for HADOOP-10112 in branch2.3 (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1559286 ) /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          Brandon Li made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Brandon Li made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Brandon Li made changes -
          Fix Version/s 2.3.0 [ 12325254 ]
          Hide
          Brandon Li added a comment -

          Thank you, Chris. I've committed the patch to branch2.3.

          Show
          Brandon Li added a comment - Thank you, Chris. I've committed the patch to branch2.3.
          Chris Nauroth made changes -
          Hadoop Flags Reviewed [ 10343 ]
          Affects Version/s 2.2.0 [ 12325048 ]
          Affects Version/s 2.3.0 [ 12325254 ]
          Target Version/s 2.3.0 [ 12325254 ]
          Hide
          Chris Nauroth added a comment -

          +1 for the patch. Thanks for adding a unit test too. I ran it locally and confirmed that the patch works.

          Show
          Chris Nauroth added a comment - +1 for the patch. Thanks for adding a unit test too. I ran it locally and confirmed that the patch works.
          Brandon Li made changes -
          Attachment HADOOP-10112.004.patch [ 12623733 ]
          Hide
          Brandon Li added a comment -

          Basically, the problem is that: when a wildcard is in the path, FileSystem object splits the whole path into components, and then get matches/status of each component. However, Har file root path doesn't start with "/" so it can't process any prefix.

          For example, the .har file is created at /user/me/foo.har.
          When user does "hadoop dfs -ls har:///test/foo.har/* ", FileSystem object passes each component path to HarFileSystem: /, /test, /test/foo.har, /test/foo.har/*. HarFileSystem doesn't recognize any prefix pathes "/" or "/test" since they are not in har file system.

          The fix is to redirect the request to its source file system(e.g., HDFS) for the prefix paths.

          Uploaded a patch for branch 2.3.

          Show
          Brandon Li added a comment - Basically, the problem is that: when a wildcard is in the path, FileSystem object splits the whole path into components, and then get matches/status of each component. However, Har file root path doesn't start with "/" so it can't process any prefix. For example, the .har file is created at /user/me/foo.har. When user does "hadoop dfs -ls har:///test/foo.har/* ", FileSystem object passes each component path to HarFileSystem: /, /test, /test/foo.har, /test/foo.har/*. HarFileSystem doesn't recognize any prefix pathes "/" or "/test" since they are not in har file system. The fix is to redirect the request to its source file system(e.g., HDFS) for the prefix paths. Uploaded a patch for branch 2.3.
          Brandon Li made changes -
          Assignee Brandon Li [ brandonli ]
          Hide
          Chris Nauroth added a comment -

          I don't think it's feasible to do a full backport of Globber. This was part of the symlink changes, which were deferred out of the 2.2.x line to mitigate risk.

          I do think it would be valuable to provide a fix in branch-2.2 for FileSystem#globStatusInternal. The HADOOP-9981 patch added several special cases to halt processing early. This was motivated by performance reasons, not correctness, so it may have just been accidental that the patch fixed this bug. Fixing branch-2.2 is probably a matter of pulling out which specific special case (or special cases) fixed this particular bug and translating the code to something compatible with the current FileSystem#globStatusInternal code.

          Show
          Chris Nauroth added a comment - I don't think it's feasible to do a full backport of Globber . This was part of the symlink changes, which were deferred out of the 2.2.x line to mitigate risk. I do think it would be valuable to provide a fix in branch-2.2 for FileSystem#globStatusInternal . The HADOOP-9981 patch added several special cases to halt processing early. This was motivated by performance reasons, not correctness, so it may have just been accidental that the patch fixed this bug. Fixing branch-2.2 is probably a matter of pulling out which specific special case (or special cases) fixed this particular bug and translating the code to something compatible with the current FileSystem#globStatusInternal code.
          Hide
          Kousuke Saruta added a comment -

          So, should we need to modify the glob code for branch-2.2 or back-port new Globber?

          Show
          Kousuke Saruta added a comment - So, should we need to modify the glob code for branch-2.2 or back-port new Globber?
          Hide
          Chris Nauroth added a comment -

          It looks like this was fixed by HADOOP-9981, which optimized the new Globber code. If I revert that patch from branch-2, then I see the bug. After restoring that patch, the bug goes away and I see the listing of the har contents as I would expect.

          HADOOP-9981 was committed to trunk and branch-2, but not branch-2.2, because it was a fix in the new Globber code, which doesn't exist in branch-2.2.

          Show
          Chris Nauroth added a comment - It looks like this was fixed by HADOOP-9981 , which optimized the new Globber code. If I revert that patch from branch-2, then I see the bug. After restoring that patch, the bug goes away and I see the listing of the har contents as I would expect. HADOOP-9981 was committed to trunk and branch-2, but not branch-2.2, because it was a fix in the new Globber code, which doesn't exist in branch-2.2.
          Chris Nauroth made changes -
          Link This issue relates to HADOOP-9981 [ HADOOP-9981 ]
          Brandon Li made changes -
          Field Original Value New Value
          Affects Version/s 2.2.1 [ 12325254 ]
          Hide
          Brandon Li added a comment -

          Trunk works fine. I was using branch 2.2. Let me update the JIRA accordingly.

          Show
          Brandon Li added a comment - Trunk works fine. I was using branch 2.2. Let me update the JIRA accordingly.
          Hide
          Kousuke Saruta added a comment -

          Are you using branch-1 right?
          I tried to reproduce using trunk but I couldn't and I could reproduce branch-1.
          Glob code was changed between branch-1 and trunk (or branch-2).

          Show
          Kousuke Saruta added a comment - Are you using branch-1 right? I tried to reproduce using trunk but I couldn't and I could reproduce branch-1. Glob code was changed between branch-1 and trunk (or branch-2).
          Brandon Li created issue -

            People

            • Assignee:
              Brandon Li
              Reporter:
              Brandon Li
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development