Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-10466

DistributedFileSystem.listLocatedStatus() should return HdfsBlockLocation instead of BlockLocation

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels:
      None
    • Target Version/s:

      Description

      https://issues.apache.org/jira/browse/HDFS-202 added a new API listLocatedStatus() to get all files' status with block locations for a directory. This is great that we don't need to call FileSystem.getFileBlockLocations() for each file. it's much faster (about 8-10 times).
      However, the returned LocatedFileStatus only contains basic BlockLocation instead of HdfsBlockLocation, the LocatedBlock details are stripped out.

      It should do the similar as DFSClient.getBlockLocations(), return HdfsBlockLocation which provide full block location details.

      The implementation of DistributedFileSystem. listLocatedStatus() retrieves HdfsLocatedFileStatus which contains all information, but when convert it to LocatedFileStatus, it doesn't keep LocatedBlock data. It's a simple (and compatible) change to make to keep the LocatedBlock details.

        Attachments

        1. HDFS-10466.patch
          2 kB
          Juan Yu
        2. HDFS-10466.001.patch
          2 kB
          Juan Yu

          Activity

            People

            • Assignee:
              jyu@cloudera.com Juan Yu
              Reporter:
              jyu@cloudera.com Juan Yu
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: