Hadoop HDFS / HDFS-10466

DistributedFileSystem.listLocatedStatus() should return HdfsBlockLocation instead of BlockLocation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Component/s: hdfs

    Description

      HDFS-202 (https://issues.apache.org/jira/browse/HDFS-202) added a new API, listLocatedStatus(), which returns the status of every file in a directory together with its block locations. This is great because we no longer need to call FileSystem.getFileBlockLocations() for each file, and it is much faster (about 8-10 times).
      However, the returned LocatedFileStatus contains only basic BlockLocation objects instead of HdfsBlockLocation, so the LocatedBlock details are stripped out.

      It should behave like DFSClient.getBlockLocations() and return HdfsBlockLocation, which provides full block location details.

      The implementation of DistributedFileSystem.listLocatedStatus() retrieves an HdfsLocatedFileStatus, which contains all of this information, but when it is converted to a LocatedFileStatus the LocatedBlock data is not kept. Keeping the LocatedBlock details is a simple (and compatible) change.
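
      The compatibility argument can be illustrated with a small, self-contained sketch. The classes below are simplified stand-ins written for this example, not the real Hadoop classes, and the method names convertStripped() and convertKeepingBlocks() are hypothetical. The sketch assumes only what the description states: HdfsBlockLocation is a BlockLocation that also carries the LocatedBlock.

```java
public class LocatedStatusSketch {
    // Stand-in for the basic BlockLocation: hosts, offset and length only.
    static class BlockLocation {
        final String[] hosts;
        final long offset;
        final long length;

        BlockLocation(String[] hosts, long offset, long length) {
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
        }
    }

    // Stand-in for LocatedBlock: the detailed record known to the client.
    static class LocatedBlock {
        final long blockId;
        final String[] hosts;
        final long offset;
        final long length;

        LocatedBlock(long blockId, String[] hosts, long offset, long length) {
            this.blockId = blockId;
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
        }
    }

    // Stand-in for HdfsBlockLocation: a BlockLocation that keeps its LocatedBlock.
    static class HdfsBlockLocation extends BlockLocation {
        final LocatedBlock block;

        HdfsBlockLocation(LocatedBlock block) {
            super(block.hosts, block.offset, block.length);
            this.block = block;
        }
    }

    // Behaviour described in the issue: the LocatedBlock details are dropped.
    static BlockLocation[] convertStripped(LocatedBlock[] blocks) {
        BlockLocation[] out = new BlockLocation[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            out[i] = new BlockLocation(blocks[i].hosts, blocks[i].offset, blocks[i].length);
        }
        return out;
    }

    // Proposed behaviour: same return type, but each element keeps its LocatedBlock.
    static BlockLocation[] convertKeepingBlocks(LocatedBlock[] blocks) {
        BlockLocation[] out = new BlockLocation[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            out[i] = new HdfsBlockLocation(blocks[i]);
        }
        return out;
    }

    // Caller side: returns the block id when details are available, otherwise -1.
    static long blockIdOf(BlockLocation loc) {
        if (loc instanceof HdfsBlockLocation) {
            return ((HdfsBlockLocation) loc).block.blockId;
        }
        return -1L;
    }

    public static void main(String[] args) {
        LocatedBlock[] blocks = {
            new LocatedBlock(42L, new String[] {"dn1", "dn2"}, 0L, 128L)
        };
        System.out.println(blockIdOf(convertStripped(blocks)[0]));
        System.out.println(blockIdOf(convertKeepingBlocks(blocks)[0]));
    }
}
```

      Because both conversions return BlockLocation[], existing callers that read only hosts, offset and length are unaffected, while callers that need the block details can check for the subclass and downcast.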

      Attachments

        1. HDFS-10466.001.patch
          2 kB
          Juan Yu
        2. HDFS-10466.patch
          2 kB
          Juan Yu

          People

            jyu@cloudera.com Juan Yu
            Votes: 0
            Watchers: 2