  Hadoop Common / HADOOP-15320

Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3, 2.9.0, 3.0.0
    • Fix Version/s: 3.1.0, 2.9.1
    • Component/s: fs/adl, fs/azure
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      hadoop-azure and hadoop-azure-datalake each have their own implementation of getFileBlockLocations(), which fakes a list of artificial blocks based on a hard-coded block size, with each block reporting a single host named "localhost". Take a look at this code:

      https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485
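To make the behavior concrete, here is a simplified, hypothetical stand-in for the kind of override described above. It is not the NativeAzureFileSystem code and uses no Hadoop classes: FakeBlock is an invented holder for the BlockLocation fields that matter here, and the 512 MB block size in main() is an assumption (the actual hard-coded values in hadoop-azure and hadoop-azure-datalake may differ).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the removed override. It is NOT the
// NativeAzureFileSystem code and uses no Hadoop classes.
public class FakeBlockLocations {

    // Minimal holder for the BlockLocation fields this sketch cares about.
    public static final class FakeBlock {
        public final String[] hosts;
        public final long offset;
        public final long length;

        FakeBlock(String[] hosts, long offset, long length) {
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
        }
    }

    // Covers the range [start, start + len) of a fileLen-byte file with
    // artificial blockSize-byte blocks, each "hosted" on "localhost".
    public static List<FakeBlock> fakeBlocks(long fileLen, long start, long len, long blockSize) {
        if (start < 0 || len < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        List<FakeBlock> blocks = new ArrayList<>();
        if (start >= fileLen || len == 0) {
            return blocks; // nothing to report
        }
        long end = Math.min(fileLen, start + len);
        // Begin at the blockSize boundary at or before 'start'.
        long offset = (start / blockSize) * blockSize;
        while (offset < end) {
            long blockLen = Math.min(blockSize, fileLen - offset);
            blocks.add(new FakeBlock(new String[] {"localhost"}, offset, blockLen));
            offset += blockLen;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tb = 1024L * 1024 * mb;
        // Assumed 512 MB block size: a 1 TB file is described by 2048 fake blocks.
        List<FakeBlock> blocks = fakeBlocks(tb, 0, tb, 512 * mb);
        System.out.println(blocks.size() + " blocks, host " + blocks.get(0).hosts[0]);
    }
}
```

The number of entries grows linearly with file size even though every entry carries the same "localhost" host, so the list adds no real locality information.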

      This is an unnecessary mock-up that makes a "remote" file system mimic HDFS. The problem with this mock is that for large (~TB) files it generates lots of artificial blocks, and FileInputFormat.getSplits() is slow to calculate splits from these blocks.
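One way to see why many blocks hurt is a simplified cost model. The sketch below assumes that each split locates the block containing its start offset with a linear scan over the block array, similar to FileInputFormat's getBlockIndex() helper; with one split per block that makes the total work roughly quadratic in the number of blocks. The class and method names are hypothetical, and the real getSplits() does additional per-split work that this model ignores.

```java
// Simplified cost model, not Hadoop code. Assumes each split finds the block
// that contains its start offset with a linear scan over the block array,
// similar to FileInputFormat's getBlockIndex() helper.
public class SplitLookupCost {

    // Linear search for the block containing 'offset'; counts comparisons.
    static int blockIndex(long[] blockOffsets, long blockSize, long offset, long[] comparisons) {
        for (int i = 0; i < blockOffsets.length; i++) {
            comparisons[0]++;
            if (blockOffsets[i] <= offset && offset < blockOffsets[i] + blockSize) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset outside file: " + offset);
    }

    // Total comparisons when a fileLen-byte file, reported as blocks of
    // locationBlockSize bytes, is cut into splits of splitSize bytes.
    public static long totalComparisons(long fileLen, long locationBlockSize, long splitSize) {
        int blocks = (int) ((fileLen + locationBlockSize - 1) / locationBlockSize);
        long[] offsets = new long[blocks];
        for (int i = 0; i < blocks; i++) {
            offsets[i] = i * locationBlockSize;
        }
        long[] comparisons = {0};
        for (long splitStart = 0; splitStart < fileLen; splitStart += splitSize) {
            blockIndex(offsets, locationBlockSize, splitStart, comparisons);
        }
        return comparisons[0];
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tb = 1024L * 1024 * mb;
        // 2048 fake 512 MB blocks: 1 + 2 + ... + 2048 = 2,098,176 comparisons.
        System.out.println(totalComparisons(tb, 512 * mb, 512 * mb));
        // One block for the whole file: one comparison per split, 2048 total.
        System.out.println(totalComparisons(tb, tb, 512 * mb));
    }
}
```

Under these assumptions, reporting a single block keeps the lookup cost linear in the number of splits.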

      We can safely remove this customized getFileBlockLocations() implementation and fall back to the default FileSystem.getFileBlockLocations() implementation, which returns a single block with the single host "localhost" for any file. Note that this does not mean we will create far fewer splits, because the split size, and therefore the number of splits, is still bounded by blockSize in FileInputFormat.computeSplitSize():

      return Math.max(minSize, Math.min(goalSize, blockSize));
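To illustrate why the split count does not collapse, here is a sketch that applies the quoted formula and then counts splits with a loop modeled on the old-API FileInputFormat.getSplits(). The class and method names are hypothetical. The SPLIT_SLOP value of 1.1 is an assumption based on FileInputFormat's constant. blockSize stands for the block size reported in the file's status, not anything returned by getFileBlockLocations().

```java
// Hypothetical sketch of how the split count follows from the split size,
// modeled on the loop in the old-API FileInputFormat.getSplits().
public class SplitCount {

    // Assumed slop factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Same formula as quoted from FileInputFormat.computeSplitSize().
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts the splits for a fileLen-byte file. The number of block
    // locations never appears here, only the file status blockSize.
    public static int countSplits(long fileLen, long goalSize, long minSize, long blockSize) {
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long bytesRemaining = fileLen;
        int splits = 0;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the remaining tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tb = 1024L * 1024 * mb;
        // goalSize = total input size / requested number of splits (here 1).
        System.out.println(countSplits(tb, tb, 1, 512 * mb)); // prints 2048
    }
}
```

Because countSplits() depends only on blockSize from the file status, replacing thousands of fake block locations with a single one leaves the number of splits for a 1 TB file unchanged in this model.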

        Attachments

        1. HADOOP-15320.01.patch
          11 kB
          Chris Douglas
        2. HADOOP-15320.patch
          10 kB
          shanyu zhao


              People

              • Assignee:
                shanyu zhao
                Reporter:
                shanyu zhao
              • Votes:
                1
                Watchers:
                5
