  Hadoop HDFS / HDFS-202

Add a bulk FileSystem.getFileBlockLocations



    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: hdfs-client, namenode
    • Labels:
    • Hadoop Flags: Incompatible change, Reviewed


      Currently, map-reduce applications (specifically, file-based input-formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file.
      The downsides are multiple:

      1. Even with only a few thousand files to process, the number of RPCs quickly becomes noticeable.
      2. The current implementation of getFileBlockLocations is slow, since each call results in a 'search' in the namesystem. With a few thousand input files, that means as many RPCs and as many 'searches'.

      It would be nice to have a FileSystem.getFileBlockLocations that can take a directory and return the block locations for all files in that directory. We could then eliminate the per-file RPC and replace the per-file 'search' with a single 'scan'.

      When I tested this with terasort, a moderate job with 8,000 input files, the runtime halved from 8s to 4s. Clearly this is even more important for latency-sensitive applications...
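
      For illustration, here is a minimal sketch of how a job client could gather block locations with one directory-level call instead of one RPC per file. It assumes the iterator-style listing API (FileSystem.listFiles returning LocatedFileStatus objects that already carry their block locations) that shipped in later Hadoop releases; the exact method name and signature in the attached patches may differ.

      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.LocatedFileStatus;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.RemoteIterator;

      public class BulkBlockLocations {
        public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();
          Path inputDir = new Path(args[0]);
          FileSystem fs = inputDir.getFileSystem(conf);

          // A single listing call over the directory; each LocatedFileStatus
          // already carries the file's block locations, so no per-file
          // getFileBlockLocations RPC is needed while computing splits.
          List<BlockLocation> allBlocks = new ArrayList<BlockLocation>();
          RemoteIterator<LocatedFileStatus> it = fs.listFiles(inputDir, false);
          while (it.hasNext()) {
            LocatedFileStatus status = it.next();
            for (BlockLocation block : status.getBlockLocations()) {
              allBlocks.add(block);
            }
          }
          System.out.println("Total blocks: " + allBlocks.size());
        }
      }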


        1. hdfsListFiles5.patch
          48 kB
          Hairong Kuang
        2. hdfsListFiles4.patch
          47 kB
          Hairong Kuang
        3. hdfsListFiles3.patch
          54 kB
          Hairong Kuang
        4. hdfsListFiles2.patch
          42 kB
          Hairong Kuang
        5. hdfsListFiles1.patch
          40 kB
          Hairong Kuang
        6. hdfsListFiles.patch
          43 kB
          Hairong Kuang

          Issue Links



              • Assignee:
                Hairong Kuang (hairong)
              • Reporter:
                Arun Murthy (acmurthy)
              • Votes:
                1
              • Watchers:
                14


                • Created: