Hadoop HDFS / HDFS-202

Add a bulk FileSystem.getFileBlockLocations

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: hdfs-client, namenode
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed

      Description

      Currently, map-reduce applications (specifically file-based input formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file.
      This has two downsides:

      1. Even with a few thousand files to process, the number of RPCs quickly becomes noticeable.
      2. The current implementation of getFileBlockLocations is too slow, since each call results in a 'search' in the namesystem. With a few thousand input files, that means a few thousand RPCs and as many 'searches'.

      It would be nice to have a FileSystem.getFileBlockLocations that takes a directory and returns the block locations for all files in that directory. That would eliminate the per-file RPCs and replace the per-file 'searches' with a single 'scan'.
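
      To make the difference concrete, here is a minimal, self-contained sketch of the two access patterns. It is a toy in-memory model, not Hadoop code: ToyNameNode, getBlockLocationsInDirectory, and the rpcCount counter are hypothetical names invented for illustration, and the sorted map merely stands in for the namesystem. It only shows how a directory-level call turns N RPCs and N lookups into one RPC and one scan.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: ToyNameNode and its methods are hypothetical, not Hadoop APIs.
public class BulkBlockLocationsSketch {

    /** In-memory stand-in for the namenode that counts the RPCs it serves. */
    static class ToyNameNode {
        // Full file path -> hosts holding the file's blocks, kept sorted by path.
        private final TreeMap<String, List<String>> namespace = new TreeMap<>();
        int rpcCount = 0;

        void addFile(String path, List<String> hosts) {
            namespace.put(path, hosts);
        }

        /** Per-file call: every invocation is one RPC plus one lookup. */
        List<String> getFileBlockLocations(String path) {
            rpcCount++;
            return namespace.get(path);
        }

        /** Bulk call: one RPC, then one scan over the directory's direct children. */
        Map<String, List<String>> getBlockLocationsInDirectory(String dir) {
            rpcCount++;
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            Map<String, List<String>> result = new LinkedHashMap<>();
            // Paths sharing a prefix are contiguous in sorted order, so start at
            // the prefix and stop at the first path outside the directory.
            for (Map.Entry<String, List<String>> e : namespace.tailMap(prefix, true).entrySet()) {
                String path = e.getKey();
                if (!path.startsWith(prefix)) {
                    break;
                }
                // Skip files in subdirectories; only direct children are returned.
                if (path.indexOf('/', prefix.length()) < 0) {
                    result.put(path, e.getValue());
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        for (int i = 0; i < 8000; i++) {
            nn.addFile("/input/part-" + i, Arrays.asList("host" + (i % 10)));
        }

        // Current pattern: the client asks about each file separately.
        for (int i = 0; i < 8000; i++) {
            nn.getFileBlockLocations("/input/part-" + i);
        }
        System.out.println("per-file RPCs: " + nn.rpcCount);

        // Proposed pattern: one call for the whole directory.
        nn.rpcCount = 0;
        Map<String, List<String>> all = nn.getBlockLocationsInDirectory("/input");
        System.out.println("bulk RPCs: " + nn.rpcCount + ", files returned: " + all.size());
    }
}
```

      The sketch deliberately ignores everything a real implementation would have to handle (block offsets and lengths, permissions, very large directories that may need paging), so it says nothing about how the attached patches are actually structured.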

      When I tested this with terasort, a moderate job with 8000 input files, the runtime halved from the current 8s to 4s. Clearly this is much more important for latency-sensitive applications...

      1. hdfsListFiles5.patch
        48 kB
        Hairong Kuang
      2. hdfsListFiles4.patch
        47 kB
        Hairong Kuang
      3. hdfsListFiles3.patch
        54 kB
        Hairong Kuang
      4. hdfsListFiles2.patch
        42 kB
        Hairong Kuang
      5. hdfsListFiles1.patch
        40 kB
        Hairong Kuang
      6. hdfsListFiles.patch
        43 kB
        Hairong Kuang

        Issue Links

          Activity

          Arun C Murthy created issue
          Jakob Homan made changes -
          Field Original Value New Value
          Assignee Jakob Homan [ jghoman ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] HDFS [ 12310942 ]
          Key HADOOP-5795 HDFS-202
          Affects Version/s 0.20.0 [ 12313438 ]
          Component/s dfs [ 12310710 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Hairong Kuang made changes -
          Assignee Jakob Homan [ jghoman ] Hairong Kuang [ hairong ]
          Fix Version/s 0.22.0 [ 12314241 ]
          Hairong Kuang made changes -
          Link This issue relates to HADOOP-6870 [ HADOOP-6870 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles.patch [ 12450343 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles1.patch [ 12450559 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles2.patch [ 12450935 ]
          Hairong Kuang made changes -
          Link This issue blocks MAPREDUCE-1981 [ MAPREDUCE-1981 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles3.patch [ 12451084 ]
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hairong Kuang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles4.patch [ 12451798 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles4.patch [ 12451798 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles4.patch [ 12451801 ]
          Hairong Kuang made changes -
          Attachment hdfsListFiles5.patch [ 12451810 ]
          Hairong Kuang made changes -
          Component/s hdfs client [ 12312928 ]
          Component/s name-node [ 12312926 ]
          Hairong Kuang made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Hadoop Flags [Reviewed] [Incompatible change, Reviewed]
          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Arun C Murthy
            • Votes:
              1
              Watchers:
              14

              Dates

              • Created:
                Updated:
                Resolved:
