  1. Hadoop HDFS
  2. HDFS-202

Add a bulk FileSystem.getFileBlockLocations

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: hdfs-client, namenode
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed

    Description

      Currently, map-reduce applications (specifically file-based input-formats) use FileSystem.getFileBlockLocations to compute splits. However, they are forced to call it once per file.
      This has two downsides:

      1. Even with only a few thousand files to process, the number of RPCs quickly becomes noticeable.
      2. The current implementation of getFileBlockLocations is too slow, since each call results in a 'search' in the namesystem. A few thousand input files therefore mean that many RPCs and that many 'searches'.

      It would be nice to have a FileSystem.getFileBlockLocations that can take a directory and return the block locations for all files in that directory. This would eliminate the per-file RPCs and replace the per-file 'search' with a single 'scan'.
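
      To illustrate the difference in RPC counts, here is a minimal, self-contained sketch. It does not use the Hadoop API: ToyNameNode, getBlockLocationsInDir, the /input/part-N paths and the host names are all hypothetical stand-ins invented for this example, and the sorted map only loosely models the namesystem.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BulkBlockLocationsSketch {

    /** Hypothetical stand-in for a namenode; it counts the RPCs it serves. */
    static final class ToyNameNode {
        // Sorted map: full file path -> hosts holding that file's blocks.
        private final TreeMap<String, List<String>> files = new TreeMap<>();
        int rpcCount = 0;

        void addFile(String path, List<String> hosts) {
            files.put(path, hosts);
        }

        /** Per-file call: one RPC and one lookup ("search") per file. */
        List<String> getFileBlockLocations(String path) {
            rpcCount++;
            return files.get(path);
        }

        /** Bulk call (assumed shape): one RPC and one scan over the directory. */
        Map<String, List<String>> getBlockLocationsInDir(String dir) {
            rpcCount++;
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            Map<String, List<String>> result = new LinkedHashMap<>();
            // Paths under the directory are contiguous in sorted order.
            for (Map.Entry<String, List<String>> e
                    : files.tailMap(prefix, true).entrySet()) {
                String path = e.getKey();
                if (!path.startsWith(prefix)) {
                    break; // walked past the directory's entries
                }
                if (path.indexOf('/', prefix.length()) >= 0) {
                    continue; // skip files in nested subdirectories
                }
                result.put(path, e.getValue());
            }
            return result;
        }
    }

    /** Builds a toy namenode with fileCount files under /input. */
    static ToyNameNode buildNameNode(int fileCount) {
        ToyNameNode nn = new ToyNameNode();
        for (int i = 0; i < fileCount; i++) {
            nn.addFile("/input/part-" + i, Arrays.asList("host-a", "host-b"));
        }
        return nn;
    }

    /** RPCs needed when the client asks once per file. */
    static int rpcsPerFile(int fileCount) {
        ToyNameNode nn = buildNameNode(fileCount);
        for (int i = 0; i < fileCount; i++) {
            nn.getFileBlockLocations("/input/part-" + i);
        }
        return nn.rpcCount;
    }

    /** RPCs needed when the client asks once for the whole directory. */
    static int rpcsBulk(int fileCount) {
        ToyNameNode nn = buildNameNode(fileCount);
        nn.getBlockLocationsInDir("/input");
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        System.out.println("per-file RPCs: " + rpcsPerFile(8000));
        System.out.println("bulk RPCs: " + rpcsBulk(8000));
    }
}
```

      With 8000 files, the per-file loop issues 8000 calls and the bulk variant issues one. The sketch only counts calls; it says nothing about the actual cost of a 'search' or a 'scan' in the real namesystem.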

      When I tested this for terasort, a moderate job with 8000 input files, the runtime halved from the current 8s to 4s. Clearly, this matters even more for latency-sensitive applications...

      Attachments

        1. hdfsListFiles.patch (43 kB, Hairong Kuang)
        2. hdfsListFiles1.patch (40 kB, Hairong Kuang)
        3. hdfsListFiles2.patch (42 kB, Hairong Kuang)
        4. hdfsListFiles3.patch (54 kB, Hairong Kuang)
        5. hdfsListFiles4.patch (47 kB, Hairong Kuang)
        6. hdfsListFiles5.patch (48 kB, Hairong Kuang)

          People

            Assignee: Hairong Kuang (hairong)
            Reporter: Arun Murthy (acmurthy)
            Votes: 1
            Watchers: 14

