Hadoop Map/Reduce
MAPREDUCE-2349

Speed up list[Located]Status calls from input formats

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: task
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      When a job has many input paths, the listStatus calls (or the improved listLocatedStatus calls) invoked from the getSplits() method can take a long time. Most of that time is spent waiting for the previous call to complete before dispatching the next one.

      This can be sped up considerably by dispatching multiple calls at once (via executors). If the same filesystem client is used, the calls are pipelined much better (since calls are serialized) and impose no extra burden on the namenode, while greatly reducing the latency seen by the client. In a simple test during off-peak hours, this reduced the getSplits() time from about 3s to about 0.5s.
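
      The idea of dispatching the listing calls through an executor can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patches: the class name ParallelListing, the method listAll, and the lister function (a stand-in for a call such as listStatus on a shared filesystem client) are all hypothetical, and only the Java standard library is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: issues one listing call per input path on a thread
// pool instead of waiting for each call to finish before starting the next.
class ParallelListing {

    // 'lister' stands in for a per-path listing call made through a single
    // shared client (an assumption for this sketch).
    static <P, S> List<S> listAll(List<P> paths,
                                  Function<P, List<S>> lister,
                                  int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Dispatch every call up front so they can overlap.
            List<Future<List<S>>> futures = new ArrayList<>();
            for (P path : paths) {
                Callable<List<S>> task = () -> lister.apply(path);
                futures.add(pool.submit(task));
            }
            // Collect in submission order, so the combined result keeps
            // the same ordering as the input paths.
            List<S> results = new ArrayList<>();
            for (Future<List<S>> future : futures) {
                results.addAll(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

      A caller would pass the job's input paths and a function that lists a single path. Because results are gathered in submission order, the output matches what a sequential loop would produce, while the per-call wait times overlap.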

      1. MAPREDUCE-2349.1.wip.txt (22 kB, Siddharth Seth)
      2. MAPREDUCE-2349.2.txt (39 kB, Siddharth Seth)
      3. MAPREDUCE-2349.3.txt (39 kB, Siddharth Seth)
      4. MAPREDUCE-2349.4.txt (41 kB, Siddharth Seth)
      5. MAPREDUCE-2349.5.txt (43 kB, Siddharth Seth)

        Issue Links

          Activity

          Field: Original Value -> New Value

          Arun C Murthy made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          Vinod Kumar Vavilapalli made changes -
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
          Hadoop Flags: (none) -> Reviewed [ 10343 ]
          Fix Version/s: (none) -> 2.4.0 [ 12326141 ]
          Resolution: (none) -> Fixed [ 1 ]
          Vinod Kumar Vavilapalli made changes -
          Link: (none) -> This issue relates to MAPREDUCE-5603 [ MAPREDUCE-5603 ]
          Siddharth Seth made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Siddharth Seth made changes -
          Attachment: (none) -> MAPREDUCE-2349.5.txt [ 12635675 ]
          Vinod Kumar Vavilapalli made changes -
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Siddharth Seth made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Siddharth Seth made changes -
          Attachment: (none) -> MAPREDUCE-2349.4.txt [ 12634271 ]
          Siddharth Seth made changes -
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Siddharth Seth made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Siddharth Seth made changes -
          Attachment: (none) -> MAPREDUCE-2349.3.txt [ 12630005 ]
          Siddharth Seth made changes -
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Siddharth Seth made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Siddharth Seth made changes -
          Attachment: (none) -> MAPREDUCE-2349.2.txt [ 12629825 ]
          Siddharth Seth made changes -
          Assignee: (none) -> Siddharth Seth [ sseth ]
          Siddharth Seth made changes -
          Target Version/s: (none) -> 2.4.0 [ 12326141 ]
          Siddharth Seth made changes -
          Attachment: (none) -> MAPREDUCE-2349.1.wip.txt [ 12628972 ]
          Joydeep Sen Sarma created issue -

            People

            • Assignee: Siddharth Seth
            • Reporter: Joydeep Sen Sarma
            • Votes: 0
            • Watchers: 15
