  1. Giraph
  2. GIRAPH-246

Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: bsp
    • Labels:

      Description

      This simple change adds a command-line configurable option in GiraphJob that controls the interval between calls to context.progress(). The periodic calls let workers avoid timeouts during long data load-ins in which some workers finish reading their input splits much faster than others, or finish a superstep faster. I found this allowed large-scale jobs with low memory overhead to complete where they would previously time out during runs on a Hadoop cluster. A timeout is still possible when a worker crashes, runs out of memory, or has other legitimate GC or RPC trouble, but the change prevents unintentional failures when the worker is actually still healthy.
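
      For illustration only, the idea can be sketched as a barrier wait that wakes up at a fixed interval and reports progress before waiting again. This is not the code from the attached patches: the class BarrierWaitSketch, the method waitWithProgress, the intervalMs parameter, and the CountDownLatch standing in for the barrier are all hypothetical, and the nested Progressable interface is a local stand-in for Hadoop's interface of the same name so that the sketch compiles without Hadoop on the classpath.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a barrier wait that keeps reporting progress. */
public class BarrierWaitSketch {
  /** Local stand-in for Hadoop's Progressable (assumption, see lead-in). */
  public interface Progressable {
    void progress();
  }

  /**
   * Blocks until the barrier latch opens. Each time intervalMs elapses
   * without the barrier being released, progress() is called once so the
   * framework can see that this worker is still alive.
   */
  public static void waitWithProgress(CountDownLatch barrier,
      Progressable context, long intervalMs) throws InterruptedException {
    // await() returns false on timeout and true once the count reaches zero.
    while (!barrier.await(intervalMs, TimeUnit.MILLISECONDS)) {
      context.progress();
    }
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch barrier = new CountDownLatch(1);
    AtomicInteger calls = new AtomicInteger();

    // Simulates a slower worker that releases the barrier after a delay.
    Thread slowWorker = new Thread(() -> {
      try {
        Thread.sleep(300);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      barrier.countDown();
    });
    slowWorker.start();

    // The fast worker waits, counting how often it reported progress.
    waitWithProgress(barrier, calls::incrementAndGet, 50);
    slowWorker.join();
    System.out.println("progress() calls while waiting: " + calls.get());
  }
}
```

      Because progress is reported from the waiting thread itself, a worker whose thread has died or is stuck stops reporting and can still be timed out, which matches the intent described above.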

        Attachments

        1. GIRAPH-246-NEW-FIX-2.patch
          6 kB
          Eli Reisman
        2. GIRAPH-246-NEW-FIX.patch
          6 kB
          Eli Reisman
        3. GIRAPH-246-9.patch
          7 kB
          Eli Reisman
        4. GIRAPH-246-8.patch
          7 kB
          Eli Reisman
        5. GIRAPH-246-7.patch
          11 kB
          Eli Reisman
        6. GIRAPH-246-7_rebase2.patch
          11 kB
          Eli Reisman
        7. GIRAPH-246-7_rebase1.patch
          10 kB
          Eli Reisman
        8. GIRAPH-246-6.patch
          11 kB
          Eli Reisman
        9. GIRAPH-246-5.patch
          11 kB
          Eli Reisman
        10. GIRAPH-246-4.patch
          4 kB
          Eli Reisman
        11. GIRAPH-246-3.patch
          4 kB
          Eli Reisman
        12. GIRAPH-246-2.patch
          4 kB
          Eli Reisman
        13. GIRAPH-246-11.patch
          5 kB
          Eli Reisman
        14. GIRAPH-246-10.patch
          5 kB
          Eli Reisman
        15. GIRAPH-246-1.patch
          5 kB
          Eli Reisman

              People

              • Assignee: Eli Reisman (initialcontext)
              • Reporter: Eli Reisman (initialcontext)
              • Votes: 0
              • Watchers: 6
