Hadoop Map/Reduce
MAPREDUCE-2162

speculative execution does not handle cases where stddev > mean well


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The new speculation code only speculates tasks whose progress rate deviates from the mean progress rate of the job by more than some multiple (typically 1.0) of the stddev. But the stddev can be larger than the mean, and whenever that condition holds, the speculation threshold (mean - stddev) is negative, so even a task with a progress rate of 0 will not be speculated.

      It's not clear that this condition is self-correcting. If a job has thousands of tasks, then one laggard task, in spite of not being speculated for a long time, may not be able to fix the condition stddev > mean.

      We have seen jobs where tasks were not speculated for hours, and this seems to be one explanation for how that may have happened. Here's an example job with stddev > mean:

      DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16 mean is 2.8568424662959537E-9 std() is 6.388093955645905E-9
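The numbers above can be checked directly. A minimal sketch (the class and method names here are illustrative, not the actual Hadoop `DataStatistics` API) recomputes mean and stddev from the quoted count/sum/sumSquares and shows that the speculation threshold mean - 1.0 * stddev is negative, so a zero-progress task passes the check and is never speculated:

```java
// Sketch of the DataStatistics-style running statistics and the
// speculation threshold; names are assumptions for illustration.
public class SpeculationCheck {
    public static double mean(double sum, int count) {
        return sum / count;
    }

    public static double std(double sum, double sumSquares, int count) {
        // Population stddev: sqrt(E[x^2] - E[x]^2), clamped at 0
        // to guard against floating-point rounding.
        double m = mean(sum, count);
        return Math.sqrt(Math.max(0.0, sumSquares / count - m * m));
    }

    public static void main(String[] args) {
        // Values from the job quoted in the description.
        int count = 6;
        double sum = 1.7141054797775723E-8;
        double sumSquares = 2.9381575958035014E-16;

        double m = mean(sum, count);
        double s = std(sum, sumSquares, count);

        // A task is speculated only if its progress rate falls below
        // mean - 1.0 * stddev. With stddev > mean this is negative.
        double threshold = m - 1.0 * s;
        System.out.println("mean = " + m + ", std = " + s);
        System.out.println("threshold = " + threshold);
        System.out.println("zero-rate task speculated? " + (0.0 < threshold));
    }
}
```

Running this reproduces the logged mean (about 2.86E-9) and stddev (about 6.39E-9); the threshold comes out negative, so the zero-rate check prints false.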


People

    Assignee: Joydeep Sen Sarma (jsensarma)
    Reporter: Joydeep Sen Sarma (jsensarma)
    Votes: 0
    Watchers: 10
