Hadoop Map/Reduce / MAPREDUCE-7022

Fast fail rogue jobs based on task scratch dir size

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0, 2.8.0, 2.9.0
    • Fix Version/s: 3.1.0
    • Component/s: task
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      MAPREDUCE-6489 introduced options to kill rogue tasks based on the number of bytes they write to local disk. In our environment, where we mainly run Hive-based jobs, we noticed that this counter and the size of the local scratch dirs were very different. We had tasks where the BYTES_WRITTEN counter was at 300 GB and others where it was at 10 TB, both producing around 200 GB on local disk, so it didn't help us much. To extend this feature, tasks should monitor the size of their local scratch dir and fail if it passes a limit. In these cases the tasks should not be retried either; instead the job should fast fail.
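
      As a rough illustration of the proposed check (not the code in the attached patches), the sketch below sums the sizes of the files under a scratch directory and compares the total against a byte limit. The class name ScratchDirLimitCheck, the methods dirSize and exceedsLimit, and the convention that a negative limit disables the check are all assumptions made for this example. A real task would run such a check periodically and report a non-retriable failure so the job fails fast.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: measures what is actually on
// local disk instead of relying on the BYTES_WRITTEN counter.
final class ScratchDirLimitCheck {

    // Returns the total size in bytes of all regular files under dir.
    // Assumes no files are deleted while the directory is being walked.
    public static long dirSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    // Returns true if the directory holds more than limitBytes.
    // A negative limit is treated as "check disabled" (an assumption).
    public static boolean exceedsLimit(Path dir, long limitBytes) throws IOException {
        return limitBytes >= 0 && dirSize(dir) > limitBytes;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a scratch dir containing a single 4 KiB spill file.
        Path scratch = Files.createTempDirectory("scratch");
        Files.write(scratch.resolve("spill.out"), new byte[4096]);

        long limitBytes = 1024; // illustrative limit only
        if (exceedsLimit(scratch, limitBytes)) {
            // In the proposal, this is where the task would fail
            // without retry so that the job fast fails.
            System.out.println("scratch dir over limit: fail task, do not retry");
        } else {
            System.out.println("scratch dir within limit");
        }
    }
}
```

      Measuring the directory directly is the point of the proposal: the counter in the description ranged from 300 GB to 10 TB for tasks that each left about 200 GB on disk, so only the on-disk size reflects the actual pressure on the node.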

        Attachments

        1. MAPREDUCE-7022.001.patch
          28 kB
          Johan Gustavsson
        2. MAPREDUCE-7022.002.patch
          30 kB
          Johan Gustavsson
        3. MAPREDUCE-7022.003.patch
          11 kB
          Johan Gustavsson
        4. MAPREDUCE-7022.004.patch
          28 kB
          Johan Gustavsson
        5. MAPREDUCE-7022.005.patch
          43 kB
          Johan Gustavsson
        6. MAPREDUCE-7022.006.patch
          46 kB
          Johan Gustavsson
        7. MAPREDUCE-7022.007.patch
          51 kB
          Johan Gustavsson
        8. MAPREDUCE-7022.008.patch
          52 kB
          Johan Gustavsson
        9. MAPREDUCE-7022.009.patch
          53 kB
          Johan Gustavsson

              People

              • Assignee: Johan Gustavsson (johang)
              • Reporter: Johan Gustavsson (johang)
              • Votes: 0
              • Watchers: 3