  Slider / SLIDER-1246

Application health should not be affected by faulty nodes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Slider 0.92
    • Fix Version/s: Slider 1.0.0
    • Component/s: appmaster, core
    • Labels:
      None

      Description

      When a node is faulty, repeated container failures on that node are counted as an application failure.
      This was observed in HIVE-16927, where container failures on certain nodes brought down the entire application. Slider should provide a way to avoid marking the application as unhealthy as long as a certain threshold of containers is running. Tuning the failure threshold is not optimal, because choosing the correct default on a large cluster is not trivial. Beyond a certain number of failures, Slider should mark the node as unhealthy and report that back to the client/AM. The application could continue to run as long as the container request is partially satisfied (for example, 80% of containers are running).
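
The behavior requested above can be illustrated with a minimal sketch. This is not Slider's implementation and not the content of the attached patches; the class name, method names, and the integer-percent threshold are all hypothetical and chosen only to show the two checks the description asks for: a per-node failure limit and an application-level "enough containers are running" test.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only: none of these names come from Slider.
 * Tracks container failures per node and judges application health
 * by the share of desired containers that are currently running.
 */
class HealthSketch {
    private final int desiredContainers;
    private final int minHealthyPercent;   // assumed setting, e.g. 80
    private final int nodeFailureLimit;    // assumed setting: failures before a node is flagged
    private final Map<String, Integer> failuresByNode = new HashMap<>();
    private int runningContainers;

    HealthSketch(int desiredContainers, int minHealthyPercent, int nodeFailureLimit) {
        this.desiredContainers = desiredContainers;
        this.minHealthyPercent = minHealthyPercent;
        this.nodeFailureLimit = nodeFailureLimit;
    }

    /** Record that one more container is running. */
    void containerStarted() {
        runningContainers++;
    }

    /** Record a container failure and attribute it to the node it ran on. */
    void containerFailed(String node) {
        if (runningContainers > 0) {
            runningContainers--;
        }
        failuresByNode.merge(node, 1, Integer::sum);
    }

    /** A node is flagged once its failure count reaches the limit. */
    boolean isNodeUnhealthy(String node) {
        return failuresByNode.getOrDefault(node, 0) >= nodeFailureLimit;
    }

    /**
     * The application stays healthy while the running share meets the
     * threshold. Integer arithmetic avoids floating-point rounding.
     */
    boolean isApplicationHealthy() {
        return runningContainers * 100 >= desiredContainers * minHealthyPercent;
    }
}
```

With 10 desired containers, an 80 percent threshold, and a node limit of 3, two failures on one node leave 8 containers running, so the application is still healthy and the node is not yet flagged. A third failure on the same node flags the node and drops the application below the threshold. A real scheduler would presumably also stop placing containers on a flagged node and re-request them elsewhere, which this sketch does not model.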

        Attachments

        1. SLIDER-1246.04.patch
          24 kB
          Gour Saha
        2. SLIDER-1246.03.patch
          26 kB
          Gour Saha
        3. SLIDER-1246.02.patch
          29 kB
          Gour Saha
        4. SLIDER-1246.01.patch
          29 kB
          Gour Saha

              People

              • Assignee:
                gsaha Gour Saha
                Reporter:
                prasanth_j Prasanth Jayachandran
              • Votes:
                0
                Watchers:
                5