[SLIDER-1246] Application health should not be affected by faulty nodes - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Slider 0.92
Fix Version/s: Slider 1.0.0
Component/s: appmaster, core
Labels: None

Description

When a node is faulty, multiple container failures on it are treated as an application failure.
This was observed in HIVE-16927, where container failures on certain nodes brought down the entire application. Slider should provide a way to avoid marking the application as unhealthy as long as a certain threshold of containers is running. Tuning the failure threshold is not optimal, because choosing the correct default on a large cluster is not trivial. Beyond a certain number of failures, Slider should mark the node as unhealthy and report that back to the client/AM. The application could continue to run as long as the container request is partially satisfied (for example, 80% of containers are running).
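
The requested behaviour can be illustrated with a minimal sketch. It is not taken from the attached patches: the class name, method names, and both thresholds are hypothetical, chosen only to show a percentage-based health check alongside a per-node failure count.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the idea in this issue. None of these names
 * come from Slider; they are assumptions made for illustration.
 */
public class HealthThresholdSketch {
    // Minimum percentage of desired containers that must be running (e.g. 80).
    private final int healthyPercent;
    // A node is flagged once its failure count exceeds this limit.
    private final int maxNodeFailures;
    // Container failure count per host name.
    private final Map<String, Integer> failuresByNode = new HashMap<>();

    public HealthThresholdSketch(int healthyPercent, int maxNodeFailures) {
        if (healthyPercent < 0 || healthyPercent > 100) {
            throw new IllegalArgumentException("healthyPercent must be in [0, 100]");
        }
        if (maxNodeFailures < 0) {
            throw new IllegalArgumentException("maxNodeFailures must be >= 0");
        }
        this.healthyPercent = healthyPercent;
        this.maxNodeFailures = maxNodeFailures;
    }

    /** Records one container failure against the given host. */
    public void recordContainerFailure(String host) {
        failuresByNode.merge(host, 1, Integer::sum);
    }

    /** Returns true once a host has failed more times than the limit allows. */
    public boolean isNodeUnhealthy(String host) {
        return failuresByNode.getOrDefault(host, 0) > maxNodeFailures;
    }

    /**
     * Returns true while the running containers make up at least
     * healthyPercent of the desired count. Integer arithmetic avoids
     * floating-point rounding at the boundary.
     */
    public boolean isApplicationHealthy(int running, int desired) {
        if (desired <= 0) {
            return true; // nothing requested, so nothing can be unhealthy
        }
        return (long) running * 100 >= (long) healthyPercent * desired;
    }
}
```

With `healthyPercent` set to 80, an application with 8 of 10 containers running would still count as healthy in this sketch, while 7 of 10 would not. Failures are tracked per node, so a faulty host can be reported separately from the application-level check.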

Attachments

SLIDER-1246.01.patch (29 kB), 28/Sep/17 16:09, Gour Saha
SLIDER-1246.02.patch (29 kB), 28/Sep/17 18:47, Gour Saha
SLIDER-1246.03.patch (26 kB), 29/Sep/17 09:17, Gour Saha
SLIDER-1246.04.patch (24 kB), 29/Sep/17 22:17, Gour Saha

Issue Links

blocks: HIVE-16927 LLAP: Slider takes down all daemons when some daemons fail repeatedly (Closed)

Sub-Tasks

Tests for Health Threshold Monitoring Feature (Resolved, Gour Saha)

People

Assignee: Gour Saha

Reporter: Prasanth Jayachandran

Votes: 0

Watchers: 5

Dates

Created: 05/Sep/17 22:20

Updated: 04/Oct/17 07:54

Resolved: 02/Oct/17 05:27