Hadoop YARN / YARN-3591

Resource localization on a bad disk causes subsequent containers failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      This happens when a resource is localized on a disk that subsequently goes bad. The NM keeps the paths of localized resources in memory. At the time of a resource request, isResourcePresent(rsrc) is called, which calls file.exists() on the localized path.

      In some cases, even after the disk has gone bad, the inodes are still cached and file.exists() returns true. But when the file is actually read, it cannot be opened.

      Note: file.exists() actually calls stat64 natively, which returns true because it can still find the inode information from the OS.

      A proposal is to call file.list() on the parent path of the resource, which calls open() natively. If the disk is good, it should return an array of paths with length at least 1.
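
      A minimal sketch of the proposed check (a hypothetical helper for illustration, not the actual NM code) could look like this:

      import java.io.File;

      // Hypothetical sketch: exists() alone can be fooled by cached inode data
      // after a disk goes bad, so also list the parent directory, which forces
      // an open() on the underlying disk.
      final class LocalPathCheck {
        static boolean isResourcePresent(File localizedFile) {
          if (!localizedFile.exists()) {
            return false;
          }
          File parent = localizedFile.getParentFile();
          // On a healthy disk the parent of a localized file contains at least
          // one entry; on a failed disk list() typically returns null.
          String[] siblings = (parent == null) ? null : parent.list();
          return siblings != null && siblings.length > 0;
        }
      }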

      1. 0001-YARN-3591.patch
        1 kB
        Lavkesh Lahngir
      2. 0001-YARN-3591.1.patch
        1 kB
        Lavkesh Lahngir
      3. YARN-3591.2.patch
        2 kB
        Lavkesh Lahngir
      4. YARN-3591.3.patch
        11 kB
        Lavkesh Lahngir
      5. YARN-3591.4.patch
        11 kB
        Lavkesh Lahngir
      6. YARN-3591.5.patch
        13 kB
        Lavkesh Lahngir
      7. YARN-3591.6.patch
        2 kB
        Lavkesh Lahngir
      8. YARN-3591.7.patch
        2 kB
        Lavkesh Lahngir
      9. YARN-3591.8.patch
        12 kB
        Lavkesh Lahngir
      10. YARN-3591.9.patch
        16 kB
        Lavkesh Lahngir
      There are no Sub-Tasks for this issue.

        Activity

        kwx417842 Kiran N added a comment -

        Can this be backported to 2.7.4?

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #339 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/339/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2278 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2278/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #1089 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1089/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #351 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/351/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #357 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/357/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2300 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2300/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8409 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8409/)
        YARN-3591. Resource localization on a bad disk causes subsequent containers failure. Contributed by Lavkesh Lahngir. (vvasudev: rev 1dbd8e34a7d97c4d8586da79c980d8f2e0aad61d)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
        vvasudev Varun Vasudev added a comment -

        Committed to trunk and branch-2. Thanks Lavkesh Lahngir!

        vvasudev Varun Vasudev added a comment -

        +1 for the latest patch. I'll commit this tomorrow if no one objects.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 16m 36s Pre-patch trunk compilation is healthy.
        +1 @author 0m 1s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 7m 52s There were no new javac warning messages.
        +1 javadoc 10m 0s There were no new javadoc warning messages.
        +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 37s The applied patch generated 1 new checkstyle issues (total was 171, now 169).
        +1 whitespace 0m 1s The patch has no lines that end in whitespace.
        +1 install 1m 28s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 1m 15s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        -1 yarn tests 7m 30s Tests failed in hadoop-yarn-server-nodemanager.
            46m 20s  



        Reason Tests
        Failed unit tests hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12753942/YARN-3591.9.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 53c38cc
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/9002/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/9002/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9002/testReport/
        Java 1.7.0_55
        uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/9002/console

        This message was automatically generated.

        lavkesh Lavkesh Lahngir added a comment -

        Thanks, Varun Vasudev, for the comments.
        Updated the patch.

        vvasudev Varun Vasudev added a comment -

        Thanks for the latest patch, Lavkesh! A couple of comments:
        1.
        Instead of

        +    this.dirsHandler = dirHandler;
        

        in the new constructors you added, can you add that line to

        LocalResourcesTrackerImpl(String user, ApplicationId appId,
              Dispatcher dispatcher,
              ConcurrentMap<LocalResourceRequest,LocalizedResource> localrsrc,
              boolean useLocalCacheDirectoryManager, Configuration conf,
              NMStateStoreService stateStore)
        

        and have the other constructors call this one? Pass null for the directory handler if the existing constructors are called.

        2.

        +      ret |= isParent(rsrc.getLocalPath().toUri().getPath(), dir);
        

        We don't need to iterate through all the local dirs. Once ret is true we can break the loop and return.

        Rest of the patch looks good.
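
        A sketch of what the early exit in point 2 could look like (illustrative only, reusing the names from the snippet above rather than quoting the actual patch):

        // Return as soon as the resource path is found under one of the local
        // dirs, instead of OR-ing the result across all of them.
        private boolean isResourceOnLocalDirs(LocalizedResource rsrc, List<String> localDirs) {
          String rsrcPath = rsrc.getLocalPath().toUri().getPath();
          for (String dir : localDirs) {
            if (isParent(rsrcPath, dir)) {
              return true;   // no need to look at the remaining dirs
            }
          }
          return false;
        }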

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 19m 33s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 10m 43s There were no new javac warning messages.
        +1 javadoc 12m 13s There were no new javadoc warning messages.
        +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 48s The applied patch generated 2 new checkstyle issues (total was 172, now 174).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 54s mvn install still works.
        +1 eclipse:eclipse 0m 41s The patch built with eclipse:eclipse.
        +1 findbugs 1m 29s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 yarn tests 8m 10s Tests passed in hadoop-yarn-server-nodemanager.
            56m 0s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12753729/YARN-3591.8.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 095ab9a
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8970/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8970/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8970/testReport/
        Java 1.7.0_55
        uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8970/console

        This message was automatically generated.

        lavkesh Lavkesh Lahngir added a comment -

        Updating the patch after discussing with Varun Vasudev offline.
        Thanks, Varun, for the suggestions.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 16m 9s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 javac 7m 46s There were no new javac warning messages.
        +1 javadoc 9m 41s There were no new javadoc warning messages.
        +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
        +1 checkstyle 0m 37s There were no new checkstyle issues.
        -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
        +1 install 1m 21s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 1m 16s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        -1 yarn tests 6m 6s Tests failed in hadoop-yarn-server-nodemanager.
            43m 57s  



        Reason Tests
        Failed unit tests hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12749253/YARN-3591.7.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / b6265d3
        whitespace https://builds.apache.org/job/PreCommit-YARN-Build/8789/artifact/patchprocess/whitespace.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8789/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8789/testReport/
        Java 1.7.0_55
        uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8789/console

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 16m 14s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 javac 7m 45s There were no new javac warning messages.
        +1 javadoc 9m 54s There were no new javadoc warning messages.
        +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 39s The applied patch generated 2 new checkstyle issues (total was 20, now 21).
        -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
        +1 install 1m 20s mvn install still works.
        +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
        +1 findbugs 1m 13s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        -1 yarn tests 6m 8s Tests failed in hadoop-yarn-server-nodemanager.
            44m 10s  



        Reason Tests
        Failed unit tests hadoop.yarn.server.nodemanager.TestDeletionService



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12749233/YARN-3591.6.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / b6265d3
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8787/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        whitespace https://builds.apache.org/job/PreCommit-YARN-Build/8787/artifact/patchprocess/whitespace.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8787/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8787/testReport/
        Java 1.7.0_55
        uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8787/console

        This message was automatically generated.

        lavkesh Lavkesh Lahngir added a comment -

        Marking the sub-tasks as invalid.

        lavkesh Lavkesh Lahngir added a comment -

        Updated the patch with the initial solution.
        1. It does not add any overhead; hardly a few microseconds.
        2. It does not introduce new problems.
        3. It solves the use case.

        zxu zhihai xu added a comment -

        +1 for Jason Lowe's comment. Yes, it fixes some problems we have today without creating new ones.

        jlowe Jason Lowe added a comment -

        Sorry for the delay, as I was on vacation and am still working through the backlog. An incremental improvement where we try to avoid using bad/non-existent resources for future containers but still fail to cleanup old resources on bad disks sounds fine to me. IIUC it fixes some problems we have today without creating new ones.

        lavkesh Lavkesh Lahngir added a comment -

        Hi Jason Lowe, can we get some input on the previous comment?

        lavkesh Lavkesh Lahngir added a comment -

        Thanks Jason Lowe and zhihai xu for detailed analysis and reviews.

        Honestly, this has become more involved than I thought.
        A few comments:
        1. I wrote a sample program to check the time penalty we would incur. File.exists() along with listing the parent (the initial patch) adds virtually nothing; the combined time for both calls is around 0.1 ms. (We have applied this patch in our production.) It simply removes the entry from the map, which does not affect running containers, and it solves the problem of failing new containers.
        2. The latest patch, which checks whether the resource path lies on one of the good disks (basically some string comparison), has major performance implications: it takes around 40 ms. There is no way we could incur that.
        3. If the file does not exist or is localized on a bad disk, we need to keep track of it as well so we can remove it from the disk, as suggested in Jason's comment. We can't blindly delete from the disk if the refcount is greater than one.
        Can we logically separate the original problem from the related problem of zombie files and address the latter in a separate JIRA?
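
        For reference, a timing check along the lines of the sample program mentioned in item 1 could be as small as the following (a hypothetical sketch, not the actual program that was used):

        import java.io.File;

        // Hypothetical micro-benchmark: times exists() plus a parent list() on
        // one localized path, i.e. the check added by the initial patch.
        public class LocalizationCheckBench {
          public static void main(String[] args) {
            File f = new File(args[0]);   // path to a localized resource
            long start = System.nanoTime();
            boolean present = f.exists();
            File parent = f.getParentFile();
            String[] siblings = (parent == null) ? null : parent.list();
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("present=" + present
                + ", siblings=" + (siblings == null ? 0 : siblings.length)
                + ", took " + micros + " us");
          }
        }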

        zxu zhihai xu added a comment -

        Hi Jason Lowe, thanks for the thorough analysis.
        My assumption was that the files on a bad disk are most likely inaccessible; it looks like that assumption is wrong.
        It looks like your first approach is better with fewer side effects. Item 5 may be very time-consuming.
        I can think of the following possible improvements for your first approach:

        1. Cache all the local directories which are used by running containers for LocalizedResources with a non-zero refcount. This may speed up item 5: we only need to keep the cached directories on a disk that has just been repaired.
        2. Maybe we can remove a LocalizedResource entry with zero refcount on a bad disk from the map in onDirsChanged. We should also remove it when handling the RELEASE ResourceEvent.
        3. It looks like we still need to store the bad local dirs in the state store, so we can track disks that are repaired during NM recovery.
        jlowe Jason Lowe added a comment -

        One potential issue with that approach is long-running containers and "partially bad" disks. A bad disk can still have readable files on it. If we blindly remove all the files on a repaired disk then we risk removing the files from underneath a running container. On UNIX/Linux this may be less of an issue if the container is referencing the files with file descriptors that don't close, but it would cause problems if the container re-opens the files at some point or is running on an OS that doesn't reference-count files before removing the data.

        This is off the top of my head and is probably not the most efficient solution, but I think it could work:

        1. We support mapping LocalResourceRequests to a collection of LocalizedResource. This allows us to track duplicate localizations.
        2. When a resource request maps only to LocalizedResource entries that correspond to bad disks, we make the worst-case assumption that the file is inaccessible on those bad disks and re-localize it as another LocalizedResource entry (i.e.: a duplicate).
        3. When a container completes, we decrement the refcount on the appropriate LocalizedResources. We're already tracking the references by container ID, so we can scan the collection to determine which one of the duplicates the container was referencing.
        4. When a refcount of a resource for a bad disk goes to zero we don't delete it (since the disk is probably read-only at that point) and instead just remove the LocalizedResource entry from the map (or potentially leave it around with a zero refcount to make the next step a bit cheaper).
        5. When a disk is repaired, we scan it for any local directory that doesn't correspond to a LocalizedResource resource we know about. Those local directories can be removed, while directories that map to "active" resources are preserved.

        One issue with this approach is NM restart. We currently don't track container references in the state store since we can reconstruct them on startup due to the assumed one-to-one mapping of ResourceRequests to LocalizedResources. This proposal violates that assumption, so we'd have to start tracking container references explicitly in the state store to do this approach.

        A much simpler but harsher approach is to kill containers that are referencing resources on bad disks with the assumption they will fail or be too slow when accessing the files there in the interest of "failing fast." However in practice I could see many containers having at least one resource that's on the bad disk, and that could end up killing most/all the containers on a node just because one disk failed. Again a disk going bad doesn't necessarily mean all of the data is inaccessible, so we could be killing containers that otherwise wouldn't know or care about the bad disk (e.g.: they could have cached the resource in memory before the disk went bad).
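
        A very rough sketch of the duplicate-tracking idea from the numbered list above (simplified, hypothetical types rather than the real NM classes):

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.function.Predicate;
        import java.util.function.Supplier;

        // Hypothetical model of steps 1-2: one request can map to several localized
        // copies, and a new copy is localized when every known copy sits on a bad disk.
        class DuplicateAwareTracker {
          static class LocalizedCopy {
            final String localPath;
            int refCount;
            LocalizedCopy(String localPath) { this.localPath = localPath; }
          }

          private final Map<String, List<LocalizedCopy>> byRequest = new HashMap<>();

          LocalizedCopy getOrRelocalize(String requestKey,
              Predicate<String> onGoodDisk, Supplier<String> localizeAgain) {
            List<LocalizedCopy> copies =
                byRequest.computeIfAbsent(requestKey, k -> new ArrayList<>());
            for (LocalizedCopy c : copies) {
              if (onGoodDisk.test(c.localPath)) {
                c.refCount++;           // reuse a copy that is on a healthy disk
                return c;
              }
            }
            // All known copies are on bad disks: assume they are unusable and localize again.
            LocalizedCopy fresh = new LocalizedCopy(localizeAgain.get());
            fresh.refCount = 1;
            copies.add(fresh);
            return fresh;
          }
        }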

        zxu zhihai xu added a comment -

        Hi Varun Vasudev,

        can you explain how using onChange will help with the zombie issue?

        If a disk becomes bad, the files on it may not be deleted correctly until the disk becomes good again later. Also, in LocalResourcesTrackerImpl.java, after a LocalizedResource is detected on a bad disk by isResourcePresent, removeResource is called to remove it from LocalResourcesTrackerImpl#localrsrc and the NM state store, but it is not deleted from the bad disk; these localized files will become zombie files after the bad disks are repaired.
        The following code in my proposal #4, which is called inside onDirsChanged, may solve this issue:

        for (String localDir : newRepairedDirs) {
          cleanUpLocalDir(lfs, delService, localDir);
        }
        

        Please let me know if I am missing something.

        vvasudev Varun Vasudev added a comment -

        zhihai xu can you explain how using onChange will help with the zombie issue?

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 16m 14s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 7m 46s There were no new javac warning messages.
        +1 javadoc 9m 52s There were no new javadoc warning messages.
        +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 36s The applied patch generated 2 new checkstyle issues (total was 172, now 174).
        +1 whitespace 0m 1s The patch has no lines that end in whitespace.
        +1 install 1m 35s mvn install still works.
        +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
        +1 findbugs 1m 13s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 yarn tests 6m 11s Tests passed in hadoop-yarn-server-nodemanager.
            44m 29s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12740121/YARN-3591.5.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 6e3fcff
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8272/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8272/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8272/testReport/
        Java 1.7.0_55
        uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8272/console

        This message was automatically generated.

        zxu zhihai xu added a comment -

        Hi Varun Vasudev, thanks for the explanation.
        IMHO, if we want the LocalDirHandlerService to be a central place for the state of the local dirs, doing it in DirsChangeListener#onDirsChanged would be better. IIUC, that is also your suggestion.
        The benefits of doing this are:
        1. It will give better performance, because you will do it only when some dirs become bad, which should happen rarely; you won't waste time doing it for every localization request.
        2. It will also help the issue "What about zombie files lying in the various paths" which Lavkesh Lahngir found, a similar issue as YARN-2624.
        3. checkLocalizedResources/removeResource called by onDirsChanged will be done inside LocalDirsHandlerService#checkDirs without any delay.

        vvasudev Varun Vasudev added a comment -

        Lavkesh's original patch did the test regardless of whether the directory was known to be good or bad. We want the LocalDirsHandlerService to be a central place for the state of the local dirs. If there is a test that improves our detection of bad disks, we should add it to the DirectoryCollection class. However, in this case the local dirs were detected as bad; in spite of being known to be bad, we still tried to serve jars from them. If the frequency of the checks is too low, admins can change it to suit their liking.

        zxu zhihai xu added a comment -

        Hi Varun Vasudev, thanks for the suggestion.
        It looks like your suggestion is similar to Lavkesh Lahngir's original patch, 0001-YARN-3591.patch. Compared to that patch, your suggestion may sometimes fail to detect a disk failure: LocalDirsHandlerService only calls checkDirs every 2 minutes by default, so if the disk failure happens right after checkDirs is called and before isResourcePresent is called, your suggestion won't detect it, while Lavkesh Lahngir's original patch will. So it looks like the original patch is better than your suggestion. That is my understanding; please correct me if I am wrong.

        vvasudev Varun Vasudev added a comment -

        Sorry for the late response. In my opinion, there's little benefit to storing the bad local dirs in the state store. We can just pass the LocalDirsHandlerService to LocalResourcesTrackerImpl when it's created, and incoming requests can be checked against the known error dirs in the isResourcePresent function.

        Lavkesh Lahngir, would that solve the problem?
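        A rough sketch of that idea, for illustration only (not the committed change): it assumes LocalResourcesTrackerImpl holds a dirsHandler field of type LocalDirsHandlerService, an accessor such as getLocalErrorDirs() as discussed elsewhere in this thread, and java.io.File; the exact names and signatures are assumptions.

        // Sketch: consult the dirs known to be bad before trusting file.exists(),
        // which can return true from cached inode data even on a failed disk.
        private boolean isResourcePresent(LocalizedResource rsrc) {
          String localPath = rsrc.getLocalPath().toUri().getRawPath();
          for (String errorDir : dirsHandler.getLocalErrorDirs()) {
            if (localPath.startsWith(errorDir + File.separator)) {
              // Localized on a dir known to be bad: report it as missing so the
              // resource gets re-localized on a healthy dir.
              return false;
            }
          }
          return new File(localPath).exists();
        }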

        lavkesh Lavkesh Lahngir added a comment -

        zhihai xu: Thanks for the review and comments.
        I have added subtasks for more clarity. Please feel free to suggest changes.

        zxu zhihai xu added a comment -

        Hi Lavkesh Lahngir, thanks for the update.
        IMHO, although storing local error directories in the NM state store will be implemented in a separate follow-up JIRA, it would be good for this patch to accommodate it. Upon NM start, we can treat the error dirs stored in the NM state store as the previous error dirs.
        DirectoryCollection#checkDirs is already called in LocalDirsHandlerService#serviceInit, before registerLocalDirsChangeListener is called in ResourceLocalizationService#serviceStart, and onDirsChanged is called in registerLocalDirsChangeListener for the first time. So we already have the previous error dirs when onDirsChanged is called for the first time; we just need the current error dirs to calculate newErrorDirs and newRepairedDirs, which is implemented in my proposal #4.
        So instead of adding three APIs (getDiskNewErrorDirs, getDiskNewRepairedDirs and getErrorDirs) in DirectoryCollection, we can just add one API, getErrorDirs. It will make the interface simpler and the code cleaner.
        And even with three APIs, when onDirsChanged is called for the first time you would still need to recalculate newErrorDirs and newRepairedDirs based on the error dirs stored in the NM state store.

        upon start we can do a cleanUpLocalDir on the errordirs.

        We needn't do that, because we can handle it in onDirsChanged.

        As Sunil G suggested, changing the checkLocalizedResources implementation to call removeResource on those localized resources whose parent is present in newErrorDirs will be better, because it will give better performance.

        By the way, checkAndInitializeLocalDirs should be called after cleanUpLocalDir, because once a directory is cleaned up it needs to be reinitialized.

        lavkesh Lavkesh Lahngir added a comment -

        Thanks Sunil G and zhihai xu for the comments and review. I did it slightly differently: I added newRepairedDirs and newErrorDirs to DirectoryCollection.
        In this version, checkLocalizedResources(dirsTocheck) takes the list of good dirs.

        DirectoryCollection.java
        +  private List<String> newErrorDirs;
        +  private List<String> newRepariedDirs;
         
           private int numFailures;
           
        @@ -159,6 +161,8 @@ public DirectoryCollection(String[] dirs,
             localDirs = new CopyOnWriteArrayList<String>(dirs);
             errorDirs = new CopyOnWriteArrayList<String>();
             fullDirs = new CopyOnWriteArrayList<String>();
        +    newErrorDirs = new CopyOnWriteArrayList<String>();
        +    newRepariedDirs = new CopyOnWriteArrayList<String>();
         
             
        @@ -213,6 +217,20 @@ synchronized int getNumFailures() {
           }
         
           /**
        +   * @return Recently discovered error dirs
        +   */
        +  synchronized List<String> getNewErrorDirs() {
        +    return newErrorDirs;
        +  }
        +
        +  /**
        +   * @return Recently discovered repaired dirs
        +   */
        +  synchronized List<String> getNewRepairedDirs() {
        +    return newRepariedDirs;
        +  }
        +
        
        @@ -259,6 +277,8 @@ synchronized boolean checkDirs() {
             localDirs.clear();
             errorDirs.clear();
             fullDirs.clear();
        +    newRepariedDirs.clear();
        +    newErrorDirs.clear();
         
             for (Map.Entry<String, DiskErrorInformation> entry : dirsFailedCheck
               .entrySet()) {
        @@ -292,6 +312,11 @@ synchronized boolean checkDirs() {
             }
             Set<String> postCheckFullDirs = new HashSet<String>(fullDirs);
             Set<String> postCheckOtherDirs = new HashSet<String>(errorDirs);
        +    for (String dir : preCheckGoodDirs) {
        +      if (postCheckOtherDirs.contains(dir)) {
        +        newErrorDirs.add(dir);
        +      }
        +    }
             for (String dir : preCheckFullDirs) {
               if (postCheckOtherDirs.contains(dir)) {
                 LOG.warn("Directory " + dir + " error "
        @@ -304,6 +329,9 @@ synchronized boolean checkDirs() {
                 LOG.warn("Directory " + dir + " error "
                     + dirsFailedCheck.get(dir).message);
               }
        +      if (localDirs.contains(dir) || postCheckFullDirs.contains(dir)) {
        +        newRepariedDirs.add(dir);
        +      }
             }
        
        LocalDirsHandlerService.java
        +   * @return Recently added error dirs
        +   */
        +  public List<String> getDiskNewErrorDirs() {
        +    return localDirs.getNewErrorDirs();
        +  }
        +
        +  /**
        +   * @return Recently added repaired dirs
        +   */
        +  public List<String> getDiskNewRepairedDirs() {
        +    return localDirs.getNewRepairedDirs();
        +  }
        
        ResourceLocalizationService.java
               @Override
               public void onDirsChanged() {
                 checkAndInitializeLocalDirs();
        +        List<String> dirsTocheck =
        +            new ArrayList<String>(dirsHandler.getLocalDirs());
        +        dirsTocheck.addAll(dirsHandler.getDiskFullLocalDirs());
        +        // checks if resources are present in the dirsTocheck
        +        publicRsrc.checkLocalizedResources(dirsTocheck);
                 for (LocalResourcesTracker tracker : privateRsrc.values()) {
        +          tracker.checkLocalizedResources(dirsTocheck);
        +        }
        +        List<String> newRepairedDirs = dirsHandler.getDiskNewRepairedDirs();
        +        // Delete any resources found in the newly repaired Dirs.
        +        for (String dir : newRepairedDirs) {
        +          cleanUpLocalDir(lfs, delService, dir);
                 }
        +        // Add code here to add errordirs to statestore.
               }
             };
        
        DirectoryCollection.java
          synchronized List<String> getErrorDirs() {
            return Collections.unmodifiableList(errorDirs);
          }
        

        We can use getErrorDirs() and keep it in the NM state store as suggested, and upon start we can do a cleanUpLocalDir on the error dirs.

        zxu zhihai xu added a comment -

        Hi Lavkesh Lahngir, I think we can create a separate JIRA for storing local error directories in the NM state store, which will be a good enhancement.
        Thanks Sunil G! Adding a new API to get local error directories is also a good suggestion. But I think it will be enough to just check newErrorDirs instead of all errorDirs.

        To better support NM recovery and make DirsChangeListener interface simple, I propose the following changes:

        1. In DirectoryCollection, notify listeners when any set of dirs (localDirs, errorDirs and fullDirs) changes.
        The code change in DirectoryCollection#checkDirs looks like the following:

        boolean needNotifyListener = setChanged;
            for (String dir : preCheckFullDirs) {
              if (postCheckOtherDirs.contains(dir)) {
                needNotifyListener = true;
                LOG.warn("Directory " + dir + " error "
                    + dirsFailedCheck.get(dir).message);
              }
            }
            for (String dir : preCheckOtherErrorDirs) {
              if (postCheckFullDirs.contains(dir)) {
                needNotifyListener = true;
                LOG.warn("Directory " + dir + " error "
                    + dirsFailedCheck.get(dir).message);
              }
            }
            if (needNotifyListener) {
              for (DirsChangeListener listener : dirsChangeListeners) {
                listener.onDirsChanged();
              }
            }
        

        2. Add an API to get local error directories.
        As Sunil G suggested, we can add an API synchronized List<String> getErrorDirs() in DirectoryCollection.java.
        We also need to add an API public List<String> getLocalErrorDirs() in LocalDirsHandlerService.java, which will call DirectoryCollection#getErrorDirs.

        3. Add a field Set<String> preLocalErrorDirs in ResourceLocalizationService.java to store the previous local error directories.
        ResourceLocalizationService#preLocalErrorDirs should be loaded from the state store at startup if we support storing local error directories in the NM state store.

        4. The following is pseudo code for localDirsChangeListener#onDirsChanged:

        Set<String> curLocalErrorDirs = new HashSet<String>(dirsHandler.getLocalErrorDirs());
        List<String> newErrorDirs = new ArrayList<String>();
        List<String> newRepairedDirs = new ArrayList<String>();
        for (String dir : curLocalErrorDirs) {
          if (!preLocalErrorDirs.contains(dir)) {
            newErrorDirs.add(dir);
          }
        }
        for (String dir : preLocalErrorDirs) {
          if (!curLocalErrorDirs.contains(dir)) {
            newRepairedDirs.add(dir);
          }
        }
        for (String localDir : newRepairedDirs) {
          cleanUpLocalDir(lfs, delService, localDir);
        }
        if (!newErrorDirs.isEmpty()) {
          // As Sunil suggested, checkLocalizedResources will call removeResource on those
          // localized resources whose parent is present in newErrorDirs.
          publicRsrc.checkLocalizedResources(newErrorDirs);
          for (LocalResourcesTracker tracker : privateRsrc.values()) {
            tracker.checkLocalizedResources(newErrorDirs);
          }
        }
        if (!newErrorDirs.isEmpty() || !newRepairedDirs.isEmpty()) {
          preLocalErrorDirs = curLocalErrorDirs;
          stateStore.storeLocalErrorDirs(
              StringUtils.arrayToString(curLocalErrorDirs.toArray(new String[0])));
        }
        checkAndInitializeLocalDirs();
        

        5. It will be better to move verifyDirUsingMkdir(testDir) right after DiskChecker.checkDir(testDir) in DirectoryCollection#testDirs, so we can detect the error directory before detecting the full directory.

        Please feel free to change or add more to my proposal.

        sunilg Sunil G added a comment -

        If we have a new API which returns the present set of error dirs alone (without full dirs)

        synchronized List<String> getErrorDirs() 
        

        then could we modify LocalResourcesTrackerImpl#checkLocalizedResources in such a way that we call removeResource on those localized resources whose parent is present in the error dirs?
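        A hedged illustration of this variant, where checkLocalizedResources receives the error-dir list and forgets anything localized under it. The localrsrc map, getLocalPath() and removeResource(...) names are assumptions taken from this discussion, not the final patch.

        // Sketch: drop every tracked resource whose localized path lives under one of
        // the given error dirs, so the next request triggers a fresh localization.
        public synchronized void checkLocalizedResources(List<String> errorDirs) {
          List<LocalResourceRequest> toRemove = new ArrayList<LocalResourceRequest>();
          for (Map.Entry<LocalResourceRequest, LocalizedResource> entry : localrsrc.entrySet()) {
            Path localPath = entry.getValue().getLocalPath();
            if (localPath == null) {
              continue; // not localized yet, nothing to check
            }
            String path = localPath.toUri().getRawPath();
            for (String errorDir : errorDirs) {
              // startsWith avoids using File.separator as a regex, the Findbugs
              // warning seen in one of the QA runs on this JIRA.
              if (path.startsWith(errorDir + File.separator)) {
                toRemove.add(entry.getKey());
                break;
              }
            }
          }
          for (LocalResourceRequest req : toRemove) {
            removeResource(req); // forget it in memory and in the NM state store
          }
        }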

        lavkesh Lavkesh Lahngir added a comment -

        zhihai xu: Can we get away without storing into the NM state store? The other changes seem okay.
        It's not a big change in terms of code, but adding it to the NM state store could be debatable.
        Varun Vasudev: Thoughts?

        zxu zhihai xu added a comment -

        Yes, I think we can get newErrorDirs and newRepairedDirs by comparing postCheckOtherDirs and preCheckOtherErrorDirs in DirectoryCollection#checkDirs.
        Can we use a String to store DirectoryCollection#errorDirs in the state store, similar to storeContainerDiagnostics?

        lavkesh Lavkesh Lahngir added a comment -

        For adding newErrorDirs, do we have to create a new protobuf message and implement methods for storing and loading in all state stores?

        lavkesh Lavkesh Lahngir added a comment -

        typo: cleanUpLocalDir(lfs, del, newRepairedDirs);

        lavkesh Lavkesh Lahngir added a comment -

        Hmm, got your point.
        Is the DirectoryCollection class a good place to add newErrorDirs and newRepairedDirs?
        So, finally, this is my understanding; please correct me if I am wrong.
        Definitions:
        newErrorDirs -> dirs which turned bad from localDirs or fullDirs.
        newRepairedDirs -> dirs which turned good from errorDirs.
        After calling checkLocalizedResources() with localDirs and fullDirs, we can call

        cleanUpLocalDir(lfs, del, localDir);

        on newRepairedDirs.
        We will put newErrorDirs into the state store so that when the NM restarts it can do a cleanup. We also need to remove them from the state store if they become repaired.

        zxu zhihai xu added a comment -

        Calling checkLocalizedResources() on both good dirs and full dirs is similar to calling removeResource for the localized resources in newErrorDirs (please refer to my previous comment).
        But calling remove() may not work for errorDirs: firstly, remove won't delete the file when the reference count is non-zero, and secondly, the delete very likely won't succeed on the errorDirs. So it will be better to delete the files in the errorDirs after those dirs become good dirs or full dirs.

        lavkesh Lavkesh Lahngir added a comment -

        The code shows that full dirs are both readable and writable, so a resource can still be read from a full disk.
        We should just call checkLocalizedResources() on both good dirs and full dirs. Then the cached resources which are localized on a bad disk will be deleted.
        In addition, we can actually try to remove the resources from the disk by calling remove().
        Thoughts?

        zxu zhihai xu added a comment -

        Lavkesh Lahngir, thanks for the new patch. It looks like your new patch will also call removeResource on DirectoryCollection.fullDirs. Most likely the files in fullDirs can still be used; a dir in fullDirs may become good again after the files in it are deleted by CacheCleanup. If a localized resource is in fullDirs, reusing it for the same LocalResourceRequest will be better than removing it. Another problem is that these files are still on the disks: when the NM restarts, we will hit YARN-2624, where LocalResourcesTrackerImpl#getPathForLocalization may allocate a directory with the same name, which causes localization failure. This issue looks much more complicated than we thought.
        IMHO, we can add two parameters to onDirsChanged: dirs (newErrorDirs) which changed from localDirs or fullDirs to errorDirs, and dirs (newRepairedDirs) which changed from errorDirs to localDirs or fullDirs. We can call removeResource for the localized resources in newErrorDirs.
        We can call cleanUpLocalDir to delete the obsolete files in newRepairedDirs. With this change, we may solve your previous concern "What about zombie files lying in the various paths". We should also save the errorDirs in the state store for NM recovery, so we can delete the obsolete files in those errorDirs after an NM restart.

        hadoopqa Hadoop QA added a comment -



        +1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 14m 42s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 7m 33s There were no new javac warning messages.
        +1 javadoc 9m 37s There were no new javadoc warning messages.
        +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
        +1 checkstyle 0m 20s There were no new checkstyle issues.
        +1 whitespace 0m 1s The patch has no lines that end in whitespace.
        +1 install 1m 33s mvn install still works.
        +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
        +1 findbugs 1m 2s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 yarn tests 6m 28s Tests passed in hadoop-yarn-server-nodemanager.
            42m 15s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12734083/YARN-3591.4.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / ce53c8e
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8019/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8019/testReport/
        Java 1.7.0_55
        uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8019/console

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 14m 51s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 7m 35s There were no new javac warning messages.
        +1 javadoc 9m 50s There were no new javadoc warning messages.
        +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 37s The applied patch generated 3 new checkstyle issues (total was 174, now 177).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 35s mvn install still works.
        +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
        -1 findbugs 1m 4s The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings.
        +1 yarn tests 6m 10s Tests passed in hadoop-yarn-server-nodemanager.
            42m 40s  



        Reason Tests
        FindBugs module:hadoop-yarn-server-nodemanager
          File.separator used for regular expression in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:[line 483]
          File.separator used for regular expression in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:[line 484]



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12733804/YARN-3591.3.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / de30d66
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7999/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        Findbugs warnings https://builds.apache.org/job/PreCommit-YARN-Build/7999/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7999/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7999/testReport/
        Java 1.7.0_55
        uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/7999/console

        This message was automatically generated.

        zxu zhihai xu added a comment -

        Lavkesh Lahngir, Currently DirectoryCollection supports fullDirs and errorDirs. Both are not good dirs. IMO fullDirs is the disk which can become good when the localized files are deleted by above cache-clean-up and errorDirs is the corrupted disk which can't become good until somebody fix it manually. Calling removeResource for localized resource in errorDirs sounds reasonable to me.

        lavkesh Lavkesh Lahngir added a comment -

        Vinod Kumar Vavilapalli: The concern here is that if a resource is present in the LocalResourcesTrackerImpl cache (in memory), it will just check file.exists(), which returns true even if the disk is not readable. We want to remove the entry from this cache and the state store so that the resource will be missing when it is requested and can be downloaded again. This is not a case of localization failure.
        zhihai xu: In the other case, when a disk goes bad while it holds resources and other container-related files, will they ever be deleted once that disk becomes good again?
        I understand that the least recently used resources will be deleted (from disk) when the max cache size or the limit on the number of directories is reached.

        IMO, if the above cache cleanup (from disk) is acceptable, then we can just call removeResource() instead of remove() when a resource is found on a bad disk, which will remove it from memory and the state store.

        zxu zhihai xu added a comment -

        Vinod Kumar Vavilapalli, yes, keeping the ownership of turning disks good or bad in one single place is a very good suggestion. So it is reasonable to keep all the disk checking in DirectoryCollection.
        Normally the CacheCleanup thread periodically sends a CACHE_CLEANUP event to clean up these localized files in LocalResourcesTrackerImpl.
        If we only remove the localized resources on a "bad" disk which can't be recovered, it will be OK. Here a "bad" disk is different from a "full" disk. I suppose all the files on a "bad" disk will be lost/deleted by the time it becomes good again. Keeping app-level resources sounds reasonable to me.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Essentially keeping the ownership of turning disks good or bad in one single place.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Canceling patch while we continue discussion.

        The easiest way to do this IMO is for the failing ResourceLocalization code to request LocalDirsHandlerService/DirectoryCollection to actively trigger a disk-check outside of the regular cycle. Thoughts?

        lavkesh Lavkesh Lahngir added a comment -

        What about zombie files lying in the various paths? In the case of the disk becoming good again, they will be there forever. Do we not care?
        Also, I was thinking of removing only resources which have public and user-level visibility, because app-level resources will be deleted automatically. Thoughts?

        zxu zhihai xu added a comment -

        I think the current code calls removeResource instead of remove to remove a localized resource which can't be accessed due to a disk error.
        We may do the same, because all the containers which use the localized resources on a bad disk may fail, and removing these resources early looks reasonable.
        But I think we should be careful with disks which are full. It may not be good to remove localized resources on full disks, because full disks may become good disks again after files are removed by CacheCleanup. The full-disk case needs more thought; maybe we can add new signaling for disks becoming bad in DirectoryCollection.

        lavkesh Lavkesh Lahngir added a comment -

        LocalResourcesTrackerImpl keeps a ref count for resources. remove(LocalizedResource req, DeletionService delService) will fail when the reference count is non-zero. In the case of a non-zero ref count it will not remove that resource, and in the future there is no way to remove the localized resource unless the local dirs change again.
        Should we mark these resources as not usable if we are not able to remove them? In that case we need to check that a resource is localized and not marked as unusable before passing it to a new container.

        lavkesh Lavkesh Lahngir added a comment -

        Thanks for the comments, zhihai xu and Varun Vasudev. I will put out a patch with the signalling mechanism.

        zxu zhihai xu added a comment -

        Varun Vasudev, that is a good suggestion, which will give better performance.
        Lavkesh Lahngir, YARN-3491 implemented a signaling mechanism: DirsChangeListener.
        We can register a DirsChangeListener for localDirs in LocalResourcesTrackerImpl by calling LocalDirsHandlerService#registerLocalDirsChangeListener, and plug the resource removal code into DirsChangeListener#onDirsChanged.
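        A minimal sketch of that wiring, assuming the registerLocalDirsChangeListener/DirsChangeListener API from YARN-3491 as described here and the checkLocalizedResources helper proposed in this thread; the signatures are illustrative assumptions, not the committed code.

        // Sketch: react to local-dir health changes instead of probing the disk on
        // every request. onDirsChanged() fires whenever the set of good dirs changes.
        dirsHandler.registerLocalDirsChangeListener(new DirsChangeListener() {
          @Override
          public void onDirsChanged() {
            // Re-validate cached entries against the current good dirs; anything no
            // longer on a good dir is dropped and re-localized on the next request.
            List<String> goodDirs = dirsHandler.getLocalDirs();
            publicRsrc.checkLocalizedResources(goodDirs);
            for (LocalResourcesTracker tracker : privateRsrc.values()) {
              tracker.checkLocalizedResources(goodDirs);
            }
          }
        });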

        lavkesh Lavkesh Lahngir added a comment -

        Varun Vasudev, thanks for the review.
        This is a good idea; it will prevent listing the directory every time a resource is needed.
        As far as my understanding goes, DirectoryCollection#checkDirs() will be called periodically by the disk health checker, and we can plug in the resource removal code when there is a change in the list of good disks. Is that okay?

        vvasudev Varun Vasudev added a comment -

        zhihai xu, Lavkesh Lahngir - instead of listing the directory contents every time, can we use the signalling mechanism that zhihai xu added in YARN-3491? When a local dir goes bad, the tracker's listener gets called and it removes all the localized resources from the data structure. That way we are re-using the existing checks to make sure that a directory is good.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 14m 40s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 javac 7m 35s There were no new javac warning messages.
        +1 javadoc 9m 34s There were no new javadoc warning messages.
        +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 35s The applied patch generated 2 new checkstyle issues (total was 19, now 20).
        -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
        +1 install 1m 35s mvn install still works.
        +1 eclipse:eclipse 0m 32s The patch built with eclipse:eclipse.
        +1 findbugs 1m 2s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
        +1 yarn tests 6m 0s Tests passed in hadoop-yarn-server-nodemanager.
            41m 59s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12732494/YARN-3591.2.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / fcd0702
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7913/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        whitespace https://builds.apache.org/job/PreCommit-YARN-Build/7913/artifact/patchprocess/whitespace.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7913/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7913/testReport/
        Java 1.7.0_55
        uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/7913/console

        This message was automatically generated.

        lavkesh Lavkesh Lahngir added a comment -

        Thanks zhihai xu for the comments.
        Added a null check and a few comments in the patch.

        zxu zhihai xu added a comment -

        Hi Lavkesh Lahngir, thanks for working on this issue. It looks like a good catch. The parent directory is generated by uniqueNumberGenerator for each LocalizedResource, so most likely fileList.length will be one.
        Some comments about your patch:
        getParentFile may return null; should we check for null to avoid an NPE?
        Can we add comments in the code about the change?
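
        A minimal sketch of a presence check along those lines (the class and method names here are illustrative only, not the actual patch):

        import java.io.File;

        public final class LocalPathChecks {
          // Illustrative presence check for a localized file. Instead of relying on
          // File.exists() (a stat-style call that can succeed from cached inodes on a
          // bad disk), it lists the parent directory, which forces the directory to
          // actually be opened and read.
          static boolean isPathUsable(File localizedFile) {
            File parent = localizedFile.getParentFile();
            if (parent == null) {
              // getParentFile() returns null when the path has no parent component.
              return false;
            }
            // list() returns null if the directory cannot be read, e.g. on an I/O error.
            String[] children = parent.list();
            return children != null && children.length > 0;
          }
        }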

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 14m 38s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 javac 7m 35s There were no new javac warning messages.
        +1 javadoc 9m 36s There were no new javadoc warning messages.
        +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 36s The applied patch generated 2 new checkstyle issues (total was 19, now 20).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 34s mvn install still works.
        +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
        +1 findbugs 1m 2s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
        +1 yarn tests 6m 2s Tests passed in hadoop-yarn-server-nodemanager.
            42m 1s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12731169/0001-YARN-3591.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 8e991f4
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/7759/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
        hadoop-yarn-server-nodemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/7759/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7759/testReport/
        Java 1.7.0_55
        uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/7759/console

        This message was automatically generated.

        lavkesh Lavkesh Lahngir added a comment -

        Example:
        >> stat /data/d3/yarn/local
        File: `/data/d3/yarn/local'
        Size: 4096 Blocks: 8 IO Block: 4096 directory
        Device: 830h/2096d Inode: 107307009 Links: 3
        Access: (0755/drwxr-xr-x) Uid: ( 110/ yarn) Gid: ( 118/ hadoop)
        Access: 2014-11-18 13:57:19.000000000 +0000
        Modify: 2014-11-19 11:15:15.000000000 +0000
        Change: 2014-11-19 11:15:15.000000000 +0000
        Birth: -

        >> ls /data/d3/yarn/local
        ls: reading directory /data/d3/yarn/local: Input/output error
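
        The same behaviour can be reproduced from Java (a small stand-alone demo, reusing the example path above): File.exists() is backed by a stat-style call, while File.list() has to open and read the directory and returns null when that fails.

        import java.io.File;

        public class BadDiskDemo {
          public static void main(String[] args) {
            File dir = new File("/data/d3/yarn/local");

            // stat-based: may still report true on a bad disk while the inode is cached.
            System.out.println("exists(): " + dir.exists());

            // readdir-based: returns null when the directory cannot actually be read.
            String[] entries = dir.list();
            System.out.println("list(): " + (entries == null
                ? "null (directory could not be read)"
                : entries.length + " entries"));
          }
        }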


          People

          • Assignee:
            lavkesh Lavkesh Lahngir
            Reporter:
            lavkesh Lavkesh Lahngir
          • Votes:
            0
            Watchers:
            18
