  1. Hadoop YARN
  2. YARN-3591

Resource localization on a bad disk causes subsequent container failures


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
    • Target Version/s:


      This happens when a resource has been localised on a disk and that disk goes bad afterwards. The NM keeps the paths of localised resources in memory. When a resource is requested, isResourcePresent(rsrc) is called, which in turn calls file.exists() on the localised path.

      In some cases, when the disk has gone bad, the inodes are still cached and file.exists() returns true. But when the file is actually read, it cannot be opened.

      Note: file.exists() actually calls stat64 natively, so it returns true because the OS was still able to find the inode information.

      A proposal is to call file.list() on the parent path of the resource, which calls open() natively. If the disk is good, it should return an array of paths with length at least 1.
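
      The proposed check could be sketched roughly as follows. This is only an illustration of the idea described above, not the code from any of the attached patches: the class LocalResourceCheck and the method isResourceReadable are hypothetical names, and the real NM logic around isResourcePresent(rsrc) is more involved.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalResourceCheck {

    // Hypothetical helper (assumption): not the NodeManager's actual
    // isResourcePresent(). It lists the parent directory instead of
    // relying on exists() alone, so the directory has to be opened.
    static boolean isResourceReadable(File localizedPath) {
        File parent = localizedPath.getParentFile();
        if (parent == null) {
            return false;
        }
        // File.list() returns null if the path is not a directory
        // or if an I/O error occurs while reading it.
        String[] children = parent.list();
        if (children == null || children.length == 0) {
            return false;
        }
        return localizedPath.exists();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("localized");
        Path resource = Files.createFile(dir.resolve("job.jar"));

        System.out.println(isResourceReadable(resource.toFile())); // true

        Files.delete(resource);
        System.out.println(isResourceReadable(resource.toFile())); // false

        Files.delete(dir);
    }
}
```

      On a healthy disk the listing contains at least the resource itself, so the check passes. If the listing fails or comes back empty, the resource is treated as absent and can be localised again.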


        1. YARN-3591.9.patch
          16 kB
          Lavkesh Lahngir
        2. YARN-3591.8.patch
          12 kB
          Lavkesh Lahngir
        3. YARN-3591.7.patch
          2 kB
          Lavkesh Lahngir
        4. YARN-3591.6.patch
          2 kB
          Lavkesh Lahngir
        5. YARN-3591.5.patch
          13 kB
          Lavkesh Lahngir
        6. YARN-3591.4.patch
          11 kB
          Lavkesh Lahngir
        7. YARN-3591.3.patch
          11 kB
          Lavkesh Lahngir
        8. YARN-3591.2.patch
          2 kB
          Lavkesh Lahngir
        9. 0001-YARN-3591.patch
          1 kB
          Lavkesh Lahngir
        10. 0001-YARN-3591.1.patch
          1 kB
          Lavkesh Lahngir
        There are no Sub-Tasks for this issue.



            • Assignee:
              lavkesh Lavkesh Lahngir
            • Votes:
              0
            • Watchers:
              17


              • Created: