  1. Hadoop YARN
  2. YARN-3591

Resource localization on a bad disk causes subsequent container failures

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None

    Description

      This happens when a resource has been localised on a disk and that disk goes bad afterwards. The NM keeps the paths of localised resources in memory. When a resource is requested, isResourcePresent(rsrc) is called, which in turn calls file.exists() on the localised path.
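
      The check described above can be modelled roughly as follows. This is a simplified sketch, not the NodeManager source: the class name LocalResourceCacheSketch, the String resource key, and the recordLocalized method are assumptions made for illustration. Only the idea of an in-memory path map guarded by file.exists() comes from the description.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the in-memory tracking; not actual NM code. */
final class LocalResourceCacheSketch {
    // Resource key -> localised path, held only in memory (assumed shape).
    private final Map<String, File> localisedPaths = new ConcurrentHashMap<>();

    /** Remembers where a resource was localised. */
    void recordLocalized(String resourceKey, File localPath) {
        localisedPaths.put(resourceKey, localPath);
    }

    /** Presence is decided by File.exists() alone, as in the description. */
    boolean isResourcePresent(String resourceKey) {
        File path = localisedPaths.get(resourceKey);
        return path != null && path.exists();
    }
}
```

      In this model a resource is reported as present for as long as exists() returns true for the remembered path, whether or not the file can still be opened.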

      In some cases, when a disk has gone bad, its inodes are still cached and file.exists() returns true. But when the file is actually read, it cannot be opened.

      Note: file.exists() natively calls stat64, which succeeds because the OS can still find the inode information, so file.exists() returns true.

      A proposal is to call file.list() on the parent path of the resource, which natively calls open(). If the disk is good, it should return an array of paths with a length of at least 1.
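
      A minimal sketch of the proposed check, assuming a hypothetical helper named isParentListable in a hypothetical class ParentListingCheck (neither name comes from the patches). It relies only on the documented behaviour of java.io.File.list(), which returns null when the path is not a directory or an I/O error occurs.

```java
import java.io.File;

/** Hypothetical helper illustrating the proposal; not taken from the patches. */
final class ParentListingCheck {
    private ParentListingCheck() {}

    /**
     * Returns true only if the parent directory of the localised path
     * can be listed and contains at least one entry.
     */
    static boolean isParentListable(File localisedPath) {
        File parent = localisedPath.getAbsoluteFile().getParentFile();
        if (parent == null) {
            return false; // a filesystem root has no parent to list
        }
        // File.list() returns null if the path is not a directory
        // or if an I/O error occurs while reading it.
        String[] entries = parent.list();
        return entries != null && entries.length >= 1;
    }
}
```

      This sketch only confirms that the parent directory is readable and non-empty. It does not check that the resource itself is among the returned entries, which a real check may also want to do.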

      Attachments

        1. YARN-3591.9.patch
          16 kB
          Lavkesh Lahngir
        2. YARN-3591.8.patch
          12 kB
          Lavkesh Lahngir
        3. YARN-3591.7.patch
          2 kB
          Lavkesh Lahngir
        4. YARN-3591.6.patch
          2 kB
          Lavkesh Lahngir
        5. YARN-3591.5.patch
          13 kB
          Lavkesh Lahngir
        6. YARN-3591.4.patch
          11 kB
          Lavkesh Lahngir
        7. YARN-3591.3.patch
          11 kB
          Lavkesh Lahngir
        8. YARN-3591.2.patch
          2 kB
          Lavkesh Lahngir
        9. 0001-YARN-3591.1.patch
          1 kB
          Lavkesh Lahngir
        10. 0001-YARN-3591.patch
          1 kB
          Lavkesh Lahngir

          People

            Assignee: Lavkesh Lahngir
            Reporter: Lavkesh Lahngir