Hadoop YARN / YARN-9670

Missing Fsync for localized resources before updating to finalized in statestore

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

    Description

      A localized resource is not fsynced before the state store is updated to track it as finalized. The download is currently considered finished as soon as the target local output stream is closed, so the data might not have reached the block device by the time the state store is updated. After recovery, containers relying on the resource may see only part of it, which usually causes them to crash.
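
      To illustrate the gap, here is a minimal Java sketch, not the NodeManager's actual download code. It assumes the download is a plain stream copy; `DurableCopy` and `copyAndSync` are hypothetical names. Closing the stream only hands the bytes to the operating system, whereas `FileDescriptor.sync()` asks the OS to write them to the device before the method returns.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

/** Hypothetical helper, for illustration only. */
public final class DurableCopy {
    private DurableCopy() {}

    /**
     * Copies {@code in} to {@code target} and forces the written bytes to the
     * storage device before returning. Returns the number of bytes copied.
     */
    public static long copyAndSync(InputStream in, Path target) throws IOException {
        long total = 0;
        try (FileOutputStream out = new FileOutputStream(target.toFile())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            out.flush();
            // close() alone does not guarantee durability; sync() blocks until
            // the OS reports the data as written to the underlying device.
            out.getFD().sync();
        }
        return total;
    }
}
```

      Only after such a call returns would it be safe to record the resource as finalized. The sketch does not sync the parent directory entry, which some file systems also need for the new file name to survive a crash.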

       

      Possible fixes:

      Introduce a new step in the state machine that fsyncs the downloaded path before calling the state store.

      On recovery, compare the size of the localized resource (archives will probably have to be unpacked again).
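
      Both ideas can be sketched in plain Java. This is a rough illustration under stated assumptions, not a proposed patch: `LocalizedResourceDurability`, `fsyncTree`, and `sizeMatches` are hypothetical names, and `expectedBytes` stands for a size that would have to be recorded at download time, which the issue does not specify. Directory syncing is treated as best effort because not every platform allows opening a directory through `FileChannel`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, for illustration only. */
public final class LocalizedResourceDurability {
    private LocalizedResourceDurability() {}

    /** Lists root and everything below it in depth-first pre-order. */
    private static List<Path> listTree(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.collect(Collectors.toList());
        }
    }

    /** Fsyncs every regular file and directory under root, children before parents. */
    public static void fsyncTree(Path root) throws IOException {
        List<Path> paths = listTree(root);
        // Reverse pre-order visits each entry before its parent directory.
        for (int i = paths.size() - 1; i >= 0; i--) {
            Path p = paths.get(i);
            boolean dir = Files.isDirectory(p);
            if (!dir && !Files.isRegularFile(p)) {
                continue; // skip sockets, broken links, and similar entries
            }
            StandardOpenOption mode = dir ? StandardOpenOption.READ : StandardOpenOption.WRITE;
            try (FileChannel ch = FileChannel.open(p, mode)) {
                ch.force(true); // flush both data and metadata
            } catch (IOException e) {
                // Assumption: a failed directory sync is tolerated; file failures are not.
                if (!dir) {
                    throw e;
                }
            }
        }
    }

    /** Returns true if the regular files under root add up to expectedBytes. */
    public static boolean sizeMatches(Path root, long expectedBytes) throws IOException {
        long actual = 0;
        for (Path p : listTree(root)) {
            if (Files.isRegularFile(p)) {
                actual += Files.size(p);
            }
        }
        return actual == expectedBytes;
    }
}
```

      A caller would run `fsyncTree` on the downloaded path before notifying the state store, and `sizeMatches` during recovery. A matching size does not prove the contents are intact, so the check can only catch truncation, not corruption.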

       

          People

            Assignee: Unassigned
            Reporter: Jan Filipiak (jfilipiak)
            Votes: 0
            Watchers: 1