Hadoop YARN / YARN-676

[Umbrella] Daemons crashing because of invalid state transitions

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Several tickets track invalid state transitions that crash the daemons (RM, NM, or AM). This is the umbrella ticket for tracking them.

      We should try to fix as many of them as soon as possible.
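
      To illustrate the failure mode tracked here, the following is a minimal sketch of a table-driven state machine in which an event with no defined transition raises an error, plus a handler that logs the error instead of letting it take down the process. It is a toy model, not Hadoop's implementation: the names State, Event, TRANSITIONS, Attempt, and InvalidTransition are hypothetical, and the states and events are only loosely modeled on the sub-task titles.

```python
import logging
from enum import Enum, auto

log = logging.getLogger("toy_state_machine")


class State(Enum):
    NEW = auto()
    ALLOCATED = auto()
    RUNNING = auto()
    DONE = auto()


class Event(Enum):
    ALLOCATE = auto()
    LAUNCH = auto()
    CONTAINER_FINISHED = auto()


class InvalidTransition(Exception):
    """Raised when an event arrives in a state that has no arc for it."""


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.NEW, Event.ALLOCATE): State.ALLOCATED,
    (State.ALLOCATED, Event.LAUNCH): State.RUNNING,
    (State.RUNNING, Event.CONTAINER_FINISHED): State.DONE,
}


class Attempt:
    def __init__(self):
        self.state = State.NEW

    def transition(self, event):
        """Apply an event strictly; raise if no transition is defined."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(
                f"Invalid event: {event.name} at {self.state.name}"
            )
        self.state = TRANSITIONS[key]

    def handle(self, event):
        """Apply an event defensively; log and ignore it if it is invalid.

        Returns True if the event was applied, False otherwise.
        """
        try:
            self.transition(event)
            return True
        except InvalidTransition as exc:
            log.error("Can't handle this event at current state: %s", exc)
            return False
```

      In this sketch, calling transition() directly propagates the error to the caller, which is the crash-prone path; handle() keeps the object in its current state and reports the problem. Whether ignoring the event is actually safe depends on the specific transition, which is why each case is tracked in its own sub-task.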

        Issue Links

          1. Handle (or throw a proper error when receiving) status updates from application masters that have not registered (Sub-task, Closed, Mayank Bansal)
          2. RM crash with NPE on NODE_REMOVED event with FairScheduler (Sub-task, Closed, Mayank Bansal)
          3. Node Manager can not handle duplicate responses (Sub-task, Open, Mayank Bansal)
          4. Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl (Sub-task, Closed, Mayank Bansal)
          5. Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl (Sub-task, Resolved, Mayank Bansal)
          6. Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE (Sub-task, Resolved, Mayank Bansal)
          7. InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for ContainerImpl in Node Manager (Sub-task, Resolved, Mayank Bansal)
          8. NodeManager has invalid state transition after error in resource localization (Sub-task, Closed, Mayank Bansal)
          9. RM crash with NPE on NODE_UPDATE (Sub-task, Closed, Mayank Bansal)
          10. Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED (Sub-task, Resolved, Devaraj Kavali)
          11. ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt (Sub-task, Closed, Zhijie Shen)
          12. Cancelling ContainerLaunch#call at KILLING causes that the container cannot be completed (Sub-task, Closed, Zhijie Shen)
          13. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL (Sub-task, Closed, Zhijie Shen)

            People

              Assignee: Unassigned
              Reporter: Vinod Kumar Vavilapalli (vinodkv)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated: