Hadoop YARN / YARN-676

[Umbrella] Daemons crashing because of invalid state transitions

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      There are several tickets tracking invalid state transitions that essentially crash the daemons (RM, NM, or AM). This is the tracking ticket for them.

      We should try to fix as many of them as soon as possible.
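
      As an illustration of the failure mode, here is a minimal Java sketch of a table-driven state machine. It is not Hadoop's state machine code: the class `TransitionSketch`, its `State` and `Event` enums, and both methods are hypothetical names invented for this example, and only the event and state names echo the sub-task titles. `strictTransition` throws on an unregistered event, which takes the caller down if nothing catches it. `tolerantTransition` logs the event and keeps the current state. Whether ignoring the event is the right fix for any particular sub-task is not stated in this ticket.

```java
import java.util.Map;

/** Hypothetical sketch of a table-driven state machine; not Hadoop code. */
final class TransitionSketch {
    enum State { ALLOCATED, RUNNING, DONE }
    enum Event { LAUNCHED, CONTAINER_FINISHED, INIT_CONTAINER }

    // Legal transitions: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE = Map.of(
        State.ALLOCATED, Map.of(Event.LAUNCHED, State.RUNNING),
        State.RUNNING, Map.of(Event.CONTAINER_FINISHED, State.DONE),
        State.DONE, Map.<Event, State>of());

    /** Throws when the event is not registered for the current state. */
    static State strictTransition(State current, Event event) {
        State next = TABLE.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        return next;
    }

    /** Logs an unregistered event and stays in the current state. */
    static State tolerantTransition(State current, Event event) {
        State next = TABLE.get(current).get(event);
        if (next == null) {
            System.err.println("Ignoring invalid event: " + event + " at " + current);
            return current;
        }
        return next;
    }

    public static void main(String[] args) {
        // A registered transition behaves the same in both variants.
        System.out.println(strictTransition(State.ALLOCATED, Event.LAUNCHED));

        // An unregistered event leaves the tolerant variant in place.
        System.out.println(tolerantTransition(State.ALLOCATED, Event.CONTAINER_FINISHED));

        // The strict variant throws; an uncaught throw here would end the process.
        try {
            strictTransition(State.ALLOCATED, Event.CONTAINER_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

      The tolerant variant trades a crash for a possibly stale state, so it only suits events that are safe to drop, such as a duplicate or late notification. Other cases may need a new explicit transition in the table instead.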

          Issue Links

          Each entry lists: summary | type | status | assignee.

          1. Handle (or throw a proper error when receiving) status updates from application masters that have not registered | Sub-task | Closed | Mayank Bansal
          2. RM crash with NPE on NODE_REMOVED event with FairScheduler | Sub-task | Closed | Mayank Bansal
          3. Node Manager can not handle duplicate responses | Sub-task | Open | Mayank Bansal
          4. Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl | Sub-task | Closed | Mayank Bansal
          5. Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl | Sub-task | Resolved | Mayank Bansal
          6. Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE | Sub-task | Resolved | Mayank Bansal
          7. InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for ContainerImpl in Node Manager | Sub-task | Resolved | Mayank Bansal
          8. NodeManager has invalid state transition after error in resource localization | Sub-task | Closed | Mayank Bansal
          9. RM crash with NPE on NODE_UPDATE | Sub-task | Closed | Mayank Bansal
          10. Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED | Sub-task | Resolved | Devaraj Kavali
          11. ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt | Sub-task | Closed | Zhijie Shen
          12. Cancelling ContainerLaunch#call at KILLING causes that the container cannot be completed | Sub-task | Closed | Zhijie Shen
          13. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL | Sub-task | Closed | Zhijie Shen

              People

              • Assignee: Unassigned
              • Reporter: Vinod Kumar Vavilapalli (vinodkv)
              • Votes: 0
              • Watchers: 6
