Hadoop YARN / YARN-676

[Umbrella] Daemons crashing because of invalid state transitions


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Several tickets track invalid state transitions that crash the daemons: the ResourceManager (RM), the NodeManager (NM), or the ApplicationMaster (AM). This is the umbrella ticket tracking them.

      We should try to fix as many of them as soon as possible.
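      In YARN, entities such as applications, application attempts and containers are driven by state machines (built with `org.apache.hadoop.yarn.state.StateMachineFactory`), and an event that arrives in a state with no registered transition raises the exception quoted in several sub-task titles. The Python sketch below is a simplified, hypothetical model of that failure mode, not YARN code: the classes `State`, `Event`, `StateMachine`, the tables `HAPPY_PATH` and `PATCHED`, and the helper `run` are illustrative assumptions. It uses CONTAINER_FINISHED at ALLOCATED, one of the events listed in the sub-tasks, and shows one possible remedy (registering an explicit transition for the late event). Mapping that event to FAILED is an assumed choice for illustration; each real fix decides its own target state.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, heavily reduced set of application-attempt states.
    NEW = auto()
    ALLOCATED = auto()
    RUNNING = auto()
    FINISHED = auto()
    FAILED = auto()


class Event(Enum):
    # Hypothetical event names modelled on the messages quoted in the sub-tasks.
    CONTAINER_ALLOCATED = auto()
    REGISTERED = auto()
    CONTAINER_FINISHED = auto()


class InvalidTransitionError(Exception):
    """Raised when no transition is registered for the current (state, event) pair."""


class StateMachine:
    def __init__(self, initial, transitions):
        self.state = initial
        # Maps (current state, event) -> next state.
        self._transitions = dict(transitions)

    def handle(self, event):
        key = (self.state, event)
        if key not in self._transitions:
            # No registered transition: this is the "invalid event" case.
            raise InvalidTransitionError(
                f"Invalid event: {event.name} at {self.state.name}")
        self.state = self._transitions[key]
        return self.state


# Only the "happy path" transitions are registered.
HAPPY_PATH = {
    (State.NEW, Event.CONTAINER_ALLOCATED): State.ALLOCATED,
    (State.ALLOCATED, Event.REGISTERED): State.RUNNING,
    (State.RUNNING, Event.CONTAINER_FINISHED): State.FINISHED,
}

# Hypothetical remedy: explicitly handle a container finishing before the
# attempt registers (here by failing the attempt) instead of raising.
PATCHED = {**HAPPY_PATH,
           (State.ALLOCATED, Event.CONTAINER_FINISHED): State.FAILED}


def run(transitions, events):
    """Feed events to a fresh machine; an uncaught error stands in for a daemon crash."""
    machine = StateMachine(State.NEW, transitions)
    for event in events:
        machine.handle(event)
    return machine.state


if __name__ == "__main__":
    early_exit = [Event.CONTAINER_ALLOCATED, Event.CONTAINER_FINISHED]
    try:
        run(HAPPY_PATH, early_exit)
    except InvalidTransitionError as err:
        print(f"daemon would crash: {err}")
    print(run(PATCHED, early_exit).name)
```

      With only the happy-path table, the out-of-order event propagates as an uncaught exception; with the extra entry, the same sequence ends in a defined state. The trade-off is that each new transition must be a deliberate decision about what the late or duplicate event means, rather than a blanket "ignore everything" rule.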


        Sub-Tasks

        1. Handle (or throw a proper error when receiving) status updates from application masters that have not registered (Sub-task, Closed, Mayank Bansal)
        2. RM crash with NPE on NODE_REMOVED event with FairScheduler (Sub-task, Closed, Mayank Bansal)
        3. Node Manager can not handle duplicate responses (Sub-task, Open, Mayank Bansal)
        4. Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl (Sub-task, Closed, Mayank Bansal)
        5. Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl (Sub-task, Resolved, Mayank Bansal)
        6. Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE (Sub-task, Resolved, Mayank Bansal)
        7. InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE for ContainerImpl in Node Manager (Sub-task, Resolved, Mayank Bansal)
        8. NodeManager has invalid state transition after error in resource localization (Sub-task, Closed, Mayank Bansal)
        9. RM crash with NPE on NODE_UPDATE (Sub-task, Closed, Mayank Bansal)
        10. Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED (Sub-task, Resolved, Devaraj Kavali)
        11. ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt (Sub-task, Closed, Zhijie Shen)
        12. Cancelling ContainerLaunch#call at KILLING causes that the container cannot be completed (Sub-task, Closed, Zhijie Shen)
        13. ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL (Sub-task, Closed, Zhijie Shen)


          People

            Assignee: Unassigned
            Reporter: Vinod Kumar Vavilapalli (vinodkv)
