Apache Ozone / HDDS-1127 Fix failing and intermittent Ozone unit tests / HDDS-3358

Intermittent test failure related to a race condition during PipelineManager close


Details

    • HDDS BadLands

    Description

      The failing test:

      TestSCMNodeManager

      The end of the log is:

      2020-04-08 10:49:44,544 ERROR events.SingleThreadExecutor (SingleThreadExecutor.java:lambda$onMessage$1(84)) - Error on execution message 19844615-0d70-4172-8c34-96e5b7295ef2{ip: 196.189.243.187, host: localhost-196.189.243.187, networkLocation: /default-rack, certSerialId: null}
      java.lang.NullPointerException
              at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.finalizeAndDestroyPipeline(SCMPipelineManager.java:380)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:63)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:38)
              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2020-04-08 10:49:44,544 INFO  node.StaleNodeHandler (StaleNodeHandler.java:onMessage(58)) - Datanode 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null} moved to stale state. Finalizing its pipelines [PipelineID=fd1f9e92-2f90-43e7-8406-94ba6ac356b0, PipelineID=8d380e3c-b632-4bda-aa7a-554774fba09d]
      2020-04-08 10:49:44,544 INFO  pipeline.SCMPipelineManager (SCMPipelineManager.java:finalizeAndDestroyPipeline(373)) - Destroying pipeline:Pipeline[ Id: fd1f9e92-2f90-43e7-8406-94ba6ac356b0, Nodes: 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-04-08T10:49:37.441Z]
      2020-04-08 10:49:44,544 INFO  pipeline.PipelineStateManager (PipelineStateManager.java:finalizePipeline(120)) - Pipeline Pipeline[ Id: fd1f9e92-2f90-43e7-8406-94ba6ac356b0, Nodes: 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:CLOSED, leaderId:null, CreationTimestamp2020-04-08T10:49:37.441Z] moved to CLOSED state
      2020-04-08 10:49:44,544 ERROR events.SingleThreadExecutor (SingleThreadExecutor.java:lambda$onMessage$1(84)) - Error on execution message 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null}
      java.lang.NullPointerException
              at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.finalizeAndDestroyPipeline(SCMPipelineManager.java:380)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:63)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:38)
              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2020-04-08 10:49:44,544 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$close$4(208)) - Send pipeline:PipelineID=e0e155c6-9fbe-46a7-b742-e805ea9baacf close command to datanode 30a24b04-1289-4c30-a28a-034edfe29e3d
      2020-04-08 10:49:44,545 WARN  events.EventQueue (EventQueue.java:fireEvent(151)) - Processing of TypedEvent{payloadType=CommandForDatanode, name='Datanode_Command'} is skipped, EventQueue is not running
      2020-04-08 10:49:44,544 INFO  node.StaleNodeHandler (StaleNodeHandler.java:onMessage(58)) - Datanode 59bdd26b-05da-47d1-8c3f-8350d55d7299{ip: 248.147.58.17, host: localhost-248.147.58.17, networkLocation: /default-rack, certSerialId: null} moved to stale state. Finalizing its pipelines [PipelineID=17b032b7-b9c4-41eb-bba6-50106881886d, PipelineID=60de1ca6-4115-415b-bbf1-06b86113df94]
      2020-04-08 10:49:44,576 WARN  server.ServerUtils (ServerUtils.java:getScmDbDir(148)) - ozone.scm.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
      2020-04-08 10:49:44,579 WARN  server.ServerUtils (ServerUtils.java:getScmDbDir(148)) - ozone.scm.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
      2020-04-08 10:49:44,579 WARN  db.DBDefinition (DBDefinition.java:createDBStoreBuilder(63)) - ozone.scm.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
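
The log above shows stale-node events still being handled while the event queue is already stopped, followed by a NullPointerException inside finalizeAndDestroyPipeline. One plausible shape of such a race, sketched here under assumptions, is a handler that dereferences a resource the manager released during close. Every name below (SketchManager, destroyHelper, finalizeUnguarded, finalizeGuarded) is hypothetical and is not taken from the Ozone code base; the ordering "close first, event second" is simulated deterministically instead of with real threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: none of these names come from Ozone.
class PipelineCloseRaceSketch {

  /** Hypothetical manager that releases a helper when it is closed. */
  static class SketchManager {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private volatile Runnable destroyHelper = () -> { };

    /** Marks the manager closed and drops the helper reference. */
    void close() {
      closed.set(true);
      destroyHelper = null;
    }

    /** Unguarded: dereferences the field directly, so a late call throws NPE. */
    void finalizeUnguarded(String pipelineId) {
      destroyHelper.run();
    }

    /** Guarded: reads the field once and skips the work after close. */
    boolean finalizeGuarded(String pipelineId) {
      Runnable helper = destroyHelper; // single read avoids check-then-act on the field
      if (closed.get() || helper == null) {
        return false;
      }
      helper.run();
      return true;
    }
  }

  /** Simulates an event that arrives after close and hits the unguarded path. */
  static boolean unguardedThrowsAfterClose() {
    SketchManager manager = new SketchManager();
    manager.close();
    try {
      manager.finalizeUnguarded("pipeline-1");
      return false;
    } catch (NullPointerException expected) {
      return true;
    }
  }

  /** Simulates the same late event on the guarded path; returns whether work ran. */
  static boolean guardedRunsAfterClose() {
    SketchManager manager = new SketchManager();
    manager.close();
    return manager.finalizeGuarded("pipeline-1");
  }

  public static void main(String[] args) {
    System.out.println("unguarded throws NPE after close: " + unguardedThrowsAfterClose());
    System.out.println("guarded ran after close: " + guardedRunsAfterClose());
  }
}
```

Whether the actual fix should guard the handler, stop event delivery before releasing resources, or both is a design decision for the issue itself; the sketch only illustrates why a late event can fail in this way.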
      


          People

            elek Marton Elek


              Agile

                Completed Sprint:
                HDDS BadLands ended 05/Aug/19