Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-1127 Fix failing and intermittent Ozone unit tests
  3. HDDS-3358

Intermittent test failure related to a race conditon during PipelineManager close

    XMLWordPrintableJSON

    Details

    • Target Version/s:
    • Sprint:
      HDDS BadLands

      Description

      The test which is failed:

      TestSCMNodeManager

      The end of the log is:

      2020-04-08 10:49:44,544 ERROR events.SingleThreadExecutor (SingleThreadExecutor.java:lambda$onMessage$1(84)) - Error on execution message 19844615-0d70-4172-8c34-96e5b7295ef2{ip: 196.189.243.187, host: localhost-196.189.243.187, networkLocation: /default-rack, certSerialId: null}
      java.lang.NullPointerException
              at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.finalizeAndDestroyPipeline(SCMPipelineManager.java:380)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:63)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:38)
              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2020-04-08 10:49:44,544 INFO  node.StaleNodeHandler (StaleNodeHandler.java:onMessage(58)) - Datanode 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null} moved to stale state. Finalizing its pipelines [PipelineID=fd1f9e92-2f90-43e7-8406-94ba6ac356b0, PipelineID=8d380e3c-b632-4bda-aa7a-554774fba09d]
      2020-04-08 10:49:44,544 INFO  pipeline.SCMPipelineManager (SCMPipelineManager.java:finalizeAndDestroyPipeline(373)) - Destroying pipeline:Pipeline[ Id: fd1f9e92-2f90-43e7-8406-94ba6ac356b0, Nodes: 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-04-08T10:49:37.441Z]
      2020-04-08 10:49:44,544 INFO  pipeline.PipelineStateManager (PipelineStateManager.java:finalizePipeline(120)) - Pipeline Pipeline[ Id: fd1f9e92-2f90-43e7-8406-94ba6ac356b0, Nodes: 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:CLOSED, leaderId:null, CreationTimestamp2020-04-08T10:49:37.441Z] moved to CLOSED state
      2020-04-08 10:49:44,544 ERROR events.SingleThreadExecutor (SingleThreadExecutor.java:lambda$onMessage$1(84)) - Error on execution message 0914e56d-c7f8-4e0a-8fd1-845a9806172b{ip: 57.46.156.17, host: localhost-57.46.156.17, networkLocation: /default-rack, certSerialId: null}
      java.lang.NullPointerException
              at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.finalizeAndDestroyPipeline(SCMPipelineManager.java:380)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:63)
              at org.apache.hadoop.hdds.scm.node.StaleNodeHandler.onMessage(StaleNodeHandler.java:38)
              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2020-04-08 10:49:44,544 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$close$4(208)) - Send pipeline:PipelineID=e0e155c6-9fbe-46a7-b742-e805ea9baacf close command to datanode 30a24b04-1289-4c30-a28a-034edfe29e3d
      2020-04-08 10:49:44,545 WARN  events.EventQueue (EventQueue.java:fireEvent(151)) - Processing of TypedEvent{payloadType=CommandForDatanode, name='Datanode_Command'} is skipped, EventQueue is not running
      2020-04-08 10:49:44,544 INFO  node.StaleNodeHandler (StaleNodeHandler.java:onMessage(58)) - Datanode 59bdd26b-05da-47d1-8c3f-8350d55d7299{ip: 248.147.58.17, host: localhost-248.147.58.17, networkLocation: /default-rack, certSerialId: null} moved to stale state. Finalizing its pipelines [PipelineID=17b032b7-b9c4-41eb-bba6-50106881886d, PipelineID=60de1ca6-4115-415b-bbf1-06b86113df94]
      2020-04-08 10:49:44,576 WARN  server.ServerUtils (ServerUtils.java:getScmDbDir(148)) - ozone.scm.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
      2020-04-08 10:49:44,579 WARN  server.ServerUtils (ServerUtils.java:getScmDbDir(148)) - ozone.scm.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
      2020-04-08 10:49:44,579 WARN  db.DBDefinition (DBDefinition.java:createDBStoreBuilder(63)) - ozone.scm.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
      

        Attachments

          Activity

            People

            • Assignee:
              elek Marton Elek
              Reporter:
              elek Marton Elek
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: