  1. Apache Ozone
  2. HDDS-5626 Track and Address Flaky tests
  3. HDDS-5971

[disabled] TestHDDSUpgrade fails to allocate pipeline after finalization


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      TestHDDSUpgrade frequently hits the Maven global test timeout threshold (about 1 hr), causing integration (filesystem-hdds) to fail. The class's JUnit timeout is set to 11000000 ms (just over 3 hrs), so the Maven timeout is reached first.
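To make the timeout arithmetic concrete, here is a minimal sketch comparing the two limits. The 11000000 ms value comes from this issue; the exact Maven limit is an assumption (the issue only says "about 1 hr"), and the class and method names are hypothetical, not part of Ozone.

```java
import java.time.Duration;

public class TimeoutComparison {
    // Class-level JUnit timeout quoted in this issue.
    static final Duration JUNIT_TIMEOUT = Duration.ofMillis(11_000_000L);

    // Assumption: the issue only states "about 1 hr" for the Maven-level limit.
    static final Duration ASSUMED_MAVEN_TIMEOUT = Duration.ofHours(1);

    /** Returns true if the outer (build) limit expires before the inner (test) limit can fire. */
    static boolean outerFiresFirst(Duration inner, Duration outer) {
        return outer.compareTo(inner) < 0;
    }

    public static void main(String[] args) {
        System.out.println("JUnit timeout in minutes: " + JUNIT_TIMEOUT.toMinutes());
        System.out.println("Maven limit fires first: "
            + outerFiresFirst(JUNIT_TIMEOUT, ASSUMED_MAVEN_TIMEOUT));
    }
}
```

Under these assumptions the JUnit timeout is 183 minutes, so a stuck test case is killed by the build-level limit long before JUnit can report which case hung.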

      I've seen this at least 3 times recently in CI runs for new PRs. We need to investigate why some test cases can get stuck for so long. I ran the test class locally with IntelliJ and it finished in 5 min 55 sec.

      CC Aravindan Vijayan, Ethan Rose

      Failing run:

      https://github.com/apache/ozone/runs/4160837361

      Found this in the above run's artifact bundle: "No healthy node found to allocate container"?

      org.apache.hadoop.hdds.upgrade.TestHDDSUpgrade-output.txt
      2021-11-10 04:46:13,552 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:18,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:18,569 [RatisPipelineUtilsThread - 0] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:filterNodesWithSpace(171)) - Unable to find enough nodes that meet the space requirement of 1073741824 bytes for metadata and 5368709120 bytes for data in healthy node set. Required 3. Found 2.
      2021-11-10 04:46:23,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:24,033 [ReplicationMonitor] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:chooseDatanodes(140)) - No healthy node found to allocate container.
      2021-11-10 04:46:24,033 [ReplicationMonitor] WARN  container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(1199)) - Exception while replicating container 2.
      org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
      	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:141)
      	at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodes(SCMContainerPlacementRandom.java:78)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:1163)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:519)
      	at java.util.ArrayList.forEach(ArrayList.java:1259)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.processAll(ReplicationManager.java:369)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:383)
      	at java.lang.Thread.run(Thread.java:748)
      2021-11-10 04:46:24,033 [ReplicationMonitor] INFO  container.ReplicationManager (ReplicationManager.java:processAll(371)) - Replication Monitor Thread took 3 milliseconds for processing 2 containers.
      2021-11-10 04:46:28,554 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:33,556 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
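The "Required 3. Found 2." error above suggests that only two healthy nodes passed a free-space check, which would block creation of a three-node pipeline. The following is an illustrative sketch of that kind of check, not the actual SCMCommonPlacementPolicy code: the NodeSpace type, the method names, and the sample free-space numbers are all hypothetical. Only the two byte thresholds (1073741824 = 1 GiB for metadata, 5368709120 = 5 GiB for data) and the required count of 3 are taken from the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpaceFilterSketch {
    // Thresholds copied from the log line in this issue.
    static final long META_REQUIRED = 1_073_741_824L; // 1 GiB
    static final long DATA_REQUIRED = 5_368_709_120L; // 5 GiB
    static final int NODES_REQUIRED = 3;

    /** Hypothetical stand-in for a datanode's free-space report; not an Ozone type. */
    static final class NodeSpace {
        final String name;
        final long metaFreeBytes;
        final long dataFreeBytes;

        NodeSpace(String name, long metaFreeBytes, long dataFreeBytes) {
            this.name = name;
            this.metaFreeBytes = metaFreeBytes;
            this.dataFreeBytes = dataFreeBytes;
        }
    }

    /** Keeps only the nodes that satisfy both the metadata and the data space requirement. */
    static List<NodeSpace> nodesWithEnoughSpace(List<NodeSpace> healthy, long metaRequired, long dataRequired) {
        List<NodeSpace> result = new ArrayList<>();
        for (NodeSpace node : healthy) {
            if (node.metaFreeBytes >= metaRequired && node.dataFreeBytes >= dataRequired) {
                result.add(node);
            }
        }
        return result;
    }

    /** Made-up sample: three healthy nodes, one of which is short on data space. */
    static List<NodeSpace> sampleNodes() {
        return Arrays.asList(
            new NodeSpace("dn1", 2 * META_REQUIRED, 2 * DATA_REQUIRED),
            new NodeSpace("dn2", 2 * META_REQUIRED, 2 * DATA_REQUIRED),
            new NodeSpace("dn3", 2 * META_REQUIRED, DATA_REQUIRED - 1));
    }

    public static void main(String[] args) {
        List<NodeSpace> usable = nodesWithEnoughSpace(sampleNodes(), META_REQUIRED, DATA_REQUIRED);
        if (usable.size() < NODES_REQUIRED) {
            System.out.println("Required " + NODES_REQUIRED + ". Found " + usable.size() + ".");
        }
    }
}
```

If the real failure follows this pattern, the test would keep polling for an open pipeline indefinitely, because the node set never gains a third node with enough reported space. Whether that is the actual cause here still needs to be confirmed against the datanode space reports in the failing run.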
      
      

          People

            Assignee: Unassigned
            Reporter: Siyao Meng (smeng)
