[HDDS-5971] [disabled] TestHDDSUpgrade fails to allocate pipeline after finalization - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

TestHDDSUpgrade frequently hits the Maven global test timeout (about 1 hr), causing the integration (filesystem-hdds) check to fail. The class's JUnit timeout is set to 11000000 ms (just over 3 hrs).
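To make the mismatch between the two timeouts concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the Maven value is taken as exactly 1 hour only because the description says "about 1 hr".

```java
import java.time.Duration;

final class TimeoutComparison {
    // JUnit class timeout quoted in the description.
    static final Duration JUNIT_TIMEOUT = Duration.ofMillis(11_000_000L);
    // Assumption: "about 1 hr" is treated as exactly one hour.
    static final Duration MAVEN_TIMEOUT = Duration.ofHours(1);

    // The inner (JUnit) timeout can only fire if it is shorter than the outer one.
    static boolean junitTimeoutCanFire(Duration junit, Duration maven) {
        return junit.compareTo(maven) < 0;
    }

    public static void main(String[] args) {
        System.out.println("JUnit timeout in minutes: " + JUNIT_TIMEOUT.toMinutes()); // 183
        System.out.println("JUnit timeout fires before Maven's: "
            + junitTimeoutCanFire(JUNIT_TIMEOUT, MAVEN_TIMEOUT)); // false
    }
}
```

Under these assumptions a stuck test case is always killed by the Maven timeout first, so the JUnit timeout never gets a chance to report which test method hung.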

I have seen this at least 3 times recently in CI runs for new PRs. We need to investigate why some test cases can get stuck for so long. When I ran the test class locally in IntelliJ, it finished in 5 min 55 sec.

CC: Aravindan Vijayan, Ethan Rose

Failing run:

https://github.com/apache/ozone/runs/4160837361

Found this in the above run's artifact bundle: "No healthy node found to allocate container"?

org.apache.hadoop.hdds.upgrade.TestHDDSUpgrade-output.txt

2021-11-10 04:46:13,552 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:18,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:18,569 [RatisPipelineUtilsThread - 0] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:filterNodesWithSpace(171)) - Unable to find enough nodes that meet the space requirement of 1073741824 bytes for metadata and 5368709120 bytes for data in healthy node set. Required 3. Found 2.
2021-11-10 04:46:23,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:24,033 [ReplicationMonitor] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:chooseDatanodes(140)) - No healthy node found to allocate container.
2021-11-10 04:46:24,033 [ReplicationMonitor] WARN  container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(1199)) - Exception while replicating container 2.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:141)
	at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodes(SCMContainerPlacementRandom.java:78)
	at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:1163)
	at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:519)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at org.apache.hadoop.hdds.scm.container.ReplicationManager.processAll(ReplicationManager.java:369)
	at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:383)
	at java.lang.Thread.run(Thread.java:748)
2021-11-10 04:46:24,033 [ReplicationMonitor] INFO  container.ReplicationManager (ReplicationManager.java:processAll(371)) - Replication Monitor Thread took 3 milliseconds for processing 2 containers.
2021-11-10 04:46:28,554 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:33,556 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
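One possible reading of the "Required 3. Found 2." line can be sketched as follows. This is an illustration only, not the actual `SCMCommonPlacementPolicy` code: the `Node` class, the node names, and the free-space figures are made up, and the log alone does not say whether a third healthy node existed but lacked space or was missing altogether.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SpaceFilterSketch {
    // Space requirements copied from the ERROR line in the log.
    static final long METADATA_BYTES = 1_073_741_824L; // 1 GiB
    static final long DATA_BYTES = 5_368_709_120L;     // 5 GiB

    // Hypothetical stand-in for a datanode's free-space report.
    static final class Node {
        final String name;
        final long metadataFree;
        final long dataFree;

        Node(String name, long metadataFree, long dataFree) {
            this.name = name;
            this.metadataFree = metadataFree;
            this.dataFree = dataFree;
        }
    }

    // Keep only the nodes that have room for both metadata and data.
    static List<Node> filterNodesWithSpace(List<Node> healthy) {
        List<Node> result = new ArrayList<>();
        for (Node n : healthy) {
            if (n.metadataFree >= METADATA_BYTES && n.dataFree >= DATA_BYTES) {
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int required = 3; // node count from "Required 3" in the log
        long gib = 1L << 30;
        List<Node> healthy = Arrays.asList(
            new Node("dn1", 2 * gib, 10 * gib),
            new Node("dn2", 2 * gib, 10 * gib),
            new Node("dn3", 2 * gib, 1 * gib)); // made-up: too little data space
        int found = filterNodesWithSpace(healthy).size();
        System.out.println("Required " + required + ". Found " + found + ".");
        System.out.println("Pipeline can be created: " + (found >= required));
    }
}
```

In this sketch the pipeline creator keeps failing for as long as fewer than three nodes pass the filter, which would be consistent with the finalizer repeating "Waiting for at least one open pipeline" every five seconds.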

Attachments

4390545403-it-filesystem-hdds.zip (332 kB), 02/Dec/21 05:55, Siyao Meng
screenshot-1.jpg (67 kB), 11/Nov/21 03:39, Siyao Meng

People

Assignee: Unassigned

Reporter: Siyao Meng

Votes: 0

Watchers: 3

Dates

Created: 11/Nov/21 03:35

Updated: 17/Jan/24 13:16
