Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
0.14.0-SNAPSHOT
None
2022-11-Cluster
Description
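Build m_0905_0095eb3, 3 replicas, 3C3D (3 ConfigNodes, 3 DataNodes).
3 data regions, with 1 leader on each node.
ip72 was disconnected from the network for 3 minutes (16:52 ~ 16:55). After checking the cluster status and confirming that the leader switch had succeeded,
ip73 was disconnected for 2 minutes. No further fault operations were performed after that.
Synchronization is slow: MultiLeader keeps throttling writes, and write throughput never recovers even with the throttling in effect. The query below counts the data written per minute (batches in bm, the benchmark):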
IoTDB> select count(latency) from root.result.moresession_2022_09_06_04_47_03.INGESTION where okPoint>0 group by ([1662454041076000186,1662459764764000179),1m);
-----------------------------------------------------------------------------------------------------+
Time | count(root.result.moresession_2022_09_06_04_47_03.INGESTION.latency) |
-----------------------------------------------------------------------------------------------------+
2022-09-06T16:47:21.076000186+08:00 | 5544 |
2022-09-06T16:48:21.076000186+08:00 | 6282 |
2022-09-06T16:49:21.076000186+08:00 | 5671 |
2022-09-06T16:50:21.076000186+08:00 | 4589 |
2022-09-06T16:51:21.076000186+08:00 | 5350 |
2022-09-06T16:52:21.076000186+08:00 | 1121 |
2022-09-06T16:53:21.076000186+08:00 | 901 |
2022-09-06T16:54:21.076000186+08:00 | 201 |
2022-09-06T16:55:21.076000186+08:00 | 334 |
2022-09-06T16:56:21.076000186+08:00 | 3501 |
2022-09-06T16:57:21.076000186+08:00 | 3677 |
2022-09-06T16:58:21.076000186+08:00 | 3111 |
2022-09-06T16:59:21.076000186+08:00 | 1948 |
2022-09-06T17:00:21.076000186+08:00 | 3889 |
2022-09-06T17:01:21.076000186+08:00 | 2982 |
2022-09-06T17:02:21.076000186+08:00 | 4465 |
2022-09-06T17:03:21.076000186+08:00 | 4871 |
2022-09-06T17:04:21.076000186+08:00 | 4478 |
2022-09-06T17:05:21.076000186+08:00 | 3242 |
2022-09-06T17:06:21.076000186+08:00 | 2545 |
2022-09-06T17:07:21.076000186+08:00 | 2579 |
2022-09-06T17:08:21.076000186+08:00 | 133 |
2022-09-06T17:09:21.076000186+08:00 | 488 |
2022-09-06T17:10:21.076000186+08:00 | 253 |
2022-09-06T17:11:21.076000186+08:00 | 445 |
2022-09-06T17:12:21.076000186+08:00 | 2122 |
2022-09-06T17:13:21.076000186+08:00 | 1799 |
2022-09-06T17:14:21.076000186+08:00 | 1568 |
2022-09-06T17:15:21.076000186+08:00 | 355 |
2022-09-06T17:16:21.076000186+08:00 | 1127 |
2022-09-06T17:17:21.076000186+08:00 | 803 |
2022-09-06T17:18:21.076000186+08:00 | 674 |
2022-09-06T17:19:21.076000186+08:00 | 621 |
2022-09-06T17:20:21.076000186+08:00 | 361 |
2022-09-06T17:21:21.076000186+08:00 | 367 |
2022-09-06T17:22:21.076000186+08:00 | 999 |
2022-09-06T17:23:21.076000186+08:00 | 1119 |
2022-09-06T17:24:21.076000186+08:00 | 1113 |
2022-09-06T17:25:21.076000186+08:00 | 1737 |
2022-09-06T17:26:21.076000186+08:00 | 1282 |
2022-09-06T17:27:21.076000186+08:00 | 4454 |
2022-09-06T17:28:21.076000186+08:00 | 2013 |
2022-09-06T17:29:21.076000186+08:00 | 623 |
2022-09-06T17:30:21.076000186+08:00 | 313 |
2022-09-06T17:31:21.076000186+08:00 | 455 |
2022-09-06T17:32:21.076000186+08:00 | 353 |
2022-09-06T17:33:21.076000186+08:00 | 347 |
2022-09-06T17:34:21.076000186+08:00 | 587 |
2022-09-06T17:35:21.076000186+08:00 | 1370 |
2022-09-06T17:36:21.076000186+08:00 | 341 |
2022-09-06T17:37:21.076000186+08:00 | 1555 |
2022-09-06T17:38:21.076000186+08:00 | 3266 |
2022-09-06T17:39:21.076000186+08:00 | 1344 |
2022-09-06T17:40:21.076000186+08:00 | 1057 |
2022-09-06T17:41:21.076000186+08:00 | 682 |
2022-09-06T17:42:21.076000186+08:00 | 231 |
2022-09-06T17:43:21.076000186+08:00 | 170 |
2022-09-06T17:44:21.076000186+08:00 | 729 |
2022-09-06T17:45:21.076000186+08:00 | 118 |
2022-09-06T17:46:21.076000186+08:00 | 135 |
2022-09-06T17:47:21.076000186+08:00 | 109 |
2022-09-06T17:48:21.076000186+08:00 | 167 |
2022-09-06T17:49:21.076000186+08:00 | 139 |
2022-09-06T17:50:21.076000186+08:00 | 138 |
2022-09-06T17:51:21.076000186+08:00 | 321 |
2022-09-06T17:52:21.076000186+08:00 | 138 |
2022-09-06T17:53:21.076000186+08:00 | 326 |
2022-09-06T17:54:21.076000186+08:00 | 166 |
2022-09-06T17:55:21.076000186+08:00 | 70 |
2022-09-06T17:56:21.076000186+08:00 | 302 |
2022-09-06T17:57:21.076000186+08:00 | 587 |
2022-09-06T17:58:21.076000186+08:00 | 25 |
2022-09-06T17:59:21.076000186+08:00 | 427 |
2022-09-06T18:00:21.076000186+08:00 | 2 |
2022-09-06T18:01:21.076000186+08:00 | 96 |
2022-09-06T18:02:21.076000186+08:00 | 72 |
2022-09-06T18:03:21.076000186+08:00 | 94 |
2022-09-06T18:04:21.076000186+08:00 | 99 |
2022-09-06T18:05:21.076000186+08:00 | 66 |
2022-09-06T18:06:21.076000186+08:00 | 230 |
2022-09-06T18:07:21.076000186+08:00 | 10 |
2022-09-06T18:08:21.076000186+08:00 | 335 |
2022-09-06T18:09:21.076000186+08:00 | 25 |
2022-09-06T18:10:21.076000186+08:00 | 10 |
2022-09-06T18:11:21.076000186+08:00 | 18 |
2022-09-06T18:12:21.076000186+08:00 | 142 |
2022-09-06T18:13:21.076000186+08:00 | 281 |
2022-09-06T18:14:21.076000186+08:00 | 30 |
2022-09-06T18:15:21.076000186+08:00 | 14 |
2022-09-06T18:16:21.076000186+08:00 | 8 |
2022-09-06T18:17:21.076000186+08:00 | 7 |
2022-09-06T18:18:21.076000186+08:00 | 38 |
2022-09-06T18:19:21.076000186+08:00 | 13 |
2022-09-06T18:20:21.076000186+08:00 | 40 |
2022-09-06T18:21:21.076000186+08:00 | 12 |
2022-09-06T18:22:21.076000186+08:00 | 10 |
-----------------------------------------------------------------------------------------------------+
Total line number = 96
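To put a number on the drop, the per-minute counts can be averaged over a time window. The following is a minimal sketch, not part of the original report: the helper name `avg_count` is hypothetical, it assumes rows in the `timestamp | count |` layout shown above, and it compares only the HH:MM part of the timestamp, so it is valid only within a single day. The sample rows are copied from the first five and last five rows of the output above.

```shell
#!/bin/bash
# avg_count FROM TO
# Reads CLI result rows ("<timestamp> | <count> |") on stdin and prints the
# average count for rows whose HH:MM satisfies FROM <= HH:MM < TO.
# Hypothetical helper; assumes all rows fall on the same day.
avg_count() {
  awk -F'|' -v from="$1" -v to="$2" '
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
      hhmm = substr($1, 12, 5)            # e.g. "16:47"
      if (hhmm >= from && hhmm < to) { sum += $2; n++ }
    }
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
  '
}

# Sample rows copied from the query output above.
rows='2022-09-06T16:47:21.076000186+08:00 | 5544 |
2022-09-06T16:48:21.076000186+08:00 | 6282 |
2022-09-06T16:49:21.076000186+08:00 | 5671 |
2022-09-06T16:50:21.076000186+08:00 | 4589 |
2022-09-06T16:51:21.076000186+08:00 | 5350 |
2022-09-06T18:18:21.076000186+08:00 | 38 |
2022-09-06T18:19:21.076000186+08:00 | 13 |
2022-09-06T18:20:21.076000186+08:00 | 40 |
2022-09-06T18:21:21.076000186+08:00 | 12 |
2022-09-06T18:22:21.076000186+08:00 | 10 |'

printf '%s\n' "$rows" | avg_count 16:47 16:52   # before the first outage
printf '%s\n' "$rows" | avg_count 18:18 18:23   # end of the observed range
```

On these sample rows the averages are 5487.2 batches per minute before the first outage and 22.6 at the end of the observed range.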
Steps to reproduce:
1. Machine configuration: 192.168.10.72/73/74, 48 cores, 386 GB of memory
The benchmark (bm) runs on ip71.
2. Database configuration
ConfigNode
MAX_HEAP_SIZE="8G"
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
schema_replication_factor=3
data_replication_factor=3
DataNode
MAX_HEAP_SIZE="256G"
MAX_DIRECT_MEMORY_SIZE="32G"
max_connection_for_internal_service=1100
max_waiting_time_when_insert_blocked=3600000
query_timeout_threshold=3600000
3. Benchmark: see the attachment
Region information before the fault
4. Disconnect the network
Disconnect ip72:
cat restart_network.sh
#!/bin/bash
sudo ifconfig enp129s0f1 down
sleep $1
sudo ifconfig enp129s0f1 up
nohup sh -x restart_network.sh "120" > a.log &
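A slightly more defensive variant of the outage step could look like the sketch below. It is an illustration, not the script used in this test: the function names `run` and `network_outage` and the `DRY_RUN` switch are hypothetical additions. It validates the duration argument before touching the interface, and with `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/bin/bash
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# network_outage IFACE SECONDS
# Takes IFACE down, waits SECONDS, then brings it back up.
# Returns 2 without touching the interface if the arguments are invalid.
network_outage() {
  local iface="$1" seconds="$2"
  case "$seconds" in
    ''|*[!0-9]*)
      echo "usage: network_outage <iface> <seconds>" >&2
      return 2 ;;
  esac
  if [ -z "$iface" ]; then
    echo "usage: network_outage <iface> <seconds>" >&2
    return 2
  fi
  run sudo ifconfig "$iface" down
  run sleep "$seconds"
  run sudo ifconfig "$iface" up
}

# Dry run: prints the three commands instead of executing them.
DRY_RUN=1 network_outage enp129s0f1 120
```

For a real run, the functions would be saved in a script file that ends with a call such as `network_outage enp129s0f1 "$1"` and started under `nohup` as above, so that the script keeps running when the remote session drops with the interface.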
Check the region status. After the leader switch succeeds,
disconnect ip73 as well. After ip73 recovers:
Network transmission