[IOTDB-5019] [write] Data region leader writes many WAL files after restarting the datanode on it - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.14.0-SNAPSHOT
Fix Version/s: None
Component/s: Core/Cluster
Labels:
- pull-request-available

Sprint:
2022-11-Cluster

Description

[write] Data region leader writes many WAL files after restarting the datanode on it

environment:
3C3D cluster, Nov. 21

reproduction:
1. Use iotdb-benchmarks to write data to the IoTDB cluster for more than 6 hours: only 1 device with 1 sensor, double values, 2 replicas.
2. Node 46 failed to write data, so I restarted the datanode on it. Node 46 is the data region leader.
3. Continue writing data to the same timeseries for about 8 hours. I found that most of the data ended up on node 44.

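Questions:

1. Why was the data distributed fairly evenly across nodes 44 and 46 before restarting 46, while after restarting 46 the WAL files are written almost exclusively on 44?

2. Why were so many WAL files written, far more than the amount and size of the data would explain?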

10	SchemaRegion	Running	root.aggr.g_0	1	0	1	172.20.70.44	6667	Follower
10	SchemaRegion	Running	root.aggr.g_0	1	0	5	172.20.70.46	6667	Leader
11	DataRegion	Running	root.aggr.g_0	1	10	1	172.20.70.44	6667	Follower
11	DataRegion	Running	root.aggr.g_0	1	10	5	172.20.70.46	6667	Leader

iotdb-1: 44
iotdb-2: 45
iotdb-3: 46
files:

[atmos@i-rh6m726k root.aggr.g_0]$ ansible allnodes -m shell -a "find $IOTDB_HOME/data/datanode/data/sequence/root.aggr.g_0 -type f |wc -l"
iotdb-1 | CHANGED | rc=0 >>
1694
iotdb-2 | CHANGED | rc=0 >>
966
iotdb-3 | CHANGED | rc=0 >>
183
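
To make the imbalance easier to compare across runs, the ansible output above can be reduced to per-node counts and percentages. The following is only a rough sketch, not part of the original report: the function names `parse_ansible_counts` and `share_by_node` are hypothetical, and the script assumes the output keeps the `host | STATUS | rc=N >>` header format shown above, followed by a single number. Note that it summarizes whatever the `find` command counted (here, files under the sequence directory), so it says nothing by itself about the WAL directory.

```python
import re

# Host aliases from the report, mapped to the last octet of each node's address.
NODE_BY_HOST = {"iotdb-1": "44", "iotdb-2": "45", "iotdb-3": "46"}

# Matches ansible's one-line header, e.g. "iotdb-1 | CHANGED | rc=0 >>".
HEADER = re.compile(r"^(\S+) \| \w+ \| rc=\d+ >>$")


def parse_ansible_counts(output):
    """Return {host: count} from 'ansible -m shell' output of a 'wc -l' command."""
    counts = {}
    host = None
    for raw in output.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            host = match.group(1)
        elif host is not None and line.isdigit():
            counts[host] = int(line)
            host = None  # expect a new header before the next number
    return counts


def share_by_node(counts):
    """Return {node: percentage of all files}, rounded to one decimal place."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        NODE_BY_HOST.get(host, host): round(100 * count / total, 1)
        for host, count in counts.items()
    }


# Output copied from the report above.
SAMPLE = """iotdb-1 | CHANGED | rc=0 >>
1694
iotdb-2 | CHANGED | rc=0 >>
966
iotdb-3 | CHANGED | rc=0 >>
183
"""

if __name__ == "__main__":
    counts = parse_ansible_counts(SAMPLE)
    print(counts)
    print(share_by_node(counts))
```

Running the same script against output collected before and after the restart would show whether the share held by node 44 keeps growing.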