Apache IoTDB / IOTDB-4350

[MultiLeader Throttle Down] Performance does not return to normal after "Throttle Down"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0-SNAPSHOT
    • Fix Version/s: 1.0.0
    • Component/s: mpp-cluster
    • Labels: None
    • Sprint: 2022-11-Cluster

    Description

      Version m_0905_0095eb3, 3 replicas, 3C3D (3 ConfigNodes and 3 DataNodes).

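      There are 3 data regions, with 1 leader on each node.
      ip72 was disconnected from the network for 3 minutes (16:52 ~ 16:55). After checking the cluster status and confirming that the leader switch had succeeded,
      ip73 was disconnected for 2 minutes. No further fault operations were performed after that.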

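      Synchronization is slow and MultiLeader keeps throttling writes, but write performance never recovers from the throttled level. The counts below show the amount of data written per 1-minute window (batches as recorded by the benchmark, bm):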

      IoTDB> select count(latency) from root.result.moresession_2022_09_06_04_47_03.INGESTION where okPoint>0 group by ([1662454041076000186,1662459764764000179),1m);

      -----------------------------------------------------------------------------------------------------+
      Time                                 count(root.result.moresession_2022_09_06_04_47_03.INGESTION.latency)
      -----------------------------------------------------------------------------------------------------+

      2022-09-06T16:47:21.076000186+08:00 5544
      2022-09-06T16:48:21.076000186+08:00 6282
      2022-09-06T16:49:21.076000186+08:00 5671
      2022-09-06T16:50:21.076000186+08:00 4589
      2022-09-06T16:51:21.076000186+08:00 5350
      2022-09-06T16:52:21.076000186+08:00 1121
      2022-09-06T16:53:21.076000186+08:00 901
      2022-09-06T16:54:21.076000186+08:00 201
      2022-09-06T16:55:21.076000186+08:00 334
      2022-09-06T16:56:21.076000186+08:00 3501
      2022-09-06T16:57:21.076000186+08:00 3677
      2022-09-06T16:58:21.076000186+08:00 3111
      2022-09-06T16:59:21.076000186+08:00 1948
      2022-09-06T17:00:21.076000186+08:00 3889
      2022-09-06T17:01:21.076000186+08:00 2982
      2022-09-06T17:02:21.076000186+08:00 4465
      2022-09-06T17:03:21.076000186+08:00 4871
      2022-09-06T17:04:21.076000186+08:00 4478
      2022-09-06T17:05:21.076000186+08:00 3242
      2022-09-06T17:06:21.076000186+08:00 2545
      2022-09-06T17:07:21.076000186+08:00 2579
      2022-09-06T17:08:21.076000186+08:00 133
      2022-09-06T17:09:21.076000186+08:00 488
      2022-09-06T17:10:21.076000186+08:00 253
      2022-09-06T17:11:21.076000186+08:00 445
      2022-09-06T17:12:21.076000186+08:00 2122
      2022-09-06T17:13:21.076000186+08:00 1799
      2022-09-06T17:14:21.076000186+08:00 1568
      2022-09-06T17:15:21.076000186+08:00 355
      2022-09-06T17:16:21.076000186+08:00 1127
      2022-09-06T17:17:21.076000186+08:00 803
      2022-09-06T17:18:21.076000186+08:00 674
      2022-09-06T17:19:21.076000186+08:00 621
      2022-09-06T17:20:21.076000186+08:00 361
      2022-09-06T17:21:21.076000186+08:00 367
      2022-09-06T17:22:21.076000186+08:00 999
      2022-09-06T17:23:21.076000186+08:00 1119
      2022-09-06T17:24:21.076000186+08:00 1113
      2022-09-06T17:25:21.076000186+08:00 1737
      2022-09-06T17:26:21.076000186+08:00 1282
      2022-09-06T17:27:21.076000186+08:00 4454
      2022-09-06T17:28:21.076000186+08:00 2013
      2022-09-06T17:29:21.076000186+08:00 623
      2022-09-06T17:30:21.076000186+08:00 313
      2022-09-06T17:31:21.076000186+08:00 455
      2022-09-06T17:32:21.076000186+08:00 353
      2022-09-06T17:33:21.076000186+08:00 347
      2022-09-06T17:34:21.076000186+08:00 587
      2022-09-06T17:35:21.076000186+08:00 1370
      2022-09-06T17:36:21.076000186+08:00 341
      2022-09-06T17:37:21.076000186+08:00 1555
      2022-09-06T17:38:21.076000186+08:00 3266
      2022-09-06T17:39:21.076000186+08:00 1344
      2022-09-06T17:40:21.076000186+08:00 1057
      2022-09-06T17:41:21.076000186+08:00 682
      2022-09-06T17:42:21.076000186+08:00 231
      2022-09-06T17:43:21.076000186+08:00 170
      2022-09-06T17:44:21.076000186+08:00 729
      2022-09-06T17:45:21.076000186+08:00 118
      2022-09-06T17:46:21.076000186+08:00 135
      2022-09-06T17:47:21.076000186+08:00 109
      2022-09-06T17:48:21.076000186+08:00 167
      2022-09-06T17:49:21.076000186+08:00 139
      2022-09-06T17:50:21.076000186+08:00 138
      2022-09-06T17:51:21.076000186+08:00 321
      2022-09-06T17:52:21.076000186+08:00 138
      2022-09-06T17:53:21.076000186+08:00 326
      2022-09-06T17:54:21.076000186+08:00 166
      2022-09-06T17:55:21.076000186+08:00 70
      2022-09-06T17:56:21.076000186+08:00 302
      2022-09-06T17:57:21.076000186+08:00 587
      2022-09-06T17:58:21.076000186+08:00 25
      2022-09-06T17:59:21.076000186+08:00 427
      2022-09-06T18:00:21.076000186+08:00 2
      2022-09-06T18:01:21.076000186+08:00 96
      2022-09-06T18:02:21.076000186+08:00 72
      2022-09-06T18:03:21.076000186+08:00 94
      2022-09-06T18:04:21.076000186+08:00 99
      2022-09-06T18:05:21.076000186+08:00 66
      2022-09-06T18:06:21.076000186+08:00 230
      2022-09-06T18:07:21.076000186+08:00 10
      2022-09-06T18:08:21.076000186+08:00 335
      2022-09-06T18:09:21.076000186+08:00 25
      2022-09-06T18:10:21.076000186+08:00 10
      2022-09-06T18:11:21.076000186+08:00 18
      2022-09-06T18:12:21.076000186+08:00 142
      2022-09-06T18:13:21.076000186+08:00 281
      2022-09-06T18:14:21.076000186+08:00 30
      2022-09-06T18:15:21.076000186+08:00 14
      2022-09-06T18:16:21.076000186+08:00 8
      2022-09-06T18:17:21.076000186+08:00 7
      2022-09-06T18:18:21.076000186+08:00 38
      2022-09-06T18:19:21.076000186+08:00 13
      2022-09-06T18:20:21.076000186+08:00 40
      2022-09-06T18:21:21.076000186+08:00 12
      2022-09-06T18:22:21.076000186+08:00 10

      -----------------------------------------------------------------------------------------------------+
      Total line number = 96
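
To put numbers on the drop, the per-minute counts above can be averaged on either side of a cutoff time. The following is a minimal sketch and not part of the original report: the function name `avg_before_after` and the choice of cutoff are assumptions. It reads rows of the form `<timestamp> <count>` from standard input and ignores any other line (header, separators, the total line), so the CLI output can be piped in as is.

```bash
#!/bin/bash
# Sketch: average batches per minute before and after a cutoff timestamp.
# Timestamps are compared as strings, which is valid for ISO-8601 values
# that share the same UTC offset, as in the result above.
avg_before_after() {
  local cutoff="$1"   # e.g. 2022-09-06T16:52
  awk -v cutoff="$cutoff" '
    # Only consider "<timestamp> <integer>" rows.
    NF == 2 && $2 ~ /^[0-9]+$/ {
      if ($1 < cutoff) { bsum += $2; bn++ } else { asum += $2; an++ }
    }
    END {
      printf "before: %d rows, avg %.1f\n", bn, (bn ? bsum / bn : 0)
      printf "after: %d rows, avg %.1f\n", an, (an ? asum / an : 0)
    }'
}
```

For example, with the result rows saved to a hypothetical file `counts.txt`, `avg_before_after 2022-09-06T16:52 < counts.txt` compares the minutes before the first disconnection with everything after it.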

      Steps to reproduce:
      1. Machine configuration: 192.168.10.72/73/74, 48 cores, 386 GB memory.
      The benchmark (bm) runs on ip71.

      2. Database configuration
      ConfigNode
      MAX_HEAP_SIZE="8G"

      schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
      data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
      schema_replication_factor=3
      data_replication_factor=3

      DataNode
      MAX_HEAP_SIZE="256G"
      MAX_DIRECT_MEMORY_SIZE="32G"
      max_connection_for_internal_service=1100
      max_waiting_time_when_insert_blocked=3600000
      query_timeout_threshold=3600000
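
When reproducing the issue, it can help to confirm that each node really carries the settings listed above. The following is a minimal sketch under stated assumptions: the helper name `prop_value` is hypothetical, and the path of the properties file is passed in by the caller because the report does not say which file holds each key.

```bash
#!/bin/bash
# Sketch: print the value of KEY from a Java-style properties file.
# Comment lines are skipped; if the key appears more than once, the last
# assignment wins. Returns a non-zero status when the key is absent.
prop_value() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /^[ \t]*#/ { next }
    {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { val = v; found = 1 }
    }
    END { if (!found) exit 1; print val }' "$file"
}
```

A call such as `prop_value /path/to/file.properties data_replication_factor` should then print `3` on a node configured as described here; the path is a placeholder.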

      3. Benchmark configuration: see the attachment.
      Region information before the fault:

      4. Network disconnection
      Disconnect ip72:
      cat restart_network.sh
      #!/bin/bash
      # Take the network interface down, wait $1 seconds, then bring it back up.
      sudo ifconfig enp129s0f1 down
      sleep "$1"
      sudo ifconfig enp129s0f1 up

      nohup sh -x restart_network.sh "120" > a.log &
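
A slightly more defensive variant of the script above, as a sketch only: it validates the duration argument and supports a dry run that prints the commands instead of executing them. The function name `net_outage` and the `IFACE` and `DRY_RUN` environment variables are assumptions introduced here for illustration; the interface name defaults to the one used in the report.

```bash
#!/bin/bash
# Sketch: take a network interface down for a number of seconds, then restore it.
# IFACE   (hypothetical) overrides the interface name.
# DRY_RUN (hypothetical) set to 1 prints the commands without running them.
net_outage() {
  local seconds="$1"
  local iface="${IFACE:-enp129s0f1}"
  local prefix=""

  # Reject an empty argument or anything that is not a plain integer.
  case "$seconds" in
    ''|*[!0-9]*) echo "usage: net_outage <seconds>" >&2; return 2 ;;
  esac

  if [ "${DRY_RUN:-0}" = "1" ]; then
    prefix="echo"
  fi

  $prefix sudo ifconfig "$iface" down
  $prefix sleep "$seconds"
  $prefix sudo ifconfig "$iface" up
}
```

Running `DRY_RUN=1 net_outage 120` first is a cheap way to check the interface name and duration before actually cutting a node off.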

      Check the region status. Once the leader switch has succeeded, disconnect ip73 as well. After ip73 recovers:

      Network transmission:

      Attachments

        1. image-2022-09-07-14-52-58-266.png
          50 kB
          刘珍
        2. image-2023-02-17-08-59-08-601.png
          35 kB
          刘珍
        3. image-2023-02-17-09-01-50-592.png
          35 kB
          刘珍
        4. net_restart.conf
          14 kB
          刘珍
        5. screenshot-1.png
          31 kB
          刘珍
        6. screenshot-2.png
          30 kB
          刘珍
        7. screenshot-3.png
          29 kB
          刘珍
        8. screenshot-4.png
          197 kB
          刘珍
        9. screenshot-5.png
          292 kB
          刘珍
        10. screenshot-6.png
          23 kB
          刘珍

          People

            Assignee: Hongyin Zhang (spricoder)
            Reporter: 刘珍