Apache IoTDB / IOTDB-5928

Deadlock between TTL and Compaction


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • 1.1.1
    • None
    • 2023-3-Storage

    Description

      Version

      Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1)

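      Steps to reproduce

      Problem description:
      Running TTL and compaction concurrently produces a deadlock, and data can no longer be written (no error message is reported).
      The test procedure is as follows:
      1. Test version
      Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1)
      Start a 3-replica cluster with 3 ConfigNodes and 5 DataNodes (3C5D). The configuration parameters, taking node ip74 as an example: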
      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/confignode-env.sh

      MAX_HEAP_SIZE="8G"

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-confignode.properties
      cn_internal_address=192.168.10.74
      cn_target_config_node_list=192.168.10.72:10710
      cn_connection_timeout_ms=120000
      cn_metric_reporter_list=PROMETHEUS
      cn_metric_level=IMPORTANT
      cn_metric_prometheus_reporter_port=9081

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/datanode-env.sh
      MAX_HEAP_SIZE="256G"
      MAX_DIRECT_MEMORY_SIZE="32G"

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-datanode.properties
      dn_rpc_address=192.168.10.74
      dn_internal_address=192.168.10.74
      dn_target_config_node_list=192.168.10.72:10710,192.168.10.73:10710,192.168.10.74:10710
      dn_connection_timeout_ms=120000
      dn_metric_reporter_list=PROMETHEUS
      dn_metric_level=IMPORTANT

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-common.properties
      schema_replication_factor=3
      data_replication_factor=3
      series_slot_num=1000
      schema_region_group_extension_policy=CUSTOM
      default_schema_region_group_num_per_database=10
      data_region_group_extension_policy=CUSTOM
      default_data_region_group_num_per_database=20
      disk_space_warning_threshold=0.01
      query_timeout_threshold=36000000
      iot_consensus_throttle_threshold_in_byte=536870912000
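      2. Start the benchmark read/write workload; see the attached configuration file 0517_rc4_lt.conf.
      3. Start the TTL script; see the attached set_ttl.sh.

      Every 48 hours, the script first sets the cluster to READONLY, then sets a TTL so that all TsFiles are deleted (TsFiles that have not been flushed or sealed are not deleted), unsets the TTL, and sets the cluster back to RUNNING. (The benchmark client keeps reading and writing throughout.)


      4. After running for 4 days, a deadlock occurs and data can no longer be written.
      Monitoring shows write points per second at 0.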
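The 48-hour cycle in step 3 can be sketched as an ordered list of IoTDB SQL statements. This is not the attached set_ttl.sh, whose contents are not shown in this issue; it is a minimal sketch that assumes the cycle is driven through the `SET SYSTEM TO ... ON CLUSTER`, `SET TTL TO` and `UNSET TTL TO` statements. The function name `ttl_cycle_statements`, the path argument and the TTL value are hypothetical, and any waiting between statements is left out because the issue does not describe it.

```python
from typing import List


def ttl_cycle_statements(path: str, ttl_ms: int) -> List[str]:
    """Return the statements for one cycle, in the order the issue describes.

    `path` and `ttl_ms` are placeholders chosen by the caller; the values
    used by the real set_ttl.sh are not given in the issue.
    """
    if ttl_ms <= 0:
        raise ValueError("ttl_ms must be a positive number of milliseconds")
    return [
        # 1. Put the whole cluster into read-only mode.
        "SET SYSTEM TO READONLY ON CLUSTER",
        # 2. Set a TTL so that expired (sealed) TsFiles become deletable.
        f"SET TTL TO {path} {ttl_ms}",
        # 3. Remove the TTL again.
        f"UNSET TTL TO {path}",
        # 4. Return the cluster to normal operation.
        "SET SYSTEM TO RUNNING ON CLUSTER",
    ]


# Example (hypothetical database path and TTL):
#   for statement in ttl_cycle_statements("root.db", 1000):
#       print(statement)
```

Each statement would then be sent to the cluster by whatever client the script uses, for example the IoTDB CLI; that part is deliberately not shown.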

       

      Bug symptom

      TTL and compaction deadlock when they run concurrently.

      Expected result

      No deadlock.
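When writes stall like this, one way to check whether the DataNode JVM is in a deadlock is to capture a thread dump with `jstack <pid>` and scan it. The sketch below assumes the HotSpot `jstack` text format, in which a detected cycle is introduced by a line containing `Java-level deadlock`, followed by the quoted names of the threads involved and later by a `Java stack information` section. The helper name `deadlocked_threads` is hypothetical, and no thread dump from this issue is available, so nothing here reflects the actual threads involved.

```python
import re
from typing import List

# A thread entry in the deadlock summary looks like:  "thread-name":
_THREAD_HEADER = re.compile(r'^"(.+)":\s*$')


def deadlocked_threads(dump: str) -> List[str]:
    """Return the names of threads that jstack lists in deadlock summaries.

    An empty list means the dump contains no deadlock summary. That does
    not rule out every kind of stall, only the cycles jstack can detect.
    """
    names: List[str] = []
    in_summary = False
    for line in dump.splitlines():
        if "Java-level deadlock" in line:
            in_summary = True   # start of a deadlock summary
            continue
        if line.startswith("Java stack information"):
            in_summary = False  # stack traces follow; skip them
            continue
        if in_summary:
            match = _THREAD_HEADER.match(line)
            if match and match.group(1) not in names:
                names.append(match.group(1))
    return names


# Example, after running `jstack <pid> > dump.txt` on the DataNode host:
#   with open("dump.txt", encoding="utf-8") as f:
#       print(deadlocked_threads(f.read()))
```

If the list is empty but writes are still stalled, the full thread dump is still worth attaching to the issue, since the threads may be blocked in a way the deadlock detector does not report.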

            People

              xingtanzjr Jinrui Zhang
              xingtanzjr Jinrui Zhang