Apache IoTDB / IOTDB-5928

Deadlock between TTL and Compaction


Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 1.1.1
    • None
    • 2023-3-Storage

    Description

      Version

      Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1)

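      Steps to reproduce

      Problem description:
      TTL and compaction running concurrently deadlock, and data can no longer be written (no error message is reported).
      The test procedure is as follows:
      1. Test version
      Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1)
      Start a cluster with 3 replicas, 3 ConfigNodes and 5 DataNodes (3C5D). Configuration parameters, taking ip74 as an example: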
      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/confignode-env.sh

      MAX_HEAP_SIZE="8G"

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-confignode.properties
      cn_internal_address=192.168.10.74
      cn_target_config_node_list=192.168.10.72:10710
      cn_connection_timeout_ms=120000
      cn_metric_reporter_list=PROMETHEUS
      cn_metric_level=IMPORTANT
      cn_metric_prometheus_reporter_port=9081

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/datanode-env.sh
      MAX_HEAP_SIZE="256G"
      MAX_DIRECT_MEMORY_SIZE="32G"

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-datanode.properties
      dn_rpc_address=192.168.10.74
      dn_internal_address=192.168.10.74
      dn_target_config_node_list=192.168.10.72:10710,192.168.10.73:10710,192.168.10.74:10710
      dn_connection_timeout_ms=120000
      dn_metric_reporter_list=PROMETHEUS
      dn_metric_level=IMPORTANT

      liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-common.properties
      schema_replication_factor=3
      data_replication_factor=3
      series_slot_num=1000
      schema_region_group_extension_policy=CUSTOM
      default_schema_region_group_num_per_database=10
      data_region_group_extension_policy=CUSTOM
      default_data_region_group_num_per_database=20
      disk_space_warning_threshold=0.01
      query_timeout_threshold=36000000
      iot_consensus_throttle_threshold_in_byte=536870912000
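      2. Start the benchmark for reads and writes; for the configuration, see the attachment 0517_rc4_lt.conf.
      3. Start the TTL script; see the attachment set_ttl.sh.

      Every 48 hours, the script first sets the cluster to READONLY, then sets a TTL that deletes all TsFiles (TsFiles that are unflushed or unsealed are not deleted), unsets the TTL, and sets the cluster back to RUNNING. (The benchmark client's read and write operations continue throughout.)

      4. After running for 4 days, a deadlock occurs and data can no longer be written.
      Monitoring shows a "write point per second" value of 0.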
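
The attached set_ttl.sh is not reproduced in this issue, so the following is only a sketch of the statement order that step 3 describes for one 48-hour cycle. The database path `root.db` and the 1 ms TTL are hypothetical placeholders, and the statement forms follow IoTDB 1.x SQL as commonly documented; they should be checked against the version under test.

```java
import java.util.List;

public class TtlCycleSketch {
    // Hypothetical values: the real script's database path and TTL are not shown in the issue.
    static final String DATABASE = "root.db";
    static final long TTL_MS = 1L;

    /** Returns the statements for one cycle, in the order given in step 3. */
    static List<String> cycleStatements(String database, long ttlMs) {
        return List.of(
            "SET SYSTEM TO READONLY",                 // stop accepting writes
            "SET TTL TO " + database + " " + ttlMs,   // expire the existing sealed TsFiles
            "UNSET TTL TO " + database,               // remove the TTL again
            "SET SYSTEM TO RUNNING");                 // resume writes
    }

    public static void main(String[] args) {
        // Print the statements; sending them via the CLI or JDBC is left out of this sketch.
        cycleStatements(DATABASE, TTL_MS).forEach(System.out::println);
    }
}
```

The sketch only prints the statements. How the real script waits between steps, and how it repeats every 48 hours, is not described in the issue.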

       

      Bug symptom

      TTL and compaction running concurrently deadlock.

      Expected result

      No deadlock.
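
The issue does not include a thread dump or name the locks involved. As a generic illustration of how such a deadlock can arise and be confirmed inside a JVM, the sketch below has two threads take two locks in opposite order and then queries `ThreadMXBean.findDeadlockedThreads()`. The thread names and the two locks (`fileLock`, `regionLock`) are hypothetical stand-ins, not IoTDB's actual locks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {

    /** Provokes a lock-order deadlock between two daemon threads and returns how many threads the JVM reports as deadlocked. */
    static int provokeAndDetect() throws InterruptedException {
        ReentrantLock fileLock = new ReentrantLock();    // hypothetical lock A
        ReentrantLock regionLock = new ReentrantLock();  // hypothetical lock B
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Opposite acquisition order is what makes the cycle possible.
        worker("ttl-check", fileLock, regionLock, bothHoldFirst).start();
        worker("compaction", regionLock, fileLock, bothHoldFirst).start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();     // null when no deadlock is found
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for "
                            + info.getLockName() + " held by " + info.getLockOwnerName());
                    }
                }
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    static Thread worker(String name, ReentrantLock first, ReentrantLock second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            first.lock();
            try {
                latch.countDown();
                latch.await();    // wait until both threads hold their first lock
                second.lock();    // never succeeds: the other thread holds it
                second.unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        }, name);
        t.setDaemon(true);        // let the JVM exit even though the threads stay blocked
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```

On a live DataNode, a thread dump taken with `jstack` reports Java-level deadlocks in a similar way, which would identify the actual locks held by the TTL and compaction threads.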


          People

            xingtanzjr Jinrui Zhang
            xingtanzjr Jinrui Zhang


              Agile

                Active Sprint:
                2023-3-Storage ends 31/Mar/23