Apache IoTDB / IOTDB-3247

[Persistent schema] [WAL recovery] Aligned sensors: queries return incomplete data after recovery


Details

    • 2022-6-Cluster

    Description

      master_0519_81b9117
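      Problem description (schema persistence + WAL recovery):
      100 storage groups, 500 devices, 200,000 series per device: 100 million aligned series in total, with 10 points written to each series.
      After deleting 51 series from each device and restarting IoTDB, WAL recovery shows two problems:
      Problem 1: some of the series that were not deleted return fewer points than were written (count less than 10).
      Problem 2: a NullPointerException (NPE) is thrown during recovery.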
      2022-05-20 14:16:09,213 [pool-15-IoTDB-WAL-Recover-2] WARN o.a.i.d.w.r.f.UnsealedTsFileRecoverPerformer:208 - meet error when redo wal of /data/liuzhen_test/master_0519_81b9117/datanode/./sbin/../data/data/sequence/root.test.g_99/0/0/1652977295224-2-0-0.tsfile
      org.apache.iotdb.db.exception.WriteProcessException: java.lang.NullPointerException
      at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:394)
      at org.apache.iotdb.db.wal.recover.file.TsFilePlanRedoer.redoInsert(TsFilePlanRedoer.java:128)
      at org.apache.iotdb.db.wal.recover.file.UnsealedTsFileRecoverPerformer.redoLog(UnsealedTsFileRecoverPerformer.java:191)
      at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.recoverTsFiles(WALNodeRecoverTask.java:137)
      at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.run(WALNodeRecoverTask.java:63)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.NullPointerException: null
      at org.apache.iotdb.db.utils.datastructure.AlignedTVList.arrayCopy(AlignedTVList.java:808)
      at org.apache.iotdb.db.utils.datastructure.AlignedTVList.putAlignedValues(AlignedTVList.java:736)
      at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.putAlignedValues(AlignedWritableMemChunk.java:152)
      at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.writeAlignedValues(AlignedWritableMemChunk.java:182)
      at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunkGroup.writeValues(AlignedWritableMemChunkGroup.java:55)
      at org.apache.iotdb.db.engine.memtable.AbstractMemTable.writeAlignedTablet(AbstractMemTable.java:545)
      at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:377)
      ... 9 common frames omitted

      Test procedure
      1. Machine: 192.168.10.68 (72 cores, 256 GB RAM)
      IoTDB path: /data/liuzhen_test/master_0519_81b9117/datanode
      IoTDB configuration (all other settings unchanged):
      MAX_HEAP_SIZE="192G"
      MAX_DIRECT_MEMORY_SIZE="32G"
      mlog_buffer_size=10485760
      schema_engine_mode=Schema_File

      Benchmark path: /data/benchmark/weekly_shell/bm_0514_ee75a49
      See the attachment for the benchmark configuration.

      2. Start IoTDB and run the benchmark.
      This takes about 3 hours.

      3. Verify the data before deleting any series.
      Result: correct.
      count_ts_500dev.sh: 200,000 series per device.
      select_count_ts_500dev.sh: each series returns 10 points.
      4. Delete 51 series from each device.
      Run del_ts.sh.

      5. After deleting the series and before stopping IoTDB, verify the data again.
      Result: correct.
      count_ts_500dev.sh: 199,949 series per device.
      select_count_ts_500dev.sh: each series returns 10 points.

      6. Stop IoTDB.

      7. Back up the data and logs.

      8. Restart IoTDB and check the logs: an NPE is reported.

      9. After IoTDB finishes recovery, run select_count_ts_500dev.sh.
      Some series are missing data (only a subset is listed).
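      The expected totals for this workload, and the check performed in step 9, can be sketched as below. This is a minimal Java sketch that is not part of the original report: the class `WorkloadCheck` and its methods are hypothetical, and it assumes the per-series `count` results have already been collected (for example from the output of select_count_ts_500dev.sh) into a map keyed by series path. The device and measurement names in the sample (`d_0.s_1`, `d_0.s_2`) are made up; only the storage group `root.test.g_99` appears in the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of IoTDB or of the attached scripts.
public class WorkloadCheck {
    // Workload parameters taken from the report.
    static final int DEVICES = 500;
    static final int SERIES_PER_DEVICE = 200_000;
    static final int POINTS_PER_SERIES = 10;
    static final int DELETED_PER_DEVICE = 51;

    // Aligned series written by the benchmark: 500 * 200,000 = 100,000,000.
    static long totalSeries() {
        return (long) DEVICES * SERIES_PER_DEVICE;
    }

    // Series remaining after del_ts.sh: 500 * 199,949 = 99,974,500.
    static long seriesAfterDelete() {
        return (long) DEVICES * (SERIES_PER_DEVICE - DELETED_PER_DEVICE);
    }

    // Points that should still be queryable after recovery (10 per remaining series).
    static long expectedPointsAfterDelete() {
        return seriesAfterDelete() * POINTS_PER_SERIES;
    }

    // Returns the series whose count is below POINTS_PER_SERIES (Problem 1),
    // in the iteration order of the given map.
    static List<String> seriesMissingPoints(Map<String, Long> counts) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            if (entry.getValue() < POINTS_PER_SERIES) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("series written:        " + totalSeries());
        System.out.println("series after delete:   " + seriesAfterDelete());
        System.out.println("expected points after: " + expectedPointsAfterDelete());

        // Made-up sample counts; real values would come from the query output.
        Map<String, Long> counts = new TreeMap<>();
        counts.put("root.test.g_99.d_0.s_1", 10L);
        counts.put("root.test.g_99.d_0.s_2", 7L);
        System.out.println("series missing points: " + seriesMissingPoints(counts));
    }
}
```

      Comparing the sum of the collected counts with `expectedPointsAfterDelete()` (999,745,000) shows how many points were lost overall, while the per-series list narrows the loss down to specific devices.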

      Attachments

        1. select_count_ts_500dev.sh (60 kB, 刘珍)
        2. image-2022-05-20-16-09-01-848.png (106 kB, 刘珍)
        3. get_dev_name.sh (0.1 kB, 刘珍)
        4. dev_name.txt (17 kB, 刘珍)
        5. del_ts.sh (1 kB, 刘珍)
        6. count_ts_500dev.sh (52 kB, 刘珍)
        7. config.properties (14 kB, 刘珍)


          People

            cpaulyz yanze chen
            刘珍 刘珍
            Votes: 0
            Watchers: 2
