Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Version: 0.14.0-SNAPSHOT
2022-6-Cluster
Description
master_0519_81b9117
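Problem description (metadata persistence + WAL recovery):
100 storage groups, 500 devices, 200,000 series per device, 100 million aligned series in total; 10 points are written to each series.
After 51 series are deleted from each device and IoTDB is restarted, WAL recovery shows 2 problems:
Problem 1: Some series that were not deleted return too little data when queried (the count is less than 10).
Problem 2: An NPE is thrown during recovery.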
2022-05-20 14:16:09,213 [pool-15-IoTDB-WAL-Recover-2] WARN o.a.i.d.w.r.f.UnsealedTsFileRecoverPerformer:208 - meet error when redo wal of /data/liuzhen_test/master_0519_81b9117/datanode/./sbin/../data/data/sequence/root.test.g_99/0/0/1652977295224-2-0-0.tsfile
org.apache.iotdb.db.exception.WriteProcessException: java.lang.NullPointerException
at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:394)
at org.apache.iotdb.db.wal.recover.file.TsFilePlanRedoer.redoInsert(TsFilePlanRedoer.java:128)
at org.apache.iotdb.db.wal.recover.file.UnsealedTsFileRecoverPerformer.redoLog(UnsealedTsFileRecoverPerformer.java:191)
at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.recoverTsFiles(WALNodeRecoverTask.java:137)
at org.apache.iotdb.db.wal.recover.WALNodeRecoverTask.run(WALNodeRecoverTask.java:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
at org.apache.iotdb.db.utils.datastructure.AlignedTVList.arrayCopy(AlignedTVList.java:808)
at org.apache.iotdb.db.utils.datastructure.AlignedTVList.putAlignedValues(AlignedTVList.java:736)
at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.putAlignedValues(AlignedWritableMemChunk.java:152)
at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunk.writeAlignedValues(AlignedWritableMemChunk.java:182)
at org.apache.iotdb.db.engine.memtable.AlignedWritableMemChunkGroup.writeValues(AlignedWritableMemChunkGroup.java:55)
at org.apache.iotdb.db.engine.memtable.AbstractMemTable.writeAlignedTablet(AbstractMemTable.java:545)
at org.apache.iotdb.db.engine.memtable.AbstractMemTable.insertAlignedTablet(AbstractMemTable.java:377)
... 9 common frames omitted
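The workload figures in the description can be cross-checked with a short calculation. The sketch below uses only the numbers stated in this report (500 devices, 200,000 series per device, 10 points per series, 51 series deleted per device); the variable names are illustrative.

```python
# Workload figures taken from the problem description.
DEVICES = 500
SERIES_PER_DEVICE = 200_000
POINTS_PER_SERIES = 10
DELETED_PER_DEVICE = 51

# Totals before the delete.
total_series = DEVICES * SERIES_PER_DEVICE
total_points = total_series * POINTS_PER_SERIES

# Totals after deleting 51 series from every device.
remaining_per_device = SERIES_PER_DEVICE - DELETED_PER_DEVICE
remaining_series = DEVICES * remaining_per_device
deleted_series = DEVICES * DELETED_PER_DEVICE

print(f"series before delete: {total_series:,}")
print(f"points before delete: {total_points:,}")
print(f"series per device after delete: {remaining_per_device:,}")
print(f"series after delete: {remaining_series:,}")
print(f"series deleted in total: {deleted_series:,}")
```

The per-device figure after the delete matches the 199,949 series reported by count_ts_500dev.sh in the test procedure, and every surviving series is still expected to return 10 points.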
Test procedure
1. Test host: 192.168.10.68, 72C256G (72 cores, 256 GB memory)
IoTDB path: /data/liuzhen_test/master_0519_81b9117/datanode
IoTDB configuration (all other settings left unchanged):
MAX_HEAP_SIZE="192G"
MAX_DIRECT_MEMORY_SIZE="32G"
mlog_buffer_size=10485760
schema_engine_mode=Schema_File
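Benchmark path: /data/benchmark/weekly_shell/bm_0514_ee75a49
The benchmark configuration is in the attachment.
2. Start IoTDB and run the benchmark
This takes about 3 hours.
3. Verify the data before deleting series
Correct
count_ts_500dev.sh: 200,000 series per device
select_count_ts_500dev.sh: each queried series returns 10 points.
4. Delete 51 series from each device
Run del_ts.sh
5. After deleting the series and before stopping IoTDB, verify the data again
Correct
count_ts_500dev.sh: 199,949 series per device
select_count_ts_500dev.sh: each queried series returns 10 points.
6. Stop IoTDB
7. Back up the data and logs
8. Restart IoTDB and check the logs: the NPE appears
9. After IoTDB recovers successfully, run select_count_ts_500dev.sh
Some series return fewer points than expected (only a subset is listed)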
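The check in step 9 amounts to comparing the point count of each surviving series with the 10 points that were written. A minimal sketch of that comparison follows, assuming the per-series counts have already been collected into a dictionary. The function name, the series paths, and the sample counts are hypothetical; they are not taken from select_count_ts_500dev.sh or from the actual query results.

```python
EXPECTED_POINTS = 10  # each series was written with 10 points


def find_short_series(counts, expected=EXPECTED_POINTS):
    """Return {series_path: count} for series whose count is below `expected`."""
    return {path: n for path, n in counts.items() if n < expected}


# Hypothetical sample for illustration only (not real test output).
sample_counts = {
    "root.test.g_0.d_0.s_0": 10,
    "root.test.g_0.d_0.s_1": 7,
    "root.test.g_0.d_0.s_2": 10,
}

short = find_short_series(sample_counts)
for path, n in sorted(short.items()):
    print(f"{path}: {n} points, expected {EXPECTED_POINTS}")
```

Running the same comparison before the restart (step 5) and after recovery (step 9) would show which series lost points during WAL replay.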