Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
In rare circumstances, DataNode (DN) registration fails.
The cause: when a DN registers for the first time, it fetches the system configuration from the ConfigNode (CN). If the ConfigNode has an error or its leader is not ready, the fetched configuration is null. The DN uses it without a null check, so a NullPointerException (NPE) is thrown and aborts the registration process, and the 'SYSTEM_PROPERTIES.deleteOnExit();' logic is skipped.
When the DN is started again, system.properties already exists, so the startup is not treated as a first start. However, the nodeId is still -1, so the restart fails.
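The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the IoTDB code and not necessarily how the issue was fixed. The class RegistrationSketch, the ConfigSource interface, the register method and the node_id key are all hypothetical names chosen for illustration. The sketch shows one way to avoid the problem: check the fetched configuration for null, and remove the half-written properties file whenever registration does not complete.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class RegistrationSketch {

    /** Hypothetical stand-in for pulling the configuration from the ConfigNode leader. */
    public interface ConfigSource {
        Map<String, String> fetch(); // may return null when the leader is not ready
    }

    /**
     * First-time registration. Writes a properties file holding the placeholder
     * nodeId -1, then pulls the configuration. If registration does not complete,
     * the file is deleted so the next start is still treated as a first start.
     */
    public static boolean register(Path systemProperties, ConfigSource source)
            throws IOException {
        Files.write(systemProperties, "node_id=-1\n".getBytes(StandardCharsets.UTF_8));
        boolean registered = false;
        try {
            Map<String, String> config = source.fetch();
            if (config == null) {
                // Explicit check instead of dereferencing null and failing with an NPE.
                return false;
            }
            // Applying the configuration and obtaining a real nodeId is omitted;
            // the value 1 below is a placeholder for the assigned id.
            Files.write(systemProperties, "node_id=1\n".getBytes(StandardCharsets.UTF_8));
            registered = true;
            return true;
        } finally {
            if (!registered) {
                // Runs on the null-config path and on any exception thrown above.
                Files.deleteIfExists(systemProperties);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dn-sketch");
        Path props = dir.resolve("system.properties");
        boolean ok = register(props, () -> null); // simulate "leader not ready"
        System.out.println("registered=" + ok + ", fileExists=" + Files.exists(props));
        // prints "registered=false, fileExists=false"
    }
}
```

Putting the cleanup in a finally block, instead of relying on a call placed after the configuration pull, means the stale file is removed regardless of which step fails.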
DN log info:
2023-09-20 21:45:29,041 | INFO | [main] | Successfully update ConfigNode: [TEndPoint(ip:x.x.x.x, port:xxxx), TEndPoint(ip:x.x.x.x, port:xxxx), TEndPoint(ip:x.x.x.x, port:xxxx]. | org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
2023-09-20 21:45:29,042 | INFO | [main] | Pulling system configurations from the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:238)
2023-09-20 21:45:29,550 | ERROR | [main] | Failed to execute system command | org.apache.iotdb.commons.ServerCommandLine (ServerCommandLine.java:69)
java.lang.NullPointerException: null
at org.apache.iotdb.db.conf.IoTDBDescriptor.loadGlobalConfig(IoTDBDescriptor.java:1930)
at org.apache.iotdb.db.service.DataNode.pullAndCheckSystemConfigurations(DataNode.java:275)
at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:164)
at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
2023-09-20 21:46:02,198 | INFO | [main] | Start to read config file file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:164)
2023-09-20 21:46:02,221 | INFO | [main] | Start to read config file file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-datanode.properties | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:181)
2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForRead = 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1583)
2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForWrite = 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1584)
2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForSchema = 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1585)
2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForConsensus = 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1586)
2023-09-20 21:46:02,248 | INFO | [main] | allocateMemoryForSchemaRegion = 107374182 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1710)
2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForSchemaCache = 64424509 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1713)
2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForPartitionCache = 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1717)
2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForLastCache = 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1720)
2023-09-20 21:46:02,257 | INFO | [main] | try loading iotdb-common.properties from /opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties | org.apache.iotdb.tsfile.common.conf.TSFileDescriptor (TSFileDescriptor.java:135)
2023-09-20 21:46:02,388 | INFO | [main] | IoTDB enable memory control: true | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:383)
2023-09-20 21:46:02,492 | INFO | [main] | IoTDB-DataNode environment variables:
IOTDB_HOME=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/install/FusionInsight-IoTDB-1.1.0/iotdb;
IOTDB_CONF=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc;
IOTDB_DATA_HOME=null; | org.apache.iotdb.db.service.DataNode (DataNode.java:150)
2023-09-20 21:46:02,777 | INFO | [main] | new single scheduled thread pool: Stateful-Trigger-Information-Updater | org.apache.iotdb.commons.concurrent.IoTDBThreadPoolFactory (IoTDBThreadPoolFactory.java:192)
2023-09-20 21:46:02,781 | INFO | [main] | Running mode -s | org.apache.iotdb.db.service.DataNodeServerCommandLine (DataNodeServerCommandLine.java:96)
2023-09-20 21:46:02,790 | INFO | [main] | Starting IoTDB 1.1.0-h0.cbu.mrs.330.r3 (Build: 89ddf14-dev) | org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:174)
2023-09-20 21:46:02,815 | WARN | [main] | Failed to copy file from /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp to /srv/BigData/data1/iotdb/iotdbserver/data/system.properties | org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:421)
2023-09-20 21:46:02,822 | INFO | [main] | Start JMX remotely: JMX is enabled to receive remote connection on port 22258 | org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:80)
2023-09-20 21:46:02,823 | INFO | [main] | JDK version is 8. | org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:49)
2023-09-20 21:46:02,832 | INFO | [main] | Successfully update ConfigNode: [TEndPoint(ip:x.x.x.x, port:xxxx, TEndPoint(ip:x.x.x.x, port:xxxx, TEndPoint(ip:x.x.x.x, port:xxxx]. | org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
2023-09-20 21:46:02,835 | INFO | [main] | Pulling system configurations from the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:238)
2023-09-20 21:46:03,514 | WARN | [main] | Failed to connect to ConfigNode TEndPoint(ip:x.x.x.x, port:xxxx from DataNode TEndPoint(ip:x.x.x.x, port:xxxx, because the current node is not leader, try next node | org.apache.iotdb.db.client.ConfigNodeClient (ConfigNodeClient.java:308)
2023-09-20 21:46:04,760 | INFO | [main] | Create system.properties.tmp /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp. | org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:537)
2023-09-20 21:46:04,764 | INFO | [main] | Successfully pull system configurations from ConfigNode-leader. | org.apache.iotdb.db.service.DataNode (DataNode.java:306)
2023-09-20 21:46:04,764 | INFO | [main] | Sending restart request to ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:405)
2023-09-20 21:46:04,807 | ERROR | [main] | Fail to start server | org.apache.iotdb.db.service.DataNode (DataNode.java:189)
org.apache.iotdb.commons.exception.StartupException: Reject DataNode restart. Because the nodeId of the current DataNode is -1. Possible solutions are as follows:
1. Delete "data" dir and retry.
at org.apache.iotdb.db.service.DataNode.sendRestartRequestToConfigNode(DataNode.java:452)
at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:171)
at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
2023-09-20 21:46:04,808 | INFO | [main] | Deactivating IoTDB DataNode... | org.apache.iotdb.db.service.DataNode (DataNode.java:864)