Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Cannot Reproduce
-
None
-
All
-
None
Description
How
I created an ensemble of 3 nodes (which worked well), then added a fourth node to join it.
When executing nodetool info, I get the following exceptions:
➜ bin ./nodetool info
java.lang.NullPointerException
    at org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
    at org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)

➜ bin ./nodetool info
WARN [InternalResponseStage:152] 2024-02-02 11:45:15,731 RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true}
error: null
-- StackTrace --
java.lang.NullPointerException
    at org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
    at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260)
Server 1 cannot run nodetool info or the CQL shell; servers 2 and 3 can. I tried querying the system-prefixed tables and have attached the error stack log for further debugging. I could not find a way to recover. After deleting the data (losing all of it) and restarting, everything worked again.
➜ bin ./nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load  Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.2  ?     16      51.2%             6d194555-f6eb-41d0-c000-000000000002  rack1
DN  127.0.0.4  ?     16      48.8%             6d194555-f6eb-41d0-c000-000000000001  rack1
When
The problem was introduced by the CEP-21 patch. In any case, a null check is needed so the NPE cannot propagate to callers.
Implementation of Transactional Cluster Metadata as described in CEP-21
Hash: ae084237
code diff:
     public String getLocalHostId()
     {
-        UUID id = getLocalHostUUID();
-        return id != null ? id.toString() : null;
+        return getLocalHostUUID().toString();
     }

     public UUID getLocalHostUUID()
     {
-        UUID id = getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
-        if (id != null)
-            return id;
-        // this condition is to prevent accessing the tables when the node is not started yet, and in particular,
-        // when it is not going to be started at all (e.g. when running some unit tests or client tools).
-        else if ((DatabaseDescriptor.isDaemonInitialized() || DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
-            return SystemKeyspace.getLocalHostId();
-
-        return null;
+        // Metadata collector requires using local host id, and flush of IndexInfo may race with
+        // creation and initialization of cluster metadata service. Metadata collector does accept
+        // null localhost ID values, it's just that TokenMetadata was created earlier.
+        ClusterMetadata metadata = ClusterMetadata.currentNullable();
+        if (metadata == null || metadata.directory.peerId(getBroadcastAddressAndPort()) == null)
+            return null;
+        return metadata.directory.peerId(getBroadcastAddressAndPort()).toUUID();
     }
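For illustration, the null check that the diff removed from getLocalHostId() could be restored along the following lines. This is only a minimal standalone sketch, not the actual fix: the class HostIdSketch, the method localHostId, and the Supplier parameter are hypothetical stand-ins for StorageService and getLocalHostUUID(), which according to the diff can still return null when cluster metadata is not yet available.

```java
import java.util.UUID;
import java.util.function.Supplier;

public class HostIdSketch
{
    // Hypothetical stand-in for StorageService.getLocalHostId().
    // hostUuidSource plays the role of getLocalHostUUID(), which may return null
    // (for example, before cluster metadata has been initialized).
    static String localHostId(Supplier<UUID> hostUuidSource)
    {
        UUID id = hostUuidSource.get();
        // Guard against null instead of calling toString() unconditionally.
        return id != null ? id.toString() : null;
    }

    public static void main(String[] args)
    {
        // Host ID not yet known: returns null rather than throwing an NPE.
        System.out.println(localHostId(() -> null));
        // Host ID known: returns its string form.
        System.out.println(localHostId(() -> UUID.fromString("6d194555-f6eb-41d0-c000-000000000002")));
    }
}
```

With a guard like this, a caller such as nodetool info would receive a null host ID instead of a NullPointerException; whether null is the right value to report in that state is a separate design decision.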
Attachments