Details
Bug
Status: Resolved
High
Resolution: Fixed
3.11.17, 4.0.12, 4.1.4, 5.0-alpha1, 5.0
None
Degradation
Normal
Normal
Adhoc Test
All
None
Description
UDFs fail to reload properly after a rolling restart.
Symptom:
An NPE is thrown when a UDF is used after the restart.
Steps to recreate:
- Create a cluster as per the cql file.
- Populate the cluster with data.cql.
- Execute SELECT city_measurements(city, measurement, 16.5) AS m FROM current
- Expect: min and max values for cities.
- Perform a rolling restart on one server.
- When the server is back up, execute SELECT city_measurements(city, measurement, 16.5) AS m FROM current
- Expect: error result with NPE message.
Analysis:
During system restart, SchemaKeyspace.fetchNonSystemKeyspaces() is called. When a keyspace with a UDF is loaded, the SchemaKeyspace method createUDFFromRow() is called. This in turn calls UDFunction.create(), which eventually reaches the UDFunction constructor, where Schema.instance.getKeyspaceMetadata() is called with the keyspace of the UDF name as the argument. However, that keyspace is still being constructed and is not yet in the Schema instance, so the method returns null for the KeyspaceMetadata. That null KeyspaceMetadata is then used in the udfContext.
Later, when the UDF is called and needs to call a method on the keyspaceMetadata, such as udfContext.newUDTValue(), whose implementation uses keyspaceMetadata.types, a NullPointerException is thrown.
I have verified that this affects versions 4.0, 4.1 and trunk. I have not verified 3.x, but I suspect it is the same there.
I modified the UDFunction constructor to assert that the metadata was not null and received the following stack trace:
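The ordering problem described above can be illustrated with a minimal, self-contained sketch. It is not Cassandra code: UdfReloadSketch, KeyspaceMeta, REGISTRY, Udf and loadKeyspace are hypothetical stand-ins for KeyspaceMetadata, Schema.instance, UDFunction and the keyspace load path, and the only thing it is meant to show is that a lookup made during construction captures null when the keyspace is registered afterwards.

```java
import java.util.HashMap;
import java.util.Map;

public class UdfReloadSketch {
    // Hypothetical stand-in for KeyspaceMetadata; only the 'types' field matters here.
    public static final class KeyspaceMeta {
        public final String name;
        public final Map<String, String> types = new HashMap<>();

        KeyspaceMeta(String name) { this.name = name; }
    }

    // Hypothetical stand-in for Schema.instance: a registry of fully loaded keyspaces.
    static final Map<String, KeyspaceMeta> REGISTRY = new HashMap<>();

    // Hypothetical stand-in for a UDF together with its context.
    public static final class Udf {
        public final KeyspaceMeta keyspaceMeta; // captured once, at construction time

        Udf(String keyspace) {
            // The keyspace is still being built at this point, so the lookup returns null.
            this.keyspaceMeta = REGISTRY.get(keyspace);
        }

        String newUdtValue(String typeName) {
            // Dereferences the captured metadata; throws NullPointerException if it is null.
            return keyspaceMeta.types.get(typeName);
        }
    }

    // Mirrors the order in the analysis: functions are created first, registration comes last.
    static Udf loadKeyspace(String name) {
        Udf udf = new Udf(name);                 // keyspace not yet registered
        KeyspaceMeta meta = new KeyspaceMeta(name);
        meta.types.put("measurement", "measurement-udt");
        REGISTRY.put(name, meta);                // registered too late for the UDF
        return udf;
    }

    public static String reproduce() {
        REGISTRY.clear();
        Udf udf = loadKeyspace("temperatures");
        try {
            udf.newUdtValue("measurement");
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce()); // prints "NPE"
    }
}
```

In this sketch the failure is deferred until the function is first used, which matches the symptom: the load itself appears to succeed and the error only surfaces at query time.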
ERROR [main] 2023-08-09 11:44:46,408 CassandraDaemon.java:911 - Exception encountered during startup
java.lang.AssertionError: No metadata for temperatures.city_measurements_sfunc
at org.apache.cassandra.cql3.functions.UDFunction.<init>(UDFunction.java:240)
at org.apache.cassandra.cql3.functions.JavaBasedUDFunction.<init>(JavaBasedUDFunction.java:195)
at org.apache.cassandra.cql3.functions.UDFunction.create(UDFunction.java:276)
at org.apache.cassandra.schema.SchemaKeyspace.createUDFFromRow(SchemaKeyspace.java:1182)
at org.apache.cassandra.schema.SchemaKeyspace.fetchUDFs(SchemaKeyspace.java:1131)
at org.apache.cassandra.schema.SchemaKeyspace.fetchFunctions(SchemaKeyspace.java:1119)
at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:859)
at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:848)
at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:836)
at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:132)
at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:121)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:287)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
Possible solution:
Version 4.x
Create a KeyspaceMetadata.Builder class that accepts the types, tables and views but uses a builder for the functions.
Add a KeyspaceMetadata constructor that accepts the KeyspaceMetadata.Builder, so that the function builder's keyspaceMetadata value can be set correctly during construction of the KeyspaceMetadata.
Modify SchemaKeyspace.fetchKeyspace(string) so that it uses the KeyspaceMetadata.Builder.
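A rough sketch of the builder idea, under stated assumptions, might look like the following. None of these classes are the real Cassandra types: KeyspaceBuilderSketch, KeyspaceMeta, Udf and Builder are hypothetical names, types are reduced to plain strings, and functions are represented as deferred factories that receive the keyspace instance while it is being constructed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class KeyspaceBuilderSketch {
    // Hypothetical stand-in for a UDF; it refuses to exist without keyspace metadata.
    public static final class Udf {
        public final String name;
        public final KeyspaceMeta keyspace;

        Udf(String name, KeyspaceMeta keyspace) {
            if (keyspace == null)
                throw new IllegalStateException("No metadata for " + name);
            this.name = name;
            this.keyspace = keyspace;
        }
    }

    // Hypothetical stand-in for KeyspaceMetadata, constructed from a builder.
    public static final class KeyspaceMeta {
        public final String name;
        public final List<String> types;
        public final List<Udf> functions;

        KeyspaceMeta(Builder builder) {
            // Name and types are assigned first so function factories can rely on them.
            this.name = builder.name;
            this.types = new ArrayList<>(builder.types);
            // Each factory receives the instance under construction, so no registry lookup is needed.
            List<Udf> built = new ArrayList<>();
            for (Function<KeyspaceMeta, Udf> factory : builder.functionFactories)
                built.add(factory.apply(this));
            this.functions = built;
        }
    }

    // Hypothetical builder: types are accepted as values, functions as deferred factories.
    public static final class Builder {
        final String name;
        final List<String> types = new ArrayList<>();
        final List<Function<KeyspaceMeta, Udf>> functionFactories = new ArrayList<>();

        public Builder(String name) { this.name = name; }

        public Builder addType(String type) {
            types.add(type);
            return this;
        }

        public Builder addFunction(String functionName) {
            functionFactories.add(ks -> new Udf(ks.name + "." + functionName, ks));
            return this;
        }

        public KeyspaceMeta build() { return new KeyspaceMeta(this); }
    }

    public static KeyspaceMeta demo() {
        return new Builder("temperatures")
                .addType("measurement")
                .addFunction("city_measurements_sfunc")
                .build();
    }

    public static void main(String[] args) {
        KeyspaceMeta ks = demo();
        System.out.println(ks.functions.get(0).keyspace == ks); // prints "true"
    }
}
```

One design caveat of this sketch: the constructor passes a partially constructed instance to the factories, so a factory must only read fields that have already been assigned (here, name and types).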
Version 5.x
Similar to 4.x, except that the KeyspaceMetadata.Builder will also need builders for Views and Tables, because the functions necessary to construct those objects will not be available until the KeyspaceMetadata.Builder constructs them.