Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
As IGNITE-17048 demonstrates, our tests sometimes fail with a message like the following:
java.lang.AssertionError: Raft groups are still running
The leftover Raft groups always belong to table partitions, never to the metastorage or CMG groups.
This appears to happen when TableManager.stop() is called before the creation of some table has completed on some Ignite node. TableManager.stop() then does not see that table, so the table is never stopped and its Raft groups are left running indefinitely.
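To make the suspected race concrete, here is a minimal, self-contained Java sketch. It is not Ignite code: TableRegistry, FakeTable and their methods are hypothetical stand-ins for TableManager and a table with Raft groups. Table creation completes asynchronously (loosely modelled on tablesByIdVv.complete()); if stop() runs first, it sees an empty registry, and the table registered afterwards is never stopped.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a table whose partitions own Raft groups.
class FakeTable {
    final String name;
    volatile boolean running = true;

    FakeTable(String name) {
        this.name = name;
    }

    void stop() {
        running = false; // in a real node this would stop the partition Raft groups
    }
}

// Hypothetical, simplified model of the buggy behaviour:
// stop() only sees tables whose creation has already completed.
class TableRegistry {
    final Map<String, FakeTable> tables = new ConcurrentHashMap<>();

    // The table is registered only when 'schemaReady' completes.
    CompletableFuture<FakeTable> createTable(String name, CompletableFuture<Void> schemaReady) {
        return schemaReady.thenApply(ignored -> {
            FakeTable table = new FakeTable(name);
            tables.put(name, table); // may run after stop() has already finished
            return table;
        });
    }

    void stop() {
        tables.values().forEach(FakeTable::stop);
    }
}

class RaceDemo {
    public static void main(String[] args) {
        TableRegistry registry = new TableRegistry();
        CompletableFuture<Void> schemaReady = new CompletableFuture<>();

        CompletableFuture<FakeTable> created = registry.createTable("t1", schemaReady);

        registry.stop();            // the node stops before the table is ready
        schemaReady.complete(null); // table creation completes afterwards

        FakeTable leaked = created.join();
        System.out.println("table still running after stop: " + leaked.running);
    }
}
```

Because the non-async thenApply callback runs in the thread that calls complete(), the ordering here is deterministic: the table is registered after stop() has returned, so it stays running.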
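Adding a delay before table creation is completed makes the failure easy to reproduce:
    public void onSqlSchemaReady(long causalityToken) {
        // Debug-only delay: with probability ~1/3, postpone completing table creation.
        if (Math.random() < 0.33) {
            try {
                Thread.sleep(DELAY_MILLIS); // hypothetical constant; the delay value is not given here
            } catch (InterruptedException e) {
                // ignore
            }
        }

        LOG.info("SCHEMA READY FOR " + causalityToken);
        tablesByIdVv.complete(causalityToken);
    }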
The reproducer is in https://github.com/gridgain/apache-ignite-3/tree/ignite-17286-repr
To run the reproducer, run ItComputeTest.executesColocatedByClassNameWithTupleKey() repeatedly. The assertion usually fails within fewer than 10 iterations.
UPD:
As a result, a set of busy locks was added to the table creation flow, in places such as SqlSchemaManagerImpl, SchemaManager and others. Logic was also added to stop resources if the node is stopped in the middle of the table creation flow.
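The general busy-lock pattern behind this fix can be sketched as follows. This is an illustration under assumptions, not Ignite's implementation: SimpleBusyLock, SafeTableManager, SketchTable and onTableCreated() are hypothetical names, and the lock is built on ReentrantReadWriteLock rather than Ignite's own busy-lock class. Each creation step runs inside enterBusy()/leaveBusy(); stop() blocks the lock, which waits for in-flight steps and rejects new ones; a step that is rejected releases the resources it already holds instead of registering the table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified busy lock: many concurrent "busy" sections, one exclusive block().
class SimpleBusyLock {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private volatile boolean blocked;

    // Returns false once block() has been called; the caller must then not proceed.
    boolean enterBusy() {
        rw.readLock().lock();
        if (blocked) {
            rw.readLock().unlock();
            return false;
        }
        return true;
    }

    // Must be called exactly once after every successful enterBusy().
    void leaveBusy() {
        rw.readLock().unlock();
    }

    // Waits for all in-flight busy sections to finish, then rejects new ones.
    void block() {
        rw.writeLock().lock();
        try {
            blocked = true;
        } finally {
            rw.writeLock().unlock();
        }
    }
}

// Hypothetical stand-in for a (possibly partially created) table.
class SketchTable {
    volatile boolean running = true;

    void stop() {
        running = false;
    }
}

class SafeTableManager {
    private final SimpleBusyLock busyLock = new SimpleBusyLock();
    final Map<String, SketchTable> tables = new ConcurrentHashMap<>();

    // Called when table creation completes. Returns false if the node is stopping.
    boolean onTableCreated(String name, SketchTable table) {
        if (!busyLock.enterBusy()) {
            table.stop(); // node is stopping: clean up the half-created table ourselves
            return false;
        }
        try {
            tables.put(name, table);
            return true;
        } finally {
            busyLock.leaveBusy();
        }
    }

    void stop() {
        busyLock.block(); // from now on, no creation step can register a table
        tables.values().forEach(SketchTable::stop);
    }
}

class BusyLockDemo {
    public static void main(String[] args) {
        SafeTableManager manager = new SafeTableManager();
        manager.stop();

        SketchTable late = new SketchTable();
        boolean registered = manager.onTableCreated("t1", late);
        System.out.println("registered=" + registered + ", running=" + late.running);
    }
}
```

Because block() takes the write lock, it cannot complete while any creation step holds the read lock, so stop() either sees a table in the map or the creation step itself stops it; there is no window in which a table escapes both.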
Issue Links
- causes: IGNITE-17048 Some failing tests make other tests fail too (Open)
- links to