Details
-
Bug
-
Status: Open
-
Normal
-
Resolution: Unresolved
-
None
-
Correctness - Transient Incorrect Response
-
Low
-
Normal
-
User Report
-
All
-
None
Description
Symptoms
I was migrating from Cassandra-3.11.15 to Cassandra-4.1.2(Single node), and I encountered the following error message when I drained and shutdown the 3.11.15 node.
(This might also happen when shutting down Cassandsra-3.11.16)
INFO [MigrationStage:1] 2023-06-30 02:41:08,822 ColumnFamilyStore.java:432 - Initializing uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi.dbqxbgocor INFO [MutationStage-4] 2023-06-30 02:41:11,041 ColumnFamilyStore.java:2250 - Truncating uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi INFO [MutationStage-4] 2023-06-30 02:41:13,547 ColumnFamilyStore.java:2304 - Truncate of uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi is complete INFO [Native-Transport-Requests-5] 2023-06-30 02:41:13,552 MigrationManager.java:362 - Drop Keyspace 'uuidff0deb8dbf894fb1b3f59c0eafb94467' INFO [IndexSummaryManager:1] 2023-06-30 02:41:36,223 IndexSummaryRedistribution.java:78 - Redistributing index summaries INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,637 StorageService.java:1660 - DRAINING: starting drain process INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,641 HintsService.java:210 - Paused hints dispatch INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645 Server.java:179 - Stop listening for CQL clients INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645 Gossiper.java:1720 - Announcing shutdown INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,648 StorageService.java:2585 - Node /192.168.215.2 state jump to shutdown INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:48,652 MessagingService.java:985 - Waiting for messaging service to quiesce INFO [ACCEPT-/192.168.215.2] 2023-06-30 02:41:48,655 MessagingService.java:1346 - MessagingService has terminated the accept() thread INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:52,618 HintsService.java:210 - Paused hints dispatch INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:56,835 StorageService.java:1660 - DRAINED ERROR [MigrationStage:1] 2023-06-30 02:41:56,838 CassandraDaemon.java:244 - Exception in thread Thread[MigrationStage:1,5,main] java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.ArrayList.elementData(ArrayList.java:424) at java.util.ArrayList.get(ArrayList.java:437) at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289) at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227) at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390) at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) at java.lang.Thread.run(Thread.java:750) ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,839 QueryMessage.java:129 - Unexpected error during query java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442) at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429) at org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363) at org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63) at org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259) at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) at org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688) at org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113) at java.lang.Thread.run(Thread.java:750) Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438) ... 14 common frames omitted Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.ArrayList.elementData(ArrayList.java:424) at java.util.ArrayList.get(ArrayList.java:437) at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289) at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227) at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390) at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) ... 1 common frames omitted ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,841 ErrorMessage.java:384 - Unexpected exception during request java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442) at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429) at org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363) at org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63) at org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259) at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116) at org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688) at org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113) at java.lang.Thread.run(Thread.java:750) Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438) ... 14 common frames omitted Caused by: java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.ArrayList.elementData(ArrayList.java:424) at java.util.ArrayList.get(ArrayList.java:437) at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289) at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227) at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413) at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390) at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) ... 1 common frames omitted INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-30 02:41:59,326 CassandraDaemon.java:576 - Cassandra shutting down...
The java.lang.ArrayIndexOutOfBoundsException is thrown in MigrationStage thread.
Root Cause
From my preliminary analysis, this bug is related to thread interleaving between
- Thread trying to shutdown the Cassandra instance.
- MigrationStage thread trying to recycle segments.
if user invokes bin/nodetool stopdaemon, the activeSegments in AbstractCommitLogSegmentManager will be empty.
// AbstractCommitLogSegmentManager.java /** Active segments, containing unflushed data. The tail of this queue is the one we allocate writes to */private final ConcurrentLinkedQueue<CommitLogSegment> activeSegments = new ConcurrentLinkedQueue<>();
At this time, if the migration thread tries to invoke the forceRecycleAll function, it throws ArrayIndexOutOfBoundsException if there's no active segment.
// AbstractCommitLogSegmentManager.java void forceRecycleAll(Iterable<UUID> droppedCfs) { List<CommitLogSegment> segmentsToRecycle = new ArrayList<>(activeSegments); CommitLogSegment last = segmentsToRecycle.get(segmentsToRecycle.size() - 1); advanceAllocatingFrom(last); // wait for the commit log modifications last.waitForModifications();
It seems that this ERROR won't impact the data integrity. But I am not sure whether it will cause some problems in the segment recycling process.
I think this should be handled considering its ERROR level in system.log. It might confuse users whether their data is still in an integrity state and whether they can perform the migration from 3.x to 4.x.
I have attached my full system.log and cassandra.yaml file.