Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-18644

ERROR: ArrayIndexOutOfBoundsException: -1 in MigrationStage thread when shutdown Cassandra-3.11.16 node

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • 3.11.x
    • Cluster/Schema
    • None
    • Correctness - Transient Incorrect Response
    • Low
    • Normal
    • User Report
    • All
    • None

    Description

      Symptoms

      I was migrating from Cassandra-3.11.15 to Cassandra-4.1.2(Single node), and I encountered the following error message when I drained and shutdown the 3.11.15 node.

      (This might also happen when shutting down Cassandsra-3.11.16)

      INFO  [MigrationStage:1] 2023-06-30 02:41:08,822 ColumnFamilyStore.java:432 - Initializing uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi.dbqxbgocor
      INFO  [MutationStage-4] 2023-06-30 02:41:11,041 ColumnFamilyStore.java:2250 - Truncating uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi
      INFO  [MutationStage-4] 2023-06-30 02:41:13,547 ColumnFamilyStore.java:2304 - Truncate of uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi is complete
      INFO  [Native-Transport-Requests-5] 2023-06-30 02:41:13,552 MigrationManager.java:362 - Drop Keyspace 'uuidff0deb8dbf894fb1b3f59c0eafb94467'
      INFO  [IndexSummaryManager:1] 2023-06-30 02:41:36,223 IndexSummaryRedistribution.java:78 - Redistributing index summaries
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,637 StorageService.java:1660 - DRAINING: starting drain process
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,641 HintsService.java:210 - Paused hints dispatch
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645 Server.java:179 - Stop listening for CQL clients
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645 Gossiper.java:1720 - Announcing shutdown
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,648 StorageService.java:2585 - Node /192.168.215.2 state jump to shutdown
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:48,652 MessagingService.java:985 - Waiting for messaging service to quiesce
      INFO  [ACCEPT-/192.168.215.2] 2023-06-30 02:41:48,655 MessagingService.java:1346 - MessagingService has terminated the accept() thread
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:52,618 HintsService.java:210 - Paused hints dispatch
      INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:56,835 StorageService.java:1660 - DRAINED
      ERROR [MigrationStage:1] 2023-06-30 02:41:56,838 CassandraDaemon.java:244 - Exception in thread Thread[MigrationStage:1,5,main]
      java.lang.ArrayIndexOutOfBoundsException: -1
      	at java.util.ArrayList.elementData(ArrayList.java:424)
      	at java.util.ArrayList.get(ArrayList.java:437)
      	at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
      	at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
      	at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
      	at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
      	at java.lang.Thread.run(Thread.java:750)
      ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,839 QueryMessage.java:129 - Unexpected error during query
      java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1
      	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442)
      	at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429)
      	at org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363)
      	at org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63)
      	at org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123)
      	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243)
      	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274)
      	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259)
      	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
      	at org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688)
      	at org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
      	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
      	at java.lang.Thread.run(Thread.java:750)
      Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1
      	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
      	... 14 common frames omitted
      Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
      	at java.util.ArrayList.elementData(ArrayList.java:424)
      	at java.util.ArrayList.get(ArrayList.java:437)
      	at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
      	at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
      	at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
      	at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
      	... 1 common frames omitted
      ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,841 ErrorMessage.java:384 - Unexpected exception during request
      java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1
      	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442)
      	at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429)
      	at org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363)
      	at org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63)
      	at org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123)
      	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243)
      	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274)
      	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259)
      	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
      	at org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688)
      	at org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
      	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
      	at java.lang.Thread.run(Thread.java:750)
      Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1
      	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
      	... 14 common frames omitted
      Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
      	at java.util.ArrayList.elementData(ArrayList.java:424)
      	at java.util.ArrayList.get(ArrayList.java:437)
      	at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
      	at org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
      	at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
      	at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
      	at org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
      	... 1 common frames omitted
      INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-06-30 02:41:59,326 CassandraDaemon.java:576 - Cassandra shutting down... 

      The java.lang.ArrayIndexOutOfBoundsException is thrown in MigrationStage thread.

      Root Cause

      From my preliminary analysis, this bug is related to thread interleaving between

      1. Thread trying to shutdown the Cassandra instance.
      2. MigrationStage thread trying to recycle segments.

      if user invokes bin/nodetool stopdaemon, the activeSegments in AbstractCommitLogSegmentManager will be empty.

      // AbstractCommitLogSegmentManager.java
      /** Active segments, containing unflushed data. The tail of this queue is the one we allocate writes to */private final ConcurrentLinkedQueue<CommitLogSegment> activeSegments = new ConcurrentLinkedQueue<>(); 

      At this time, if the migration thread tries to invoke the forceRecycleAll function, it throws ArrayIndexOutOfBoundsException if there's no active segment.

      // AbstractCommitLogSegmentManager.java
      void forceRecycleAll(Iterable<UUID> droppedCfs)
      {
          List<CommitLogSegment> segmentsToRecycle = new ArrayList<>(activeSegments);
          CommitLogSegment last = segmentsToRecycle.get(segmentsToRecycle.size() - 1);
          advanceAllocatingFrom(last);    // wait for the commit log modifications
          last.waitForModifications(); 

      It seems that this ERROR won't impact the data integrity. But I am not sure whether it will cause some problems in the segment recycling process.

      I think this should be handled considering its ERROR level in system.log. It might confuse users whether their data is still in an integrity state and whether they can perform the migration from 3.x to 4.x.

      I have attached my full system.log and cassandra.yaml file.

       

      Attachments

        1. system.log
          1.60 MB
          Ke Han
        2. cassandra.yaml
          4 kB
          Ke Han

        Activity

          People

            Unassigned Unassigned
            kehan5800 Ke Han
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: