  1. Cassandra
  2. CASSANDRA-3908

Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Cannot Reproduce
    • Fix Version/s: None
    • Component/s: None
    • Environment: Ubuntu
    • Severity: Normal

      Description

      This problem looks similar to CASSANDRA-2792.

      I am bootstrapping a new node into my cluster.

      There are two keyspaces, FightMyMonster and FMM_Studio. The first keyspace streams successfully, and the whole operation is probably more than 99% complete when it stalls on some sstables in the much smaller FMM_Studio keyspace.

      Netstats on the bootstrapping node reports it is still streaming:

      root:/var/lib/cassandra/data# nodetool -h localhost netstats
      Mode: JOINING
      Not sending any streams.
      Streaming from: /192.168.1.9
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
      Streaming from: /192.168.1.4
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
      FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
      Pool Name Active Pending Completed
      Commands n/a 0 478
      Responses n/a 0 496302
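
For anyone comparing this output across nodes, the pending files can be pulled out of the netstats text mechanically. The following is a minimal sketch, assuming the line layout is exactly as pasted above; `pending_streams` and the two regular expressions are hypothetical helpers written for this report, not part of nodetool.

```python
import re

# Hypothetical patterns, modelled only on the netstats lines quoted in this report.
SOURCE_LINE = re.compile(r"^\s*Streaming from: /(?P<host>\S+)")
FILE_LINE = re.compile(
    r"^\s*(?P<keyspace>\S+): (?P<path>\S+) sections=\d+ "
    r"progress=(?P<done>\d+)/(?P<total>\d+) - \d+%"
)


def pending_streams(netstats_text):
    """Return {source_host: [(keyspace, path, done, total), ...]}."""
    pending = {}
    current = None
    for line in netstats_text.splitlines():
        m = SOURCE_LINE.match(line)
        if m:
            # A new "Streaming from" header starts a new group of files.
            current = m.group("host")
            pending[current] = []
            continue
        m = FILE_LINE.match(line)
        if m and current is not None:
            pending[current].append(
                (m.group("keyspace"), m.group("path"),
                 int(m.group("done")), int(m.group("total")))
            )
    return pending


# Excerpt of the output above (first source node only).
SAMPLE = """\
Mode: JOINING
Not sending any streams.
Streaming from: /192.168.1.9
FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
"""

if __name__ == "__main__":
    for host, files in pending_streams(SAMPLE).items():
        outstanding = sum(total - done for _, _, done, total in files)
        print(host, len(files), "files,", outstanding, "units outstanding")
```

Every file in the excerpt sits at progress 0, so none of the remaining transfers ever started.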

      However, running netstats on the source nodes reports they are not streaming:

      root:~# nodetool -h localhost netstats
      Mode: NORMAL
      Nothing streaming to /192.168.1.11
      Not receiving any streams.
      Pool Name Active Pending Completed
      Commands n/a 0 13291116
      Responses n/a 0 8334754
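
The mismatch between the two views (the joining node waits on a source, while that source reports nothing outbound) can be checked with a few lines of code. This is a rough sketch, assuming the source prints the exact line "Nothing streaming to /<address>" as shown above; `source_is_idle` and `one_sided_sessions` are hypothetical names, and the inputs are plain strings captured from netstats.

```python
def source_is_idle(source_netstats_text, target_host):
    """True if the source's netstats text says nothing is streaming to target_host."""
    marker = "Nothing streaming to /" + target_host
    return any(
        line.strip() == marker
        for line in source_netstats_text.splitlines()
    )


def one_sided_sessions(pending_by_source, source_outputs, target_host):
    """Sources the joining node still waits on, but which report no outbound stream.

    pending_by_source: {source_host: [pending files...]} as seen by the joining node.
    source_outputs:    {source_host: netstats text captured on that source}.
    """
    return sorted(
        src
        for src, files in pending_by_source.items()
        if files and source_is_idle(source_outputs.get(src, ""), target_host)
    )


if __name__ == "__main__":
    pending = {"192.168.1.9": ["PartsData-hc-149-Data.db"]}
    outputs = {
        "192.168.1.9": (
            "Mode: NORMAL\n"
            "Nothing streaming to /192.168.1.11\n"
            "Not receiving any streams.\n"
        )
    }
    print(one_sided_sessions(pending, outputs, "192.168.1.11"))
```

Any host returned here is in the state described in this report: the receiver still expects files that the sender no longer intends to send.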

      The logs on the source nodes do NOT show an error for the specific sstables that are stalled. The start of streaming is duly logged:

      pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now UP
      INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
      INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='PartsData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
      INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%], 6 sstables.
      INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to /192.168.1.11
      INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
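
The "Stream context metadata" line lists what the source planned to send, so it can be diffed against the files the joining node still reports as pending. The sketch below assumes the log excerpt comes from 192.168.1.9, which this report does not state explicitly; `planned_files` and `not_pending` are hypothetical helpers, and the sample strings are shortened copies of the lines above.

```python
import re

# Hypothetical pattern: a data-file path followed by " sections=N",
# as it appears inside the "Stream context metadata [...]" log line.
PLANNED_FILE = re.compile(r"(/\S+-Data\.db) sections=\d+")


def planned_files(log_line):
    """Paths listed in a 'Stream context metadata [...]' log line."""
    return set(PLANNED_FILE.findall(log_line))


def not_pending(planned, pending):
    """Planned files that the joining node no longer lists as pending."""
    return sorted(set(planned) - set(pending))


if __name__ == "__main__":
    base = "/var/lib/cassandra/data/FMM_Studio/"
    log_line = (
        "Stream context metadata ["
        + base + "Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, "
        + base + "PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, "
        + base + "AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, "
        + base + "AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%"
        + "], 6 sstables."
    )
    # Files the joining node reports as pending from 192.168.1.9.
    pending = [
        base + "AuthorClasses-hc-134-Data.db",
        base + "AuthorClasses-hc-132-Data.db",
        base + "PartsData-hc-149-Data.db",
    ]
    print(not_pending(planned_files(log_line), pending))
```

Under that assumption, the only planned file missing from the pending list is Classes-hc-144-Data.db.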

      However, the source log does show an IllegalStateException for a different sstable in this keyspace (Classes-hc-144-Data.db), raised within a second of streaming beginning. Perhaps this is what broke the streaming:

      ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
      java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
      at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
      at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
      at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
      java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
      at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
      at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
      at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
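
To check other source nodes for the same failure, the exception message can be searched for directly. This is a minimal sketch, assuming the message wording is exactly as in the traces above; `stream_mismatches` is a hypothetical helper, and the log text is passed in as a string rather than read from a particular log path.

```python
import re

# Hypothetical pattern built from the exception message quoted above.
MISMATCH = re.compile(
    r"IllegalStateException: target reports current file is "
    r"(?P<reported>\S+) but is (?P<actual>\S+)"
)


def stream_mismatches(log_text):
    """Return (file reported by target, file the source had current) pairs."""
    return [
        (m.group("reported"), m.group("actual"))
        for m in MISMATCH.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "java.lang.IllegalStateException: target reports current file is "
        "/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null\n"
    )
    for reported, actual in stream_mismatches(sample):
        print("target acknowledged", reported, "but source had", actual)
```

An `actual` value of "null" corresponds to the case in this report, where the source has no current file when the target's reply arrives.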

            People

            • Assignee: Unassigned
            • Reporter: Dominic Williams (dccwilliams)
            Time Tracking

            • Original Estimate: 24h
            • Remaining Estimate: 24h
            • Time Spent: Not Specified