Cassandra / CASSANDRA-14369

Infinite loop when decommissioning a node

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • 3.11.x
    • None
    • None
    • Normal

    Description

      I have 6 nodes (N1 to N6). N2 to N6 are new hardware with two SSDs each; N1 is an old box with spinning disks, and I am trying to decommission N1. Since then, two nodes have been trying to receive streams from N1 indefinitely. The log rotates so quickly that I can only see this:

       

      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,561 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,561 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates
      INFO  [CompactionExecutor:19401] 2018-04-07 13:07:56,561 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates

      nodetool tpstats shows that some compactions are pending:

       

      Pool Name                         Active   Pending      Completed   Blocked  All time blocked
      ReadStage                              0         0        1366419         0                 0
      MiscStage                              0         0              0         0                 0
      CompactionExecutor                     9         9          77739         0                 0
      MutationStage                          0         0        7504702         0                 0
      MemtableReclaimMemory                  0         0            327         0                 0
      PendingRangeCalculator                 0         0             20         0                 0
      GossipStage                            0         0         486365         0                 0
      SecondaryIndexManagement               0         0              0         0                 0
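When the tpstats output is long, the pools with a backlog can be picked out mechanically. The following is a minimal sketch, assuming the usual one-pool-per-line layout of `nodetool tpstats` (a pool name followed by five integer columns). The function name `pools_with_pending` and the `SAMPLE` excerpt are illustrative only and are not part of Cassandra or nodetool.

```python
def pools_with_pending(tpstats_output: str) -> dict:
    """Map pool name -> (active, pending) for every pool with pending tasks.

    Assumes each pool row is: name, Active, Pending, Completed, Blocked,
    All time blocked. Header and other non-matching lines are skipped.
    """
    backlog = {}
    for line in tpstats_output.splitlines():
        fields = line.split()
        # A pool row has exactly six fields, the last five being integers.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            active, pending = int(fields[1]), int(fields[2])
            if pending > 0:
                backlog[fields[0]] = (active, pending)
    return backlog


# Three rows copied from the report, plus the header line.
SAMPLE = """\
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0        1366419         0                 0
CompactionExecutor                     9         9          77739         0                 0
MutationStage                          0         0        7504702         0                 0
"""

if __name__ == "__main__":
    for name, (active, pending) in pools_with_pending(SAMPLE).items():
        print(f"{name}: active={active} pending={pending}")
```

On the rows reported here, only CompactionExecutor has a backlog (9 active, 9 pending), which matches the compaction threads seen in the thread dump.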

       

      This is from the jstack output:

      "CompactionExecutor:16666" #26533 daemon prio=1 os_prio=4 tid=0x00007f971812f170 nid=0x6581 waiting for monitor entry [0x00007f9990f4a000]
         java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:310)
          - waiting to lock <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)
          at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:119)
          at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:119)
          at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:262)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
          at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/2123444693.run(Unknown Source)
          at java.lang.Thread.run(Thread.java:748)

         Locked ownable synchronizers:
          - <0x00000002b48f5ff0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

      "CompactionExecutor:16632" #26499 daemon prio=1 os_prio=4 tid=0x00007f970c16c420 nid=0x6553 runnable [0x00007f9982714000]
         java.lang.Thread.State: RUNNABLE
          at org.apache.cassandra.db.compaction.LeveledManifest.getLevelSize(LeveledManifest.java:489)
          - locked <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)
          at org.apache.cassandra.db.compaction.LeveledManifest.getOverlappingStarvedSSTables(LeveledManifest.java:448)
          at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:370)
          - locked <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)
          at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:119)
          at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:119)
          at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:262)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
          at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/2123444693.run(Unknown Source)
          at java.lang.Thread.run(Thread.java:748)

         Locked ownable synchronizers:
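In the dump above, CompactionExecutor:16632 is RUNNABLE inside getLevelSize while holding the LeveledManifest monitor <0x00000001c14acab0>, and CompactionExecutor:16666 is BLOCKED waiting for that same monitor. With a full dump of many threads, this holder/waiter relationship can be extracted with a short script. The sketch below relies only on the `- locked <addr>` and `- waiting to lock <addr>` markers that jstack prints; the function name `monitor_contention` and the abbreviated `SAMPLE` text are illustrative assumptions, not part of any JDK or Cassandra tool.

```python
import re
from collections import defaultdict

# The quoted thread name and "#<id>" that open each thread entry in a jstack dump.
THREAD_HEADER = re.compile(r'"([^"]+)" #\d+')
# Monitor markers printed inside a thread's stack trace.
LOCKED = re.compile(r"- locked <(0x[0-9a-f]+)>")
WAITING = re.compile(r"- waiting to lock <(0x[0-9a-f]+)>")


def monitor_contention(dump: str) -> dict:
    """Return {monitor address: {"holders": [...], "waiters": [...]}}.

    Only monitors that at least one thread is waiting on are reported.
    Works whether or not the dump still has its original line breaks.
    """
    monitors = defaultdict(lambda: {"holders": [], "waiters": []})
    headers = list(THREAD_HEADER.finditer(dump))
    for i, header in enumerate(headers):
        # A thread's text runs from its header to the next thread's header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(dump)
        body = dump[header.end():end]
        name = header.group(1)
        for addr in sorted(set(LOCKED.findall(body))):
            monitors[addr]["holders"].append(name)
        for addr in sorted(set(WAITING.findall(body))):
            monitors[addr]["waiters"].append(name)
    return {a: m for a, m in monitors.items() if m["waiters"]}


# Abbreviated excerpt of the two threads in this report (most frames omitted).
SAMPLE = (
    '"CompactionExecutor:16666" #26533 daemon prio=1 waiting for monitor entry '
    "- waiting to lock <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest) "
    "Locked ownable synchronizers: - <0x00000002b48f5ff0> (a java.util.concurrent.ThreadPoolExecutor$Worker) "
    '"CompactionExecutor:16632" #26499 daemon prio=1 runnable '
    "- locked <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)"
)

if __name__ == "__main__":
    for addr, info in monitor_contention(SAMPLE).items():
        print(addr, "held by", info["holders"], "awaited by", info["waiters"])
```

Taking several dumps a few seconds apart and checking whether the same thread keeps holding the same monitor helps distinguish a thread that never leaves the synchronized section from ordinary short-lived contention.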

       

      This is my live production environment. How can I fix this while the cluster stays online?

          People

            Assignee: Unassigned
            Reporter: Daniel Woo (danielywoo)