Cassandra / CASSANDRA-1221

loadbalance operation never completes on a 3 node cluster

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 0.6.4
    • None
    • None
    • Normal

    Description

      Arya Goudarzi reports:

      Please confirm whether this is an issue that should be reported or whether I am doing something wrong. I could not find anything relevant in JIRA:

      Playing with the 0.7 nightly (today's build), I set up a 3 node cluster this way:

      • Added one node;
      • Loaded default schema with RF 1 from YAML using JMX;
      • Loaded 2M keys using py_stress;
      • Bootstrapped a second node;
      • Cleaned up the first node;
      • Bootstrapped a third node;
      • Cleaned up the second node;

      I got the following ring:

      Address        Status  Load       Range                                     Ring
                                        154293670372423273273390365393543806425
      10.50.26.132   Up      518.63 MB  69164917636305877859094619660693892452    |<--|
      10.50.26.134   Up      234.8 MB   111685517405103688771527967027648896391   |   |
      10.50.26.133   Up      235.26 MB  154293670372423273273390365393543806425   |-->|
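
The load figures in this ring can be cross-checked against the tokens. The following is a minimal sketch, not part of the original report, and it assumes the cluster uses RandomPartitioner with a token space of 0 to 2**127 (the report does not state the partitioner). Each node is taken to own the range from the previous token (exclusive) up to its own token (inclusive), wrapping around the ring. The names `RING_SIZE` and `ownership` are illustrative only.

```python
# Assumption: RandomPartitioner, so tokens live in 0 .. 2**127.
RING_SIZE = 2 ** 127

# (address, token) pairs copied from the ring output in the report.
ring = [
    ("10.50.26.132", 69164917636305877859094619660693892452),
    ("10.50.26.134", 111685517405103688771527967027648896391),
    ("10.50.26.133", 154293670372423273273390365393543806425),
]


def ownership(ring):
    """Return {address: fraction of the token space owned by that node}."""
    ordered = sorted(ring, key=lambda pair: pair[1])
    result = {}
    for i, (address, token) in enumerate(ordered):
        previous = ordered[i - 1][1]  # for i == 0 this wraps to the last token
        # A span of 0 can only occur on a single-node ring, which owns everything.
        span = (token - previous) % RING_SIZE or RING_SIZE
        result[address] = span / RING_SIZE
    return result


owned = ownership(ring)
for address, fraction in owned.items():
    print(f"{address}: {fraction:.1%}")
```

Under that assumption, 10.50.26.132 owns roughly half of the token space and the other two nodes roughly a quarter each, which is consistent with the reported loads of 518.63 MB versus about 235 MB.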

      Now I ran:

      nodetool --host 10.50.26.132 loadbalance

      It has been running for a while. I checked the streams:

      nodetool --host 10.50.26.134 streams
      Mode: Normal
      Not sending any streams.
      Streaming from: /10.50.26.132
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]

      nodetool --host 10.50.26.132 streams
      Mode: Leaving: streaming data to other nodes
      Streaming to: /10.50.26.134
      /var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
      Not receiving any streams.
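
To get a sense of how much data the pending outbound stream represents, the sections listed for the file can be totalled. This is a rough sketch, not part of the original report, and it assumes that each `(start,end)` pair in the `streams` output is a byte range of the SSTable that is queued for transfer. The helper name `pending_bytes` is hypothetical.

```python
import re

# Stream line copied from the output of `nodetool --host 10.50.26.132 streams`.
line = ("/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/"
        "[(0,366336059), (366336059,450095320)]")


def pending_bytes(stream_line):
    """Sum the lengths of all (start,end) sections found in one stream line.

    Assumption: each pair is a half-open byte range within the file.
    """
    sections = re.findall(r"\((\d+),\s*(\d+)\)", stream_line)
    return sum(int(end) - int(start) for start, end in sections)


total = pending_bytes(line)
print(f"{total} bytes ({total / 2**20:.1f} MiB)")
```

If the assumption holds, the two sections cover the file contiguously from offset 0 to 450095320, so the whole file of roughly 429 MiB is still listed as pending.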

      These have been going for the past 2 hours.

      In the logs of the node at 10.50.26.134, I saw this:

      INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132

      From my understanding of the wiki, loadbalance is supposed to decommission the node by sending its token ranges to other nodes and then bootstrap it again. It has been stuck streaming for the past 2 hours and the size of the ring has not changed. The log on the first node shows that it started streaming hours ago:

      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
      INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
      INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
      INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
      INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
      INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000

      Nothing more after this line.
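
The length of the silence in this log can be measured directly from the timestamps. The sketch below is not part of the original report. It takes the "Waiting for transfer" entry and one of the entries logged at 17:36:53, parses the timestamp format used in these log lines, and prints the difference. The helper name `timestamp` is illustrative only.

```python
import re
from datetime import datetime

# Two entries copied from the log of the first node.
waiting = ("INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) "
           "Waiting for transfer to /10.50.26.134 to complete")
next_entry = ("INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) "
              "Enqueuing flush of Memtable(LocationInfo)@1637794189")


def timestamp(log_line):
    """Extract the 'YYYY-MM-DD HH:MM:SS,mmm' timestamp from a log line."""
    match = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}", log_line)
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


gap = timestamp(next_entry) - timestamp(waiting)
print(f"No stream activity logged for {gap}")
```

The gap is a little over one hour, and the entries that follow it concern only a routine LocationInfo flush, so the log contains no sign that the transfer made progress or completed.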

      Am I doing something wrong?

      Attachments

        1. 0.6-conviction-fix.diff
          2 kB
          Gary Dusbabek
        2. 0001-Gossiper-and-FD-never-called-MS.convict-to-shut-down.patch
          3 kB
          Gary Dusbabek
        3. system1.log
          27 kB
          Arya Goudarzi
        4. system2.log
          16 kB
          Arya Goudarzi
        5. system3.log
          10 kB
          Arya Goudarzi

          People

            Gary Dusbabek (gdusbabek)