Cassandra / CASSANDRA-1221

loadbalance operation never completes on a 3 node cluster


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 0.6.4
    • Component/s: None
    • Labels: None
    • Severity: Normal

      Description

      Arya Goudarzi reports:

      Please confirm whether this is an issue that should be reported, or whether I am doing something wrong. I could not find anything relevant on JIRA.

      Playing with the 0.7 nightly (today's build), I set up a 3-node cluster this way:

      • Added one node;
      • Loaded default schema with RF 1 from YAML using JMX;
      • Loaded 2M keys using py_stress;
      • Bootstrapped a second node;
      • Cleaned up the first node;
      • Bootstrapped a third node;
      • Cleaned up the second node.
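The shape of the ring these steps produce can be reproduced with a toy model. This is a hypothetical simplification, not Cassandra's code: it treats the token space as the interval [0, 1), assumes RF = 1, and assumes each bootstrapping node takes the midpoint of the widest range currently owned. Cassandra's real token choice is based on the data held by the most loaded node, so this only approximates it.

```python
from fractions import Fraction

# Toy model (hypothetical): tokens are Fractions in [0, 1), RF = 1, and a node
# owns the range (predecessor's token, own token].

def widths(tokens):
    """Width of the range each token closes, in sorted token order."""
    ordered = sorted(tokens)
    # i == 0 wraps to the highest token; a lone node owns the whole ring.
    return ordered, [(tok - ordered[i - 1]) % 1 or Fraction(1)
                     for i, tok in enumerate(ordered)]

def bootstrap(tokens):
    """Add a node at the midpoint of the widest range (simplified token choice)."""
    ordered, ws = widths(tokens)
    i = max(range(len(ordered)), key=lambda j: ws[j])
    new_token = (ordered[i - 1] + ws[i] / 2) % 1
    return tokens + [new_token]

tokens = [Fraction(0)]        # first node
tokens = bootstrap(tokens)    # second node
tokens = bootstrap(tokens)    # third node
print(sorted(widths(tokens)[1]))
```

Two bootstraps starting from a single node leave shares of 1/4, 1/4 and 1/2, roughly the same shape as the 518.63 MB / 234.8 MB / 235.26 MB loads in the ring output.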

      I got the following ring:

      Address        Status  Load       Range                                      Ring
                                        154293670372423273273390365393543806425
      10.50.26.132   Up      518.63 MB  69164917636305877859094619660693892452     |<--|
      10.50.26.134   Up      234.8 MB   111685517405103688771527967027648896391    |   |
      10.50.26.133   Up      235.26 MB  154293670372423273273390365393543806425    |-->|
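The imbalance is visible directly in the tokens. The sketch below computes the fraction of the token space each node owns. It assumes RandomPartitioner (token space [0, 2**127)) and that each node owns the range from its predecessor's token (exclusive) to its own token (inclusive).

```python
from fractions import Fraction

# Assumption: RandomPartitioner, whose tokens lie in [0, 2**127).
RING_SIZE = 2 ** 127

# Tokens copied from the ring output in the report.
ring = {
    "10.50.26.132": 69164917636305877859094619660693892452,
    "10.50.26.134": 111685517405103688771527967027648896391,
    "10.50.26.133": 154293670372423273273390365393543806425,
}

def ownership(ring):
    """Fraction of the token space owned by each node: (predecessor, own token]."""
    nodes = sorted(ring, key=ring.get)
    shares = {}
    for i, node in enumerate(nodes):
        prev = ring[nodes[i - 1]]  # i == 0 wraps around to the highest token
        shares[node] = Fraction((ring[node] - prev) % RING_SIZE, RING_SIZE)
    return shares

for node, share in ownership(ring).items():
    print(f"{node}: {float(share):.2%}")
```

It reports roughly 50% for 10.50.26.132 and roughly 25% each for 10.50.26.134 and 10.50.26.133, consistent with the first node holding about twice the load of the others.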

      Now I ran:

      nodetool --host 10.50.26.132 loadbalance

      It's been going for a while. I checked the streams:

      nodetool --host 10.50.26.134 streams
      Mode: Normal
      Not sending any streams.
      Streaming from: /10.50.26.132
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
      Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]

      nodetool --host 10.50.26.132 streams
      Mode: Leaving: streaming data to other nodes
      Streaming to: /10.50.26.134
      /var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
      Not receiving any streams.
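Each file entry in the streams output above is a path followed by a list of (start, end) byte sections. A small hypothetical parser, based only on the format shown here, that totals the bytes listed for one file:

```python
import re

# Matches one "(start,end)" byte section as printed in the streams output.
SECTION = re.compile(r"\((\d+),(\d+)\)")

def listed_bytes(line):
    """Sum the sizes of the (start, end) sections listed on one streams line."""
    return sum(int(end) - int(start) for start, end in SECTION.findall(line))

line = ("/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/"
        "[(0,366336059), (366336059,450095320)]")
total = listed_bytes(line)
print(f"{total} bytes (~{total / 2**20:.0f} MiB)")
```

For Standard1-d-48-Data.db this gives 450,095,320 bytes (about 429 MiB). The sender's file lists the same sections as the receiver's Standard1-tmp-d-6-Data.db entry.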

      These have been going for the past 2 hours.

      In the logs of the node at 10.50.26.134, I saw this:

      INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132

      So, my understanding from the wiki is that loadbalance is supposed to decommission the node, streaming its data to the other nodes, and then bootstrap it again. It has been stuck in streaming for the past 2 hours, and the load figures in the ring have not changed. The log on the first node shows that streaming started hours ago:
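The decommission half of loadbalance can be sketched as follows. This assumes RF = 1, so each range lives only on the node whose token closes it; under that assumption the leaving node's range goes to the next node clockwise, which is consistent with 10.50.26.132 streaming to 10.50.26.134. The re-bootstrap token chosen here (midpoint of the widest remaining range) is a hypothetical simplification, not Cassandra's actual selection logic.

```python
import bisect

RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space

# Tokens copied from the ring output in the report.
ring = {
    "10.50.26.132": 69164917636305877859094619660693892452,
    "10.50.26.134": 111685517405103688771527967027648896391,
    "10.50.26.133": 154293670372423273273390365393543806425,
}

def successor(ring, leaving):
    """Node that inherits the leaving node's range when RF = 1 (next token clockwise)."""
    others = sorted((tok, node) for node, tok in ring.items() if node != leaving)
    tokens = [tok for tok, _ in others]
    i = bisect.bisect_right(tokens, ring[leaving]) % len(others)
    return others[i][1]

def rebootstrap_token(ring, leaving):
    """Hypothetical new token: midpoint of the widest range left after leaving."""
    others = sorted(tok for node, tok in ring.items() if node != leaving)
    best_prev, best_width = None, -1
    for i, tok in enumerate(others):
        prev = others[i - 1]  # i == 0 wraps around to the highest token
        # A lone remaining node owns the whole ring.
        width = (tok - prev) % RING_SIZE or RING_SIZE
        if width > best_width:
            best_prev, best_width = prev, width
    return (best_prev + best_width // 2) % RING_SIZE

print(successor(ring, "10.50.26.132"))
print(rebootstrap_token(ring, "10.50.26.132"))
```

In this model, once 10.50.26.132 leaves, 10.50.26.134 temporarily owns about 75% of the ring, so the sketch places the returning node in the middle of that range.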

      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
      INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
      INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
      INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
      INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
      INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
      INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000

      Nothing more after this line.

      Am I doing something wrong?

        Attachments

        1. 0.6-conviction-fix.diff (2 kB, Gary Dusbabek)
        2. 0001-Gossiper-and-FD-never-called-MS.convict-to-shut-down.patch (3 kB, Gary Dusbabek)
        3. system1.log (27 kB, Arya Goudarzi)
        4. system2.log (16 kB, Arya Goudarzi)
        5. system3.log (10 kB, Arya Goudarzi)
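The second patch's title says the Gossiper and failure detector (FD) never called MS.convict to shut down. As an illustration of what "conviction" means, here is a minimal phi-accrual sketch. The exponential inter-arrival approximation, the threshold of 8, and the `on_convict` hook are assumptions for illustration; the hook only stands in for "shut down whatever is talking to the dead endpoint" and does not reproduce the patch.

```python
import math
from collections import deque

PHI_CONVICT_THRESHOLD = 8.0  # assumption: a commonly used default threshold

class ArrivalWindow:
    """Sliding window of heartbeat inter-arrival times for one endpoint."""

    def __init__(self, size=1000):
        self.intervals = deque(maxlen=size)
        self.last = None

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        """phi = -log10(P(no heartbeat for t)), assuming exponential inter-arrivals."""
        mean = sum(self.intervals) / len(self.intervals)
        t = now - self.last
        return t / (mean * math.log(10))

def check(window, now, on_convict):
    """Call the (hypothetical) on_convict hook once phi crosses the threshold."""
    if window.intervals and window.phi(now) > PHI_CONVICT_THRESHOLD:
        on_convict()
```

With heartbeats arriving once per second, phi passes 8 after roughly 18 seconds of silence; without a conviction hook that tears down connections, a sender waiting on a transfer has nothing to wake it up.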


            People

            • Assignee: Gary Dusbabek (gdusbabek)
            • Reporter: Gary Dusbabek (gdusbabek)
            • Authors: Gary Dusbabek

              Dates

              • Created:
                Updated:
                Resolved:
