Cassandra / CASSANDRA-5525

Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Not A Problem
    • Normal

    Description

      A 12-node cluster was upgraded from 1.1.9 to 1.2.3. I then enabled 'num_tokens: 256', restarted, and ran upgradesstables and cleanup.

      I then tried to join 2 additional nodes to the ring.

      However, 1 of the new nodes ran out of disk space. This started causing 'no host id' errors in the live cluster whenever it attempted to store hints for that node:

      ERROR 10:12:02,408 Exception in thread Thread[MutationStage:190,5,main]
      java.lang.AssertionError: Missing host ID 
      

      I killed the other new node to stop it from continuing to join, since the live cluster was now in some sort of broken state, dropping mutation messages on 3 nodes. Restarting those 3 nodes fixed this, except that 1 node never stopped dropping them, so I had to decommission it (leaving the original cluster at 11 nodes).

      Ring pre-join:

      Load       Tokens  Owns (effective)  Host ID                             
      147.55 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
      124.99 GB  256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
      136.63 GB  256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
      141.78 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
      137.74 GB  256     16.7%             6d726cbf-147d-426e-a735-e14928c95e45
      135.9 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
      165.96 GB  256     16.7%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
      135.41 GB  256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
      143.38 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
      178.05 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
      194.92 GB  256     25.0%             361d7e31-b155-4ce1-8890-451b3ddf46cf
      150.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
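
The title compares the data streamed to the new nodes against the "average node load". As a rough reference point, that average can be worked out from the Load column of the pre-join ring above. This is only a sketch over the figures reported here; the names `PRE_JOIN_LOADS_GB` and `average_load_gb` are illustrative and are not part of Cassandra or nodetool.

```python
# Load column (GB) copied from the pre-join ring output above.
PRE_JOIN_LOADS_GB = [
    147.55, 124.99, 136.63, 141.78, 137.74, 135.9,
    165.96, 135.41, 143.38, 178.05, 194.92, 150.5,
]


def average_load_gb(loads):
    """Return the arithmetic mean of the per-node loads, in GB."""
    return sum(loads) / len(loads)


total_gb = sum(PRE_JOIN_LOADS_GB)
mean_gb = average_load_gb(PRE_JOIN_LOADS_GB)
print(f"nodes={len(PRE_JOIN_LOADS_GB)} total={total_gb:.2f} GB mean={mean_gb:.2f} GB")
```

With the figures above, the mean comes to about 149.4 GB per node across the 12 original nodes.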
      

      Ring after decommissioning the bad node:

      Load       Tokens  Owns (effective)  Host ID
      80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
      87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
      98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
      142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
      77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
      194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
      221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
      87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
      101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
      172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
      108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
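
To make the two ring listings easier to compare, the per-node change in load can be computed by matching rows on Host ID. The sketch below assumes the Load values are copied exactly from the two listings and, for brevity, keys each node by the first 8 characters of its Host ID; the function name `load_changes` is hypothetical.

```python
# Host ID prefix (first 8 characters) -> Load in GB, from the two ring listings above.
PRE_JOIN = {
    "754f9f4c": 147.55, "93f4400a": 124.99, "ff821e8e": 136.63,
    "339c474f": 141.78, "6d726cbf": 137.74, "e59a02b3": 135.9,
    "83ca527c": 165.96, "c3ea4026": 135.41, "df7ba879": 143.38,
    "78192d73": 178.05, "361d7e31": 194.92, "9889280a": 150.5,
}
AFTER_DECOMM = {
    "754f9f4c": 80.95, "93f4400a": 87.15, "ff821e8e": 98.16,
    "339c474f": 142.6, "e59a02b3": 77.64, "6d726cbf": 194.31,
    "83ca527c": 221.94, "c3ea4026": 87.61, "df7ba879": 101.02,
    "78192d73": 172.44, "9889280a": 108.5,
}


def load_changes(before, after):
    """Return (hosts present only in `before`, per-host load delta in GB)."""
    missing = sorted(set(before) - set(after))
    deltas = {h: round(after[h] - before[h], 2) for h in after if h in before}
    return missing, deltas


missing, deltas = load_changes(PRE_JOIN, AFTER_DECOMM)
print("no longer in ring:", missing)
# List nodes from largest decrease to largest increase in load.
for host, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{host}  {delta:+8.2f} GB")
```

The host that appears only in the first listing is the decommissioned node; the signed deltas show which of the remaining 11 nodes gained or lost data between the two snapshots.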
      

      Attachments

        1. Screen Shot 2013-04-25 at 12.35.24 PM.png
          37 kB
          John Watson
        2. cass-ring.txt
          325 kB
          John Watson

          People

            Unassigned
            dctrwatson John Watson
            Votes: 1
            Watchers: 8
