Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 rc1
    • Component/s: Core
    • Labels:

      Description

      We took down one of our nodes for maintenance, and during that time the other nodes seem to have lost the downed node's host ID.

      We also see lots of hint assertion exceptions "Missing host ID for 10.6.27.98"

      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
      UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
      UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
      DN  10.6.27.98        ?          256     126.3%            null                                  27
      

      We restarted C* on the two other nodes that are up; my guess is that the host ID was lost when those nodes restarted.
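
The symptom above can be checked mechanically. The following is a minimal sketch, not part of the original report: it scans `nodetool status` output for rows whose Host ID column reads `null`. The function name `nodes_missing_host_id` is hypothetical, and the parsing assumes the column layout shown in this report, with Host ID second to last and a rack name that contains no spaces.

```python
import re


def nodes_missing_host_id(status_output):
    """Return the addresses of nodes whose Host ID column reads 'null'."""
    missing = []
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows start with a two-letter status/state code such as UN or DN.
        if len(fields) < 2 or not re.fullmatch(r"[UD][NLJM]", fields[0]):
            continue
        # Assumed layout: Host ID is the second-to-last column, rack is last.
        if fields[-2] == "null":
            missing.append(fields[1])
    return missing


# Output copied from the report above.
SAMPLE = """\
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
DN  10.6.27.98        ?          256     126.3%            null                                  27
"""

print(nodes_missing_host_id(SAMPLE))
```

On the output in this report, the sketch flags only 10.6.27.98, the node named in the "Missing host ID" assertion.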

        Activity

        Transition                   Time In Source Status  Execution Times  Last Executer     Last Execution Date
        Open -> Patch Available      3h 36m                 1                Brandon Williams  05/Dec/12 23:59
        Patch Available -> Resolved  14h 26m                1                Brandon Williams  06/Dec/12 14:26
        Gavin made changes -
        Workflow
          Original Value: patch-available, re-open possible [ 12753558 ]
          New Value: reopen-resolved, no closed status, patch-avail, testing [ 12758808 ]
        Gavin made changes -
        Workflow
          Original Value: no-reopen-closed, patch-avail [ 12737045 ]
          New Value: patch-available, re-open possible [ 12753558 ]
        Brandon Williams made changes -
        Status
          Original Value: Patch Available [ 10002 ]
          New Value: Resolved [ 5 ]
        Resolution
          New Value: Fixed [ 1 ]
        Brandon Williams added a comment -

        Committed.

        T Jake Luciani added a comment -

        +1

        Brandon Williams added a comment -

        I tested it and it works, but yeah, you'll have to wipe out the system table, or I think starting once with -Dcassandra.load_ring_state=false will work.

        T Jake Luciani added a comment -

        Looks straightforward. I can test, but it looks like I'll need to recreate the cluster/system table?

        Brandon Williams made changes -
        Status
          Original Value: Open [ 1 ]
          New Value: Patch Available [ 10002 ]
        Reviewer
          New Value: tjake
        Brandon Williams made changes -
        Brandon Williams added a comment -

        The first patch renames 'ring_id' in peers to 'host_id', because that's what we call it everywhere else, including nodetool, so we should be consistent.

        The second patch updates the host ID in TokenMetadata (TMD) when it updates the tokens.

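
The idea behind the second patch can be illustrated with a toy model. This is a sketch under stated assumptions, not Cassandra's actual code: `ToyTokenMetadata`, `load_saved_ring`, the `saved_peers` layout, the token values, and the placeholder host ID are all hypothetical. It assumes that a restarting node rebuilds its view of the ring from saved peer state, and shows how a down node ends up with no host ID unless the host ID is restored together with the tokens.

```python
import uuid


class ToyTokenMetadata:
    """Hypothetical stand-in for a node's in-memory view of the ring."""

    def __init__(self):
        self.tokens_by_endpoint = {}
        self.host_id_by_endpoint = {}

    def update_normal_tokens(self, endpoint, tokens):
        self.tokens_by_endpoint[endpoint] = set(tokens)

    def update_host_id(self, host_id, endpoint):
        self.host_id_by_endpoint[endpoint] = host_id

    def get_host_id(self, endpoint):
        # None models the "null" Host ID seen in the status output.
        return self.host_id_by_endpoint.get(endpoint)


def load_saved_ring(tmd, saved_peers, restore_host_ids):
    """Rebuild ring state from saved peer rows (assumed layout)."""
    for endpoint, peer in saved_peers.items():
        tmd.update_normal_tokens(endpoint, peer["tokens"])
        if restore_host_ids:
            # Sketch of the fix: set the host ID whenever tokens are set.
            tmd.update_host_id(peer["host_id"], endpoint)


# Placeholder value: the real host ID of 10.6.27.98 is not given in the report.
PLACEHOLDER_ID = uuid.UUID("00000000-0000-0000-0000-000000000098")
saved_peers = {"10.6.27.98": {"tokens": [1, 2, 3], "host_id": PLACEHOLDER_ID}}

before = ToyTokenMetadata()
load_saved_ring(before, saved_peers, restore_host_ids=False)

after = ToyTokenMetadata()
load_saved_ring(after, saved_peers, restore_host_ids=True)

print(before.get_host_id("10.6.27.98"))
print(after.get_host_id("10.6.27.98"))
```

In the model, a down peer never announces itself again after the restart, so the saved state is the only source for its host ID. That is why the sketch restores both pieces of state in the same step.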
        Brandon Williams made changes -
        Fix Version/s
          New Value: 1.2.0 rc1 [ 12323481 ]
        T Jake Luciani made changes -
        Description
          Original Value: [...] it seems the other nodes haves lost the downed nodes node id [...]
          New Value: [...] it seems the other nodes have lost the downed nodes node id [...]
        T Jake Luciani made changes -
        Summary
          Original Value: Downed node looses its host-id
          New Value: Downed node loses its host-id
        Brandon Williams made changes -
        Summary
          Original Value: Downed node looses it's host-id
          New Value: Downed node looses its host-id
        T Jake Luciani created issue -

          People

          • Assignee:
            Brandon Williams
            Reporter:
            T Jake Luciani
            Reviewer:
            T Jake Luciani
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved: