Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-5179

Hinted handoff sends over 1000 rows for one column change

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.4
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows 7
      Java 1.6u38

      Description

      We have a small test environment with two datacenters (DC1 and DC2) running on Windows 7 laptops.
      Both datacenters have one node. We use network topology strategy to replicate all data to both datacenters.

      We started with empty db.
      1. Created a keyspace with strategy options [DC1:1, DC2:1]
      2. Added one row to a column family with CLI to DC1. Change was replicated to DC2.
      3. Disconnected network cable from DC2.
      4. Gossiper noticed, that other DC is dead.
      5. Added another row to DC1.
      6. Reconnected cable on DC2.
      7. DC1 started hinted handoff for DC2.
      8. Hinted handoff is finished with message: "Finished hinted handoff of 1969 rows to endpoint <DC2 ip>"

      We repeated test with same results on Linux cluster with Cassandra 1.2.0.

      On Cassandra 1.1.5 Linux cluster, only one row was sent to endpoint. "Finished hinted handoff of 1 rows to endpoint <DC2 ip>"

      1. cassandra_receiver.log
        431 kB
        Antti Koivisto
      2. cassandra_sender.log
        410 kB
        Antti Koivisto

        Issue Links

          Activity

          Hide
          jbellis Jonathan Ellis added a comment -

          Can you give us a debug log?

          Show
          jbellis Jonathan Ellis added a comment - Can you give us a debug log?
          Hide
          cscetbon Cyril Scetbon added a comment -

          It's written that it's fixed in 1.2.2
          Do you know if a bugfix corrected it ?

          Show
          cscetbon Cyril Scetbon added a comment - It's written that it's fixed in 1.2.2 Do you know if a bugfix corrected it ?
          Hide
          jbellis Jonathan Ellis added a comment -

          Fix-for is target fix version. The issue is not fixed until resolved.

          Show
          jbellis Jonathan Ellis added a comment - Fix-for is target fix version. The issue is not fixed until resolved.
          Hide
          cscetbon Cyril Scetbon added a comment -

          Ok thanks for this information …

          Show
          cscetbon Cyril Scetbon added a comment - Ok thanks for this information …
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          4881221363f984ab6610756cab38e1a016b79e15 (CASSANDRA-4761) broke it. Current pagination logic doesn't deal well with the fact that deleteHint is now being called from a response handler callback and can go through the same hint again and again and again until it's finally replaced with a tombstone. This becomes really visible in multi-dc setups, where it can take a while to complete the write, but I was able to trigger it with ccm as well (the same hint got sent up to 3 times).

          Should we reopen CASSANDRA-4761 or deal with it here?

          Show
          iamaleksey Aleksey Yeschenko added a comment - 4881221363f984ab6610756cab38e1a016b79e15 ( CASSANDRA-4761 ) broke it. Current pagination logic doesn't deal well with the fact that deleteHint is now being called from a response handler callback and can go through the same hint again and again and again until it's finally replaced with a tombstone. This becomes really visible in multi-dc setups, where it can take a while to complete the write, but I was able to trigger it with ccm as well (the same hint got sent up to 3 times). Should we reopen CASSANDRA-4761 or deal with it here?
          Hide
          jbellis Jonathan Ellis added a comment -

          Let's fix it here.

          Show
          jbellis Jonathan Ellis added a comment - Let's fix it here.
          Show
          iamaleksey Aleksey Yeschenko added a comment - k. This is the problematic branch btw: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/HintedHandOffManager.java#L336
          Show
          iamaleksey Aleksey Yeschenko added a comment - https://github.com/iamaleksey/cassandra/compare/5179
          Hide
          mkjellman Michael Kjellman added a comment -

          Jonathan Ellis patch looks good to me

          Show
          mkjellman Michael Kjellman added a comment - Jonathan Ellis patch looks good to me
          Hide
          jbellis Jonathan Ellis added a comment -

          If we time out we should probably cease further delivery attempts, to avoid hammering a node that is behind.

          Show
          jbellis Jonathan Ellis added a comment - If we time out we should probably cease further delivery attempts, to avoid hammering a node that is behind.
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          If we time out we should probably cease further delivery attempts, to avoid hammering a node that is behind.

          Probably. But that's not what causing this particular issue - the patch only fixes that faulty pagination logic.

          I don't see an easy way to stop it on a timeout as we did in 1.1 now (yet), but that's a problem for another ticket anyway.

          Show
          iamaleksey Aleksey Yeschenko added a comment - If we time out we should probably cease further delivery attempts, to avoid hammering a node that is behind. Probably. But that's not what causing this particular issue - the patch only fixes that faulty pagination logic. I don't see an easy way to stop it on a timeout as we did in 1.1 now (yet), but that's a problem for another ticket anyway.
          Hide
          jbellis Jonathan Ellis added a comment -

          We can't put that code in the empty catch block here?

          Show
          jbellis Jonathan Ellis added a comment - We can't put that code in the empty catch block here?
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          It'll be of no use - all the requests have been sent at that point. It's waiting for responses before forcing compaction.

          Show
          iamaleksey Aleksey Yeschenko added a comment - It'll be of no use - all the requests have been sent at that point. It's waiting for responses before forcing compaction.
          Hide
          jbellis Jonathan Ellis added a comment -

          What if we made it wait per page, instead of all at once?

          Seems like that would be a good way to put a bound on how many callbacks we need to keep around too.

          Show
          jbellis Jonathan Ellis added a comment - What if we made it wait per page, instead of all at once? Seems like that would be a good way to put a bound on how many callbacks we need to keep around too.
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          What if we made it wait per page, instead of all at once?

          This goes against CASSANDRA-4761 somewhat, but I think it's a good compromise between sending hints one at a time and pouring everything out. Updated https://github.com/iamaleksey/cassandra/compare/5179

          Show
          iamaleksey Aleksey Yeschenko added a comment - What if we made it wait per page, instead of all at once? This goes against CASSANDRA-4761 somewhat, but I think it's a good compromise between sending hints one at a time and pouring everything out. Updated https://github.com/iamaleksey/cassandra/compare/5179
          Hide
          jbellis Jonathan Ellis added a comment -

          You could use break-to-label instead of a bool in the get() loop, but shouldn't it just be a return instead of break?

          Show
          jbellis Jonathan Ellis added a comment - You could use break-to-label instead of a bool in the get() loop, but shouldn't it just be a return instead of break?
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          You could use break-to-label instead of a bool in the get() loop, but shouldn't it just be a return instead of break?

          Yes I could. But some requests after the timed-out one could actually have succeeded - with the ratelimiter delay and all, and since we sent them, we might as well wait for the replies before triggering compaction (hints are deleted in that callback). And with return instead of break 1) compaction wouldn't be triggered, event if it's the last page of many and 2) "Finished hinted handoff .." message wouldn't be logged.

          Show
          iamaleksey Aleksey Yeschenko added a comment - You could use break-to-label instead of a bool in the get() loop, but shouldn't it just be a return instead of break? Yes I could. But some requests after the timed-out one could actually have succeeded - with the ratelimiter delay and all, and since we sent them, we might as well wait for the replies before triggering compaction (hints are deleted in that callback). And with return instead of break 1) compaction wouldn't be triggered, event if it's the last page of many and 2) "Finished hinted handoff .." message wouldn't be logged.
          Hide
          jbellis Jonathan Ellis added a comment -

          I think the reasoning for return instead of break in the FD block was that, if we haven't finished sending hints then we probably don't want to force a major compaction that will rewrite a bunch of undelivered hints.

          Show
          jbellis Jonathan Ellis added a comment - I think the reasoning for return instead of break in the FD block was that, if we haven't finished sending hints then we probably don't want to force a major compaction that will rewrite a bunch of undelivered hints.
          Show
          iamaleksey Aleksey Yeschenko added a comment - Maybe. It was break in 1.1 though - https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/db/HintedHandOffManager.java#L375
          Hide
          jbellis Jonathan Ellis added a comment -

          True... and 1.0 had a "if rowsReplayed > 0" around it. Not sure why we removed that, except maybe that it would almost always be true.

          How about we make the 1.0 check smarter and compact if we delivered over half the hints?

          Show
          jbellis Jonathan Ellis added a comment - True... and 1.0 had a "if rowsReplayed > 0" around it. Not sure why we removed that, except maybe that it would almost always be true. How about we make the 1.0 check smarter and compact if we delivered over half the hints?
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          How about we make the 1.0 check smarter and compact if we delivered over half the hints?

          wfm

          Show
          iamaleksey Aleksey Yeschenko added a comment - How about we make the 1.0 check smarter and compact if we delivered over half the hints? wfm
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          Actually, it doesn't wfm. HHM is already complicated enough. I think returning completely is just fine there - as long as the log message mentions how many hints have been delivered.

          There will be many more opportunities for compaction - when hints to another node are all sent, or after another attempt in <= 10 minutes.

          Updated the branch (also did some very minor refactoring so that the paging logic would at least fit in a single screen, plus some even minorer changes that I just couldn't resist).

          Show
          iamaleksey Aleksey Yeschenko added a comment - Actually, it doesn't wfm. HHM is already complicated enough. I think returning completely is just fine there - as long as the log message mentions how many hints have been delivered. There will be many more opportunities for compaction - when hints to another node are all sent, or after another attempt in <= 10 minutes. Updated the branch (also did some very minor refactoring so that the paging logic would at least fit in a single screen, plus some even minorer changes that I just couldn't resist).
          Hide
          jbellis Jonathan Ellis added a comment -

          +1

          Show
          jbellis Jonathan Ellis added a comment - +1
          Hide
          iamaleksey Aleksey Yeschenko added a comment -

          Thanks, committed.

          Show
          iamaleksey Aleksey Yeschenko added a comment - Thanks, committed.

            People

            • Assignee:
              iamaleksey Aleksey Yeschenko
              Reporter:
              anttiko Antti Koivisto
              Reviewer:
              Jonathan Ellis
            • Votes:
              2 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development