CASSANDRA-3554

Hints are not replayed unless node was marked down

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.0.7
    • Component/s: None
    • Labels:

      Description

      If B drops a write from A because it is overwhelmed (but not dead), A will hint the write. But it will never get notified that B is back up (since it was never down), so it will never attempt hint delivery.

      1. 0001-cleanup.patch (12 kB, Jonathan Ellis)
      2. 0002-deliver.patch (5 kB, Jonathan Ellis)
      3. 3554-1.0.txt (14 kB, Jonathan Ellis)
      4. 3554-1.0-v2.txt (15 kB, Jonathan Ellis)

        Activity

        Jonathan Ellis added a comment -

        Unclear how we should tell when it's a good idea to re-attempt delivery in this scenario.

        Possibly the best solution is to just make the failure detector (FD) smarter and have it mark nodes as "effectively down" in this situation. A "local" FD as in CASSANDRA-3533 could address this.

        Sylvain Lebresne added a comment -

        Couldn't B handle this? When it drops writes from A, it could record that. B could then run a scheduled task that looks for locally dropped writes and decides, based on the mutation stage queue, whether it is OK to receive hints. It could then request hint delivery from A.

        Jonathan Ellis added a comment -

        That could work, but I cringe at adding both "pull" and "push" modes for hint delivery, which has historically been a source of enough bugs that more complexity is contraindicated.

        Brandon Williams added a comment -

        At risk of heresy, would bringing the hourly scan back be so bad now that our hint model doesn't suck like it did the first time we did that?

        T Jake Luciani added a comment -

        Can we keep a running tally of hints per endpoint as they are written, and deliver them when the tally reaches a threshold? Plus an hourly scan.
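
The per-endpoint tally suggested in this comment could be sketched roughly as follows. This is only an illustration: `HintTally`, `recordHint`, and the string endpoint names are hypothetical and do not come from the Cassandra code base, and the ticket does not say that anything like this was implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not Cassandra's API.
class HintTally {
    private final Map<String, Integer> pending = new HashMap<>();
    private final int threshold;

    HintTally(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Counts one newly written hint for the endpoint. Returns true when the
     * tally reaches the threshold, and resets the tally so the caller can
     * trigger a delivery attempt exactly once per threshold crossing.
     */
    synchronized boolean recordHint(String endpoint) {
        int count = pending.merge(endpoint, 1, Integer::sum);
        if (count < threshold) {
            return false;
        }
        pending.remove(endpoint);
        return true;
    }
}
```

A caller would invoke `recordHint` each time a hint is stored and schedule delivery when it returns true. Note that a threshold alone does not address the overload concern raised later in the thread.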

        Jonathan Ellis added a comment -

        The problem I have with a "brute force" hourly scan or hint threshold is that you're likely to run into the same overload scenario that caused the hinting in the first place.

        Edward Capriolo added a comment - edited

        Could we use the badness detector in the dynamic snitch?

        T Jake Luciani added a comment -

        The other problem is that even if we fix the replay issue, it's still terribly slow due to excessive throttling.

        I like the idea of changing from a push to a pull mode for hint delivery, similar to how MySQL replication is client pull. Clients know how swamped they are and can throttle their own delivery.

        Jonathan Ellis added a comment -

        The problem with a pull model is that the node doing the pulling usually won't know "I was down, therefore I should ask for hints." (The exception is on restart, but hints due to overload conditions or GC pauses are much more common.)

        T Jake Luciani added a comment -

        Right, however it's never going to know whether hints are on a coordinator node because the coordinator needed to drop some messages (backpressure?).

        So either the clients can poll all nodes slowly and fetch hints, or perhaps we gossip a hints-available flag so nodes know when there are hints to read?

        Jonathan Ellis added a comment -

        Right. So, not clearly simpler than doing something to our existing push model, but much more code churn. I'd rather stick with push.

        Edward Capriolo added a comment -

        How expensive is the process of "1) Wake up. 2) Check for hints. 3) Try to deliver them"? Parts 1 and 2 could, I would think, run more often than hourly, maybe in a background thread that sleeps for N seconds and then tries again.

        "The other problem is even if we fix the replay issue it's still terribly slow due to excessive throttling"

        Side note: this throttle cannot currently be adjusted at runtime. It should be adjustable via JMX. The default may be too low. Historically the problem was that the hint-sending node got hammered; in my mind the throttle was protecting that system.

        Jonathan Ellis added a comment -

        Brute force fix attached to check for hints-to-deliver every 10 minutes. If a hint cannot be replayed because of timing out, we abort (to that target). Reduces default hint throttle delay to 1ms.
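
The behaviour described in this comment (a periodic check, aborting delivery to a target when a hint times out, and a short throttle delay between hints) could look roughly like the sketch below. It is not the attached patch: `PeriodicHintDelivery`, `HintSender`, and the string-keyed in-memory queues are assumptions made for illustration, whereas the real code lives in `HintedHandOffManager` and reads hints from storage.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: every name here is hypothetical, not Cassandra's API.
class PeriodicHintDelivery {
    /** Hypothetical transport; throws when the target does not acknowledge in time. */
    interface HintSender {
        void send(String target, String hint) throws TimeoutException;
    }

    private final Map<String, Queue<String>> hintsByTarget = new ConcurrentHashMap<>();
    private final HintSender sender;
    private final long throttleDelayMillis;

    PeriodicHintDelivery(HintSender sender, long throttleDelayMillis) {
        this.sender = sender;
        this.throttleDelayMillis = throttleDelayMillis;
    }

    void addHint(String target, String hint) {
        hintsByTarget.computeIfAbsent(target, t -> new ConcurrentLinkedQueue<>()).add(hint);
    }

    int pendingFor(String target) {
        Queue<String> hints = hintsByTarget.get(target);
        return hints == null ? 0 : hints.size();
    }

    /** One pass over every target with stored hints, whether or not it was ever marked down. */
    void deliverAll() {
        for (Map.Entry<String, Queue<String>> entry : hintsByTarget.entrySet()) {
            Queue<String> hints = entry.getValue();
            String hint;
            while ((hint = hints.peek()) != null) {
                try {
                    sender.send(entry.getKey(), hint);
                    hints.poll();                      // delivered, so drop it
                    Thread.sleep(throttleDelayMillis); // throttle between hints
                } catch (TimeoutException e) {
                    break;                             // abort for this target only
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    /** Schedules deliverAll() to run repeatedly with the given period. */
    ScheduledExecutorService start(long periodMinutes) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::deliverAll, periodMinutes, periodMinutes, TimeUnit.MINUTES);
        return executor;
    }
}
```

With the values from the comment, this would be started as `new PeriodicHintDelivery(sender, 1).start(10)`. Hints left behind by a timeout stay queued and are retried on the next pass, which limits how hard a still-overloaded target gets hit.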

        Jonathan Ellis added a comment -

        0001 does some related cleanup, including moving the MBean method StorageService.deliverHints to HintedHandOffManager (HHOM) as HHOM.scheduleHintDelivery.

        Jonathan Ellis added a comment -

        combined patch against 1.0

        Jonathan Ellis added a comment -

        updated 1.0 rebase that is pre-1034 friendly

        Brandon Williams added a comment -

        I'm not sure how exactly, but obviously the keys being passed here are not quite what we think they are:

        DEBUG 11:57:03,907 Started scheduleAllDeliveries
        DEBUG 11:57:03,907 deliverHints to /7fff:ffff:ffff:ffff:ffff:ffff:ffff:fffe
        DEBUG 11:57:03,908 deliverHints to /5555:5555:5555:5555:5555:5555:5555:5554
        DEBUG 11:57:03,908 Checking remote(/7fff:ffff:ffff:ffff:ffff:ffff:ffff:fffe) schema before delivering hints
        DEBUG 11:57:03,908 Finished scheduleAllDeliveries
        ERROR 11:57:03,909 Fatal exception in thread Thread[HintedHandoff:3,1,main]
        java.lang.NullPointerException
                at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:206)
                at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:238)
                at org.apache.cassandra.db.HintedHandOffManager.access$200(HintedHandOffManager.java:84)
                at org.apache.cassandra.db.HintedHandOffManager$3.runMayThrow(HintedHandOffManager.java:383)
                at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
        
        Jonathan Ellis added a comment -

        You're right, we switched from InetAddress as the key to Tokens. v2 attached.
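
The ticket does not show the v2 diff, so the following is only a guess at the general shape of the problem: if hint rows are keyed by token, the key has to be translated to an endpoint before it is used, and an unknown token should be skipped rather than dereferenced. `HintTargetResolver` and its methods are hypothetical names, and tokens are simplified to strings here.

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: hypothetical names, not the v2 patch.
class HintTargetResolver {
    private final Map<String, InetAddress> endpointByToken = new HashMap<>();

    void register(String token, InetAddress endpoint) {
        endpointByToken.put(token, endpoint);
    }

    /** Translates a hint row key (a token) to an endpoint; empty if the token is unknown. */
    Optional<InetAddress> endpointFor(String token) {
        return Optional.ofNullable(endpointByToken.get(token));
    }
}
```

Returning `Optional` forces the caller to handle the missing-mapping case explicitly instead of hitting a `NullPointerException` further down the delivery path.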

        Brandon Williams added a comment -

        +1

        Jonathan Ellis added a comment -

        committed

        MaHaiyang added a comment -

        +1
        "Schedule deliver hints" should have been brought in earlier.


          People

          • Assignee: Jonathan Ellis
          • Reporter: Jonathan Ellis
          • Reviewer: Brandon Williams
          • Votes: 1
          • Watchers: 2
