Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 2
    • Component/s: None
    • Labels:

      Description

      Hint delivery to a remote DC can take a long time (we currently see about 70 ms per hint). By our estimate, transmitting 700 MB of stored hints to the remote DC would take more than one day in our case, which is unacceptable for us. We suggest implementing hint delivery using batch operations.
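As a rough sanity check of that estimate, the arithmetic can be sketched as below. The 70 ms and 700 MB figures come from the description; the average hint size is not stated in the ticket, so `AVG_HINT_BYTES` is purely an assumed value for illustration, and the class name is hypothetical.

```java
// Back-of-the-envelope estimate for serial (one ack at a time) hint delivery.
// AVG_HINT_BYTES is a hypothetical value; the ticket does not state hint sizes.
final class HintDeliveryEstimate
{
    static final double SECONDS_PER_HINT = 0.070;        // from the description
    static final long TOTAL_BYTES = 700L * 1024 * 1024;  // 700 MB of stored hints
    static final long AVG_HINT_BYTES = 500;              // assumption

    // Hours needed when every hint waits for the previous one to be acked.
    static double serialHours(long totalBytes, long avgHintBytes, double secondsPerHint)
    {
        long hints = totalBytes / avgHintBytes;
        return hints * secondsPerHint / 3600.0;
    }

    public static void main(String[] args)
    {
        System.out.println(serialHours(TOTAL_BYTES, AVG_HINT_BYTES, SECONDS_PER_HINT) + " hours");
    }
}
```

Under that assumed hint size the result is a little over a day, which is consistent with the estimate in the description. Smaller hints would make the serial delivery time proportionally longer.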

      What do you think about it? Are there any reasons this mechanism could not be implemented?

      I'll try to implement it if you approve and clarify the right approach.

      1. cassandra-1.1-4761-async_hints.txt
        21 kB
        Alexey Zotov
      2. cassandra-1.2-4761-async_hints.txt
        10 kB
        Alexey Zotov
      3. cassandra-1.2-4761-async_hints-v2.txt
        10 kB
        Alexey Zotov
      4. notices.txt
        2 kB
        Alexey Zotov

        Issue Links

          Activity

          brandon.williams Brandon Williams added a comment -

          Sounds similar to CASSANDRA-4047, but I'm at an impasse there so if you work something up I'll be glad to take a look.

          jbellis Jonathan Ellis added a comment -

          I don't think they are very similar. 4047 is about "how can I hint an entire sstable," and this is about "how can I optimize hint delivery over the WAN."

          Personally I don't think you'd be well-served by writing out sstables locally for hints and streaming them over. Easier and more effective to make hint delivery asynchronous – instead of waiting for each hint to be acked before sending another, send hints continuously (until throttled) and create callbacks to delete successfully delivered ones.
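The asynchronous approach described above can be sketched roughly as follows. This is not the committed patch: the class, the `transport` function, and the semaphore-based throttle are all assumptions made for illustration, and none of these names come from the Cassandra code base.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Illustrative only: hypothetical names, not Cassandra's API.
final class AsyncHintSender
{
    // Sends every stored hint without waiting for the previous ack.
    // At most maxInFlight sends are outstanding (a crude throttle), and a hint
    // is deleted from the store only in the callback of a successful ack.
    // The store is expected to be a concurrent map, since callbacks may run
    // on other threads.
    static void deliverAll(Map<String, byte[]> store,
                           Function<byte[], CompletableFuture<Boolean>> transport,
                           int maxInFlight)
    {
        Semaphore inFlight = new Semaphore(maxInFlight);
        for (String hintId : store.keySet().toArray(new String[0]))
        {
            inFlight.acquireUninterruptibly(); // blocks only when throttled
            byte[] payload = store.get(hintId);
            transport.apply(payload).whenComplete((acked, error) -> {
                if (error == null && Boolean.TRUE.equals(acked))
                    store.remove(hintId); // callback deletes the delivered hint
                inFlight.release();
            });
        }
        // Wait until every outstanding send has completed or failed.
        inFlight.acquireUninterruptibly(maxInFlight);
        inFlight.release(maxInFlight);
    }
}
```

Hints whose send fails or times out simply stay in the store, so a later pass can retry them.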

          azotcsit Alexey Zotov added a comment -

          I've attached two files: the first version of the patch and a file with comments.

          jbellis Jonathan Ellis added a comment -

          Thanks for the patch. At a high level things look reasonable. Some feedback:

          • patch needs to be against trunk
          • you're not going to gain much from adding multiple threads to RowMutation delivery, since it has to go over the same MessagingService thread anyway. We do want to allow multiple threads to allow parallelism across multiple destinations, but trunk already supports this.
          azotcsit Alexey Zotov added a comment -

          I've attached a file with the discussed fixes. I've also added a test for slice queries on expired columns.

          One more question: in version 1.1 there is the following bug.
          When the destination node goes down during hint delivery, the source node doesn't stop the process. It keeps trying to send hints until the destination node is up again. Should I create a separate ticket, or can I simply attach a patch to this one?

          jbellis Jonathan Ellis added a comment -

          Not sure I follow. If a node goes down during delivery, sendMutation will timeout and deliverHintsToEndpointInternal will stop.

          azotcsit Alexey Zotov added a comment -

          When sendMutation throws TimeoutException, control returns to the beginning of the while loop (the "break delivery;" line). So the process starts again: it executes the slice query from the last column (the one that timed out on the previous step).

          brandon.williams Brandon Williams added a comment -

          I think you're right, we'll loop forever trying the host without another FD.isAlive check.

          brandon.williams Brandon Williams added a comment -

          Changed my mind, Jonathan is right. "break delivery" breaks the while loop, it doesn't 'jump' to the label.
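The Java semantics at issue can be shown with a minimal standalone example. The loop shape below (a labeled `while` with an inner loop) is an assumption about the general structure being discussed, not a copy of the Cassandra code.

```java
// Minimal demonstration of Java's labeled break.
final class LabeledBreakDemo
{
    // Returns how many times the body of the labeled loop started.
    static int outerIterations()
    {
        int iterations = 0;
        delivery:
        while (true)
        {
            iterations++;
            for (int column = 0; column < 3; column++)
            {
                if (column == 1)
                    break delivery; // exits the labeled while loop entirely
            }
        }
        return iterations;
    }

    public static void main(String[] args)
    {
        System.out.println(outerIterations()); // prints 1
    }
}
```

A labeled `break` terminates the labeled statement, so execution resumes after the `while` loop rather than at its top.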

          azotcsit Alexey Zotov added a comment -

          I agree with Jonathan too, sorry for the friction.
          A new file has been attached.

          jbellis Jonathan Ellis added a comment -

          Why does making it async require this code?

          // if the last column has been reached
          if (!ByteBufferUtil.EMPTY_BYTE_BUFFER.equals(startColumn))
          {
              // start query from the first column
              // it's needed to check if some hints were not delivered because of timeouts
              startColumn = ByteBufferUtil.EMPTY_BYTE_BUFFER;
              continue;
          }
          // we've started from the beginning and could not find anything (only maybe tombstones)
          else
          {
              break;
          }
          
          azotcsit Alexey Zotov added a comment -

          It's needed to handle the situation where some hints were not delivered because of timeouts. I see only one way to do that: run a new slice query, and if it finds nothing, all hints have been delivered successfully.
          So the first time we hit the "paging finished" condition (we've simply sent all hints without any checking), startColumn is not equal to EMPTY. We then run the same query again from the first column. If we find anything, we continue hint delivery, because the hints found timed out and were not deleted. If we don't find anything, we exit because startColumn is equal to EMPTY.

          Do you have any other thoughts about how to handle timeouts?
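The rescan logic described in this comment can be modeled with a small synchronous sketch. Everything here is hypothetical: an `Integer` key stands in for the column name, `null` plays the role of EMPTY, and a `Predicate` that returns `false` stands in for a send that timed out. The real patch delivers asynchronously and deletes hints in callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.function.Predicate;

// Illustrative model of the paging loop; names are hypothetical.
final class HintPagingSketch
{
    // Pages through the hints, deleting those that are acked.
    // Returns the number of slice queries executed.
    static int deliver(NavigableMap<Integer, String> hints, Predicate<String> sendAndAck, int pageSize)
    {
        Integer startColumn = null; // null stands in for EMPTY
        int queries = 0;
        while (true)
        {
            queries++;
            NavigableMap<Integer, String> rest =
                startColumn == null ? hints : hints.tailMap(startColumn, false);
            List<Integer> page = new ArrayList<>();
            for (Integer column : rest.keySet())
            {
                if (page.size() == pageSize)
                    break;
                page.add(column);
            }
            if (page.isEmpty())
            {
                if (startColumn != null)
                {
                    startColumn = null; // rescan from the first column for timed-out hints
                    continue;
                }
                break; // a scan from the beginning found nothing: all delivered
            }
            for (Integer column : page)
            {
                if (sendAndAck.test(hints.get(column)))
                    hints.remove(column); // delivered, so delete the hint
                startColumn = column;
            }
        }
        return queries;
    }
}
```

Taken in isolation, this loop keeps rescanning for as long as undelivered hints remain, so whatever stops delivery to an endpoint that never acks has to live outside the sketch.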

          jbellis Jonathan Ellis added a comment -

          LGTM, committed (and inlined the hint cleanup callback)

          azotcsit Alexey Zotov added a comment -

          Thanks!


            People

            • Assignee:
              azotcsit Alexey Zotov
              Reporter:
              azotcsit Alexey Zotov
              Reviewer:
              Jonathan Ellis
            • Votes:
              1
              Watchers:
              4