CASSANDRA-2819

Split rpc timeout for read and write ops

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Given the vastly different latency characteristics of reads and writes, it makes sense for them to have independent rpc timeouts internally.

      1. 0001-additional-fix-for-2819.patch
        4 kB
        Vijay
      2. 2819-v4.txt
        5 kB
        Jonathan Ellis
      3. 2819-v5-rebased.txt
        12 kB
        Jonathan Ellis
      4. 2819-v8.txt
        24 kB
        Jonathan Ellis
      5. 2819-v9.txt
        26 kB
        Jonathan Ellis
      6. c2819.patch
        12 kB
        Melvin Wang
      7. c2819-v6
        11 kB
        Melvin Wang
      8. c2819-v7
        14 kB
        Melvin Wang
      9. rpc-jira.patch
        5 kB
        Melvin Wang

          Activity

          Stu Hood added a comment - edited

          This issue relates to 959, which additionally proposed splitting out timeouts for multi-row operations, which is slightly more challenging. EDIT: And not worth doing here, imo.

          Jonathan Ellis added a comment -
          • read_repair is a write and should use that timeout
          • should distinguish b/t multirow (range/index) reads, and single-row lookups
          • REQUEST_RESPONSE should drop based on what kind of query the request being responded to was
          • ExpiringMap should use expiration of max (read, read multirow, write)
          • which means the REQUEST_RESPONSE drop-messages block isn't entirely redundant wrt ExpiringMap, but if it's too difficult to look up what the message type was it's probably not a big deal to ignore
          • DD.getRpcTimeout should be removed and replaced w/ the appropriate op timeout. If there are any internal operations that rely on rpctimeout (can't think of any that do) then we may want to add an "internal" timeout as well. (Or it may be evidence of a bug and we should fix it.)
          Melvin Wang added a comment -

          A better way to go would be to make the timeout part of the Message obj, so that when adding a callback in MessagingService we can pass the proper timeout to ExpiringMap, and in MessageDeliveryTask we can drop the message based on its own timeout. Not sure how complicated this would be, though.

          OutboundTcpConnection uses getRpcTimeout during connection which is a surprise to me.

          Jonathan Ellis added a comment -

          Why not just do a lookup based on verb type, which is already part of Message?

          OTC should probably hardcode something reasonable, no reason for it to be related to RPC calls.
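The verb-based lookup suggested here can be sketched roughly as follows. The verb names and timeout values are illustrative assumptions, not Cassandra's actual Verb enum or configuration defaults:

```java
import java.util.EnumMap;
import java.util.Map;

// Rough sketch of a per-verb timeout lookup; verbs and values are
// illustrative, not Cassandra's actual Verb enum or defaults.
public class VerbTimeouts
{
    public enum Verb { READ, RANGE_SLICE, MUTATION, TRUNCATE, OTHER }

    private static final long DEFAULT_TIMEOUT_MS = 10000;
    private static final Map<Verb, Long> TIMEOUTS = new EnumMap<>(Verb.class);
    static
    {
        TIMEOUTS.put(Verb.READ, 5000L);         // hypothetical read_rpc_timeout
        TIMEOUTS.put(Verb.RANGE_SLICE, 10000L); // hypothetical range read timeout
        TIMEOUTS.put(Verb.MUTATION, 2000L);     // hypothetical write_rpc_timeout
        TIMEOUTS.put(Verb.TRUNCATE, 60000L);    // truncate waits on compaction
    }

    // Verbs without a dedicated setting fall back to the generic timeout.
    public static long getTimeout(Verb verb)
    {
        return TIMEOUTS.getOrDefault(verb, DEFAULT_TIMEOUT_MS);
    }
}
```

Since the verb travels with every Message, both ExpiringMap registration and MessageDeliveryTask dropping can share this single lookup.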

          Melvin Wang added a comment -

          Yep, a simple lookup will do. I guess I was confused by Request_Response messages. When we create a request_response we don't record which type of message it replies to, so in MessageDeliveryTask it's difficult to figure out. Thus I wanted some way to associate that info with the request_response messages so that we could better time them out in MessageDeliveryTask. This is certainly complicated.

          Also, in OutboundTcpConnection we'd better honor the per-verb timeout as well, by providing a way to set the timeout value before we try to connect. If we want to hardcode, then it needs to be the max of (read, range read, write) since it lies on the path of these operations.

          Melvin Wang added a comment -

          Scratch my comments about the OTC part; it is not true. What I want is a checkpoint after the server has processed the verb, where we check if we are already timed out so that we don't waste network bandwidth sending back the response. I will move this logic into MessagingService.

          Jonathan Ellis added a comment -

          We check before processing to see if we can skip it entirely; probably not worth doing another check after processing too.

          Melvin Wang added a comment -

          Sure. However, the current logic in MessageDeliveryTask doesn't seem right. It compares against the creation time of MessageDeliveryTask itself. I added the creation time to the Message class to address this.

          Jonathan Ellis added a comment -

          That is the wrong place for it because Message is used both to send and to receive. MDT creation time is effectively identical.

          Melvin Wang added a comment -

          How about adding the creation timestamp to the header of the message? MDT is executed nearly immediately after it is created, so the construction time of MDT is too close to the checkpoint we have in run(). I am just concerned with the effectiveness of the current logic in MDT, although I am not sure about the consequences of adding 4 bytes to all the messages we create.

          Jonathan Ellis added a comment -

          Let's keep the scope of this ticket to splitting the rpc timeout. We can open another for "make request dropping more accurate/aggressive."

          Ryan King added a comment -

          Opened https://issues.apache.org/jira/browse/CASSANDRA-2819 for the followup.
          Jonathan Ellis added a comment -

          (that's CASSANDRA-2858)

          Jonathan Ellis added a comment -

          Are you still working on this, Melvin?

          Stu Hood added a comment -

          He is, but he's out sick: we'll post something soon.

          Melvin Wang added a comment -

          Sorry for the wait. I reverted the changes I made to Message.java and MessageDeliveryTask.java while keeping the changes elsewhere. This is based on the previous discussion, where we agreed to open a separate ticket to make the timeout more accurate.

          Jonathan Ellis added a comment -

          Thanks, Melvin!

          should we move MS.getMessageTimeout to an instance method of Message? This would make per-Message timeouts easier in the future as well.

          Let's make the default values in Config 10s as well, to avoid surprising people who upgrade but keep their old config file around.

          Otherwise LGTM.

          Melvin Wang added a comment -

          getMessageTimeout moved to Message.java
          set the default value for the timeouts to be 10s in Config.java

          Jonathan Ellis added a comment -

          committed, thanks!

          Jonathan Ellis added a comment -

          Sorry, it applied (almost) cleanly to trunk so I thought we were good, but it creates problems with the CASSANDRA-2644 code.

          Jonathan Ellis added a comment -

          (reverted)

          Hudson added a comment -

          Integrated in Cassandra #988 (See https://builds.apache.org/job/Cassandra/988/)
          split rpc_timeout into read,write, scan, and "other" timeouts
          patch by Melvin Wang; reviewed by jbellis for CASSANDRA-2819

          jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1152541
          Files :

          • /cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java
          • /cassandra/trunk/CHANGES.txt
          • /cassandra/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
          • /cassandra/trunk/conf/cassandra.yaml
          • /cassandra/trunk/src/java/org/apache/cassandra/net/Message.java
          • /cassandra/trunk/src/java/org/apache/cassandra/config/Config.java
          • /cassandra/trunk/src/java/org/apache/cassandra/net/MessageDeliveryTask.java
          Melvin Wang added a comment -

          Fixed the call to sendRR in BootStrapper.java.

          Need another integration?

          Jonathan Ellis added a comment -

          v4 attached (against trunk), with compilation fixes.

          Looks like it needs another check for uses of DD.getRpcTimeout. Here are some suspicious ones:

          • MS.timeoutReporter
          • ReadCallback
          • RepairCallback
          • StorageProxy.DroppableRunnable
          Melvin Wang added a comment -

          In MS.timeoutReporter, we blindly add default rpc timeout as the latency. Since different messages may have different timeout now, in ExpiringMap.CacheMonitor, probably we need to pass the individual expiration time when we call apply().

          Melvin Wang added a comment -

          Looks like ReadCallback, RepairCallback, and AbstractWriteResponseHandler all assume the timeout value from getRpcTimeout(); however, with this patch ExpiringMap will time out different messages with different values, so if read_rpc_timeout/write_rpc_timeout differ from the 'default' rpc timeout we may have a situation where ExpiringMap has already expired the message while get() still blocks. I'm wondering if we could modify IAsyncCallback so that when ExpiringMap expires a message, it signals the callback and marks it as a failure; then get() wouldn't need its own timeout-tracking logic.
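The signalling idea described here can be sketched with a latch: the reaper wakes the callback when it expires the entry, so get() just blocks until either a response or an expiration arrives. This is an illustration under those assumptions, not Cassandra's actual IAsyncCallback interface:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeoutException;

// Sketch: the expiring map signals the callback on expiry, so get() can
// simply await a latch instead of tracking the timeout itself.
// Hypothetical class, not Cassandra's actual callback API.
public class ExpirableCallback<T>
{
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile T result;
    private volatile boolean expired;

    // Called when the reply arrives.
    public void response(T value)
    {
        result = value;
        latch.countDown();
    }

    // Called by the ExpiringMap reaper when the entry times out.
    public void expire()
    {
        expired = true;
        latch.countDown();
    }

    public T get() throws InterruptedException, TimeoutException
    {
        latch.await(); // the reaper guarantees we are eventually woken
        if (expired)
            throw new TimeoutException("expired by the reaper");
        return result;
    }
}
```

The caveat Jonathan raises next still applies: this only works if the reaper runs often enough relative to the shortest configured timeout.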

          Jonathan Ellis added a comment -

          Sounds reasonable to me.

          Jonathan Ellis added a comment -

          Melvin, are you still working on this?

          Melvin Wang added a comment -

          I was on vacation since mid last week. Picking it up now.

          Melvin Wang added a comment -

          make ReadCallback, RepairCallback cancellable through timeoutReporter;
          make it possible to get the timeout value by verb so that DroppableRunnable can drop according to different verbs.

          Jonathan Ellis added a comment -

          rebased to trunk. (IAsyncCancellableCallback is not included in the patch, so I took my best guess at one.)

          Comments:

          • Writes still use the old rpc_timeout
          • would prefer to rename old rpc_timeout to other_rpc_timeout and comment what it's used for (truncate, ?)
          • looks like most remaining uses of old rpc_timeout should really be max(different timeouts)
          • currently, expiring map checks callbacks for expiration every 1/2 of default timeout. This means short timeouts could go much longer than desired, before being cancelled. I see two options: go back to direct await(timeout) in the callback/response handlers, or update EM to scan much more frequently (which could burn a lot of CPU for a system with many outstanding callbacks)
          Melvin Wang added a comment -

          - currently, expiring map checks callbacks for expiration every 1/2 of default timeout. This means short timeouts could go much longer than desired, before being cancelled. I see two options: go back to direct await(timeout) in the callback/response handlers, or update EM to scan much more frequently (which could burn a lot of CPU for a system with many outstanding callbacks)

          yeah, i guess, calling getReadRpcTimeout/getWriteRpcTimeout explicitly in ReadCallback/RepairCallback would be simpler and more correct.

          Melvin Wang added a comment -

          Writes still use the old rpc_timeout

          Changed to getWriteRpcTimeout()

          looks like most remaining uses of old rpc_timeout should really be max(different timeouts)

          wanna modify the getRpcTimeout in DD so that it returns the max of the other get*RpcTimeout()?

          currently, expiring map checks callbacks for expiration every 1/2 of default timeout. This means short timeouts could go much longer than desired, before being cancelled. I see two options: go back to direct await(timeout) in the callback/response handlers, or update EM to scan much more frequently (which could burn a lot of CPU for a system with many outstanding callbacks)

          Not such an elegant solution in this patch: it uses the corresponding get*RpcTimeout in the different callbacks.

          Jonathan Ellis added a comment -

          wanna modify the getRpcTimeout in DD so that it returns the max of the other get*RpcTimeout()?

          I think we need to split it into getGenericRpcTimeout (getMiscellaneousRpcTimeout?) and getMaxRpcTimeout.

          Not a so elegant solution in this patch. It uses the corresponding get*RpcTimeout in different callbacks.

          looks good to me.

          Melvin Wang added a comment -

          Please take another look. The following changes were made:

          • CassandraServer.java: set the timeout when we schedule individual RPCs
          • OutboundTcpConnection uses the per-message timeout.

          There are still a couple of places undetermined:

          • FailureDetector.java:249, we need a getGossipTimeout() here?
          • MessagingService.java, in expiringMap we probably want getMaxRpcTimeout()
          • StorageService.java:sendReplicationNotification()/truncateResponseHander/FileStreamTask, we use getGenericRpcTimeout()?
          • StorageProxy.java, 3 waitOnFutures calls for repairs, they should bear the timeout of the enclosing call?
          • StorageProxy.java, describeSchema and truncate should use getGenericRpcTimeout()?
          • OutboundTcpConnection.java, in connect(), shall we have a getSocketTimeout() or use getGenericRpcTimeout()?
          Chris Goffinet added a comment -

          We really want to get this in 1.1, can we get some feedback?

          Jonathan Ellis added a comment -

          v8 adds truncate timeout and rebases.

          I don't think separate gossip or socket timeouts are crucial. Happy to leave them using the generic one.

          Updated the repair waits to use write timeout.

          Still todo:

          • handle expiringmap somehow. max(timeouts) would be technically correct but with truncate taking a (much) longer timeout (since it has to wait for compaction to finish), that could leave a lot of garbage in the map
          Jonathan Ellis added a comment -

          I was thinking about this wrong... we actually want to set up the EM with min(timeouts); as long as most of the operations are reasonably close to that, we'll be fine. Which, since the main outlier for us are truncates which are relatively rare, matches our requirements nicely.

          v9 attached with that added.
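The v9 choice amounts to scheduling the ExpiringMap reaper at the smallest configured timeout: short-timeout entries then expire promptly, while rare long-timeout operations like truncate are merely reaped a little late. A trivial sketch (the values are illustrative, not real defaults):

```java
// Sketch of v9's reaper-interval choice: check the ExpiringMap at the
// smallest of the configured timeouts. Values passed in are illustrative.
public class ReaperInterval
{
    public static long smallestTimeout(long... timeoutsMillis)
    {
        long min = Long.MAX_VALUE;
        for (long t : timeoutsMillis)
            min = Math.min(min, t);
        return min;
    }
}
```

For example, with hypothetical read/range/write/truncate timeouts of 5s/10s/2s/60s, the reaper would run on a 2s cadence rather than 60s.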

          Vijay added a comment -

          +1

          Jonathan Ellis added a comment - edited

          Committed to trunk. I'm okay with backporting to 1.1 should someone step up to do that for 1.1.2; my quick look is that it should be straightforward but not automatic (runs into things like the underscore removal).

          Brandon Williams added a comment -

          The target can be null when the host is killed before it fires:

          ERROR [EXPIRING-MAP-REAPER:1] 2012-06-21 15:55:27,708 AbstractCassandraDaemon.java (line 133) Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
          java.lang.NullPointerException
                  at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:326)
                  at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:322)
                  at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:99)
                  at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                  at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
                  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
          
          Vijay added a comment -

          The target can be null when the host is killed before it fires

          I think the NPE is because CallbackInfo.sentMessage is null, right?
          I think it is better to store the timeout value in the CallbackInfo...

          Jonathan Ellis added a comment -

          Right. This is broken, reverted.

          Jonathan Ellis added a comment -

          We already have the timeout info in the EM entry, probably simplest to just pass that to the reporter method instead of storing it redundantly in the callback as well.
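This suggestion can be sketched as a map whose entries record their own timeout; the reaper then hands that value to the reporter, so the callback never needs to store it redundantly. The class and names below are illustrative, not Cassandra's actual ExpiringMap:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch: each entry carries its own timeout, and the reaper passes that
// timeout to the reporter when it expires the entry. Hypothetical class,
// not Cassandra's actual ExpiringMap.
public class TimeoutAwareMap<K, V>
{
    static final class Entry<V>
    {
        final V value;
        final long timeoutMillis;
        final long createdAtMillis;

        Entry(V value, long timeoutMillis, long now)
        {
            this.value = value;
            this.timeoutMillis = timeoutMillis;
            this.createdAtMillis = now;
        }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();

    // Timestamps are passed in explicitly to keep the sketch testable.
    public void put(K key, V value, long timeoutMillis, long now)
    {
        map.put(key, new Entry<>(value, timeoutMillis, now));
    }

    // Remove expired entries, reporting each with its own timeout value.
    public void reap(long now, BiConsumer<V, Long> timeoutReporter)
    {
        Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
        while (it.hasNext())
        {
            Entry<V> e = it.next().getValue();
            if (now - e.createdAtMillis >= e.timeoutMillis)
            {
                timeoutReporter.accept(e.value, e.timeoutMillis);
                it.remove();
            }
        }
    }

    public int size() { return map.size(); }
}
```

A reap at t=10s would expire a 5s read entry (reporting 5000 ms as its latency bound) while leaving a 60s truncate entry in place.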

          Vijay added a comment -

          Fix for NPE.

          Jonathan Ellis added a comment -

          lgtm, committed

          Shotaro Kamio added a comment -

          Can we expect backport of patches to 1.1.x?

          Jonathan Ellis added a comment -

          Not at this point. 1.1 is very close to becoming "old-stable" with the release of 1.2.


  People

  • Assignee: Jonathan Ellis
  • Reporter: Stu Hood
  • Reviewer: Vijay
  • Votes: 0
  • Watchers: 5