HBASE-3787

Increment is non-idempotent but client retries RPC

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.94.4, 0.95.2
    • Fix Version/s: 0.98.0, 0.99.0
    • Component/s: Client
    • Labels: None

      Description

      The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible for the same increment call to be applied twice at the server.

      For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly.
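
      As a rough illustration of the exposure (a sketch, not code from any patch on this issue; the class and config names are the 0.94-era client API), a timed-out increment RPC that already committed on the server is simply re-sent by the client:

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HTable;
      import org.apache.hadoop.hbase.client.Increment;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.util.Bytes;

      public class IncrementRetryExample {
        public static void main(String[] args) throws IOException {
          Configuration conf = HBaseConfiguration.create();
          // Number of times the client re-sends a failed RPC before giving up.
          conf.setInt("hbase.client.retries.number", 10);
          HTable table = new HTable(conf, "counters");   // table name is illustrative

          Increment inc = new Increment(Bytes.toBytes("row-1"));
          inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);

          // If an attempt times out after the server has already applied the edit,
          // the client transparently retries and the counter can be bumped twice.
          Result result = table.increment(inc);
          table.close();
        }
      }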

      Attachments

      1. HBASE-3787-v9.patch
        202 kB
        Sergey Shelukhin
      2. HBASE-3787-v8.patch
        201 kB
        Sergey Shelukhin
      3. HBASE-3787-v7.patch
        199 kB
        Sergey Shelukhin
      4. HBASE-3787-v6.patch
        190 kB
        Sergey Shelukhin
      5. HBASE-3787-v5.patch
        119 kB
        Sergey Shelukhin
      6. HBASE-3787-v5.patch
        119 kB
        Sergey Shelukhin
      7. HBASE-3787-v4.patch
        117 kB
        Sergey Shelukhin
      8. HBASE-3787-v3.patch
        111 kB
        Sergey Shelukhin
      9. HBASE-3787-v2.patch
        111 kB
        Sergey Shelukhin
      10. HBASE-3787-v12.patch
        210 kB
        Sergey Shelukhin
      11. HBASE-3787-v11.patch
        209 kB
        Sergey Shelukhin
      12. HBASE-3787-v10.patch
        204 kB
        Sergey Shelukhin
      13. HBASE-3787-v1.patch
        108 kB
        Sergey Shelukhin
      14. HBASE-3787-v0.patch
        94 kB
        Sergey Shelukhin
      15. HBASE-3787-partial.patch
        64 kB
        Sergey Shelukhin

        Issue Links

          Activity

          dhruba borthakur created issue -
          stack made changes -
          Field Original Value New Value
          Link This issue relates to HBASE-2182 [ HBASE-2182 ]
          Lars Hofhansl added a comment -

          This is an interesting one.
          To me the main goal of retries is to ride over a split or region move. If we can isolate that condition and avoid retrying for all "undecided" conditions (the various timeouts) we should be OK.

          Andrew Purtell made changes -
          Assignee dhruba borthakur [ dhruba ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Affects Version/s 0.94.4 [ 12323367 ]
          Affects Version/s 0.96.0 [ 12320040 ]
          Priority Major [ 3 ] Critical [ 2 ]
          Andrew Purtell added a comment -

          I think this has come up again on the user list. See http://search-hadoop.com/m/3naHBtQZV51/. Since we can easily change the on wire RPC format of various operations for 0.96, now is a good time to look at this. A solution might be to introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. Not sure exactly how that would work server side.

          Ted Yu added a comment -

          So the server would apply Increment requests with the same nonce only once?

          Andrew Purtell added a comment -

          So the server would apply Increment requests with the same nonce only once?

          Obviously.

          Enis Soztutar added a comment -

          This is a bit orthogonal, but we need a nonce-like mechanism for client -> master RPCs as well. Currently most of the master operations are semi-async, meaning that they initiate a task (disable table), but have no way of asking the master later about the status of the request. This is different from using the RPC callId, because the RPC call itself finishes, but not the side effect of the operation.

          Andrew Purtell added a comment -

          If we define the nonce as per-op instead of per-RPC, then it's no longer an orthogonal problem.

          Ted Yu added a comment - - edited

          On the region server, we need to keep a map from row key to set of nonces, per table, which reflects the client requests the server has seen.
          We can set a time bound on how long such a mapping should be kept so that this map doesn't consume too much heap. This implies associating the nonce with the timestamp when the request containing the nonce was received.

          Enis Soztutar made changes -
          Link This issue is related to HADOOP-8394 [ HADOOP-8394 ]
          Enis Soztutar added a comment -

          This is also related to the problem we encountered during NN failover testing, where the NN does not do the create file operation in an idempotent way. If the create block succeeds, but the response is lost due to NN crash, the next RPC from the retry will fail. Linking the Hadoop issue.

          Andrew Purtell added a comment -

          On region server, we need to keep a map from row key to set of nonces, per table, which reflect client requests the server has seen.

          Don't see the need for row key or per table state. Something like (client-address, nonce, timestamp) should be sufficient, right? Entry would be unique enough. Add the entry when op processing starts, remove it when finished or failed, refuse to process an op twice by sending back a DoNotRetryException.

          As for minimizing heap, expired entries could be lazily collected during add or remove of new entries or reaped by a chore.
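
          A minimal sketch of that bookkeeping, assuming a hypothetical ServerNonceTracker class (none of these names come from an attached patch): entries keyed by client address and nonce are added when processing starts, removed when it ends, and lazily expired on add.

          import java.util.Iterator;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          // Hypothetical tracker for (client-address, nonce, timestamp) entries.
          public class ServerNonceTracker {
            private final ConcurrentHashMap<String, Long> startTimes = new ConcurrentHashMap<String, Long>();
            private final long graceMillis;

            public ServerNonceTracker(long graceMillis) {
              this.graceMillis = graceMillis;
            }

            /** Returns false if this (client, nonce) pair is already known, i.e. a duplicate. */
            public boolean startOperation(String clientAddress, long nonce) {
              expireOldEntries();  // lazy cleanup piggybacked on add, as suggested above
              String key = clientAddress + ":" + nonce;
              return startTimes.putIfAbsent(key, System.currentTimeMillis()) == null;
            }

            /** Called when the operation finishes or fails. */
            public void endOperation(String clientAddress, long nonce) {
              startTimes.remove(clientAddress + ":" + nonce);
            }

            private void expireOldEntries() {
              long cutoff = System.currentTimeMillis() - graceMillis;
              Iterator<Map.Entry<String, Long>> it = startTimes.entrySet().iterator();
              while (it.hasNext()) {
                if (it.next().getValue() < cutoff) {
                  it.remove();
                }
              }
            }
          }

          An RPC handler would call startOperation() before applying the increment and translate a false return into the DoNotRetryException discussed above.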

          Ted Yu added a comment -

          (client-address, nonce, timestamp) should be sufficient, right?

          We haven't discussed how the nonce would be generated on the client side. Is it possible that the same client generates the same nonce for different rows? We should tell users not to do that.

          remove it when finished or failed

          When operation fails, do we need to keep the entry a bit longer ? Just in case client retries (and the retry succeeds).

          I agree with the strategy for expiring entries.

          stack added a comment -

          I like this idea.

          Andrew Purtell added a comment -

          We haven't discussed how the nonce would be generated on client-side.

          True, but I did say internally generated by the HBase client. I was thinking hash(IP address, row, timestamp).

          When operation fails, do we need to keep the entry a bit longer ? Just in case client retries (and the retry succeeds).

          No, because a retry is a new op, so a new nonce.

          Ted Yu added a comment -

          I was thinking hash(IP address, row, timestamp).

          Should table name be included in the hash ?

          Anoop Sam John added a comment -

          HBASE-6390 is the same issue.

          Andrew Purtell added a comment -

          Closed HBASE-6390, older one wins.

          Anoop Sam John added a comment -

          Andrew Purtell
          I am trying to understand the idea and try out implementing.

          Add the entry when op processing starts, remove it when finished or failed, refuse to process an op twice by sending back a DoNotRetryException.

          Yes, a retry from the client side while the original is in progress can be DNREed. But what if the operation at the server has just completed and removed the entry, and then a retry request comes from the client? [The client has not received the response yet]
          Ted Yu

          Should table name be included in the hash ?

          Depends on what layer we keep nonce info. If it is at the HRS level we might need table name also I think.

          Enis Soztutar added a comment -

          BTW, having an in-memory windowing view of latest nonces only solves most of the issues. But imagine the case, where client does increment, it is processed successfully in region, appended to WAL, but server failed before returning response. Client retries on the new server, which does not know about this nonce, so happily accepts the duplicate increment. To do true idempotency, we should append the nonce to the WALEdit.

          Anoop Sam John added a comment -

          Enis Soztutar Yes, you bring up an extreme case. We can decide up to what level we will make this idempotent.

          Anoop Sam John added a comment -

          To do true idempotency, we should append the nonce to the WALEdit.

          We should append the nonce to the WALEdit as well. (+ the in memory tracking) correct?

          Ted Yu added a comment -

          but server failed before returning response. Client retries on the new server

          HBaseClient should generate a new nonce when request is sent to new server.

          Andrew Purtell added a comment - - edited

          I think the above comments all taken together are a reasonable thing to try:

          • Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones.
          • nonce = hash(client address, table, row, timestamp)
          • HBaseClient should generate a new nonce whenever a new op is sent to new server. Reuse the nonce for any retry.
          • Server tracks nonces by (client address, nonce, timestamp). Expire entries after some grace period. Restart the expiration timer whenever the nonce is checked as part of op processing. Lazily clean up expired entries either as part of add/remove or via a chore.
          • Add the entry when op processing starts, remove it when finished or failed, refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException which inherits from DoNotRetryException so the client understands the retry operation was failed because the previous attempt is still pending server side.
          • We should append the nonce to the WALEdit, and recover them along with the entry data.
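
          As a sketch of the nonce derivation in the second bullet (illustrative only; the mixing function and class name are not from any patch here):

          import org.apache.hadoop.hbase.util.Bytes;

          // Illustrative nonce = hash(client address, table, row, timestamp).
          public final class NonceFactory {
            private NonceFactory() {}

            public static long makeNonce(String clientAddress, byte[] table, byte[] row) {
              long ts = System.currentTimeMillis();
              long h = 17;
              h = 31 * h + clientAddress.hashCode();
              h = 31 * h + Bytes.hashCode(table);   // Bytes.hashCode hashes the byte[] contents
              h = 31 * h + Bytes.hashCode(row);
              h = 31 * h + (ts ^ (ts >>> 32));
              return h;
            }
          }

          The client would compute this once per operation and attach the same value to every retry of that operation, so the server can recognize the duplicate.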
          Ted Yu added a comment -

          We should append the nonce to the WALEdit, and recover them along with the entry data.

          Is the above needed ?

          Andrew Purtell added a comment -

          Is the above needed ?

          I think Enis is right. A server accepts an op, it goes down mid-flight, another takes over and is processing WAL entries, and the client retries and is relocated to the new server; without a nonce the increment would be accepted twice.

          Ted Yu added a comment -

          without having a nonce the increment would be accepted twice

          But there is this assumption:

          HBaseClient should generate a new nonce whenever a new op is sent to new server

          Andrew Purtell added a comment -

          Good point Ted. So then the client should not retry an increment or append (or other non-idempotent op) if it has been relocated. See LarsH's comment at the top of this issue, sorry I missed it. And it follows that the rest of your comment is valid too.

          Ted Yu added a comment -

          If client retries on region move, that would allow skipping the append of nonce to the WALEdit.

          I think that would reduce the complexity of the implementation.

          Andrew Purtell added a comment -

          What we really need is a different model for interaction. A bidirectional event stream between clients and servers. Clients issue requests. Servers (any server) acknowledge completion. Implies an async client.

          In the absence of that we can at least give the client an indication the op has been processed even through a retry as long as the region doesn't move. (Add to my OperationInProgressException also OperationAlreadyCompletedException.)

          If the region relocates, then we expose some uncertainty to the application by failing any additional retries. This will be less surprising than current behavior because we won't have silent application of the same op more than once, but punts to the app which isn't great either.

          Andrew Purtell added a comment - - edited

          Ted Yu From the client's point of view, it is still retrying the op even though the server handling the region has changed. So

          • HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry.

          Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not?

          Ted Yu added a comment -

          For statement #1:

          HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry.

          Agreed.

          this outweighs the application level uncertainty that would happen if a request fails because of a region relocation?

          I think statement #1 already achieves what persistence to WAL would achieve.

          Would the app know if the op applied or not?

          The app would know when the response for the operation is not an IOException.

          Thanks

          Andrew Purtell added a comment -

          How does the client know if the op failed before or after it was persisted to the WAL without a way to check?

          Ted Yu added a comment -

          clarification:

          HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry.

          Here the nonce is reused when retrying against new region server, right ?

          If so, we're on the same page - WALEdit needs to accommodate nonce.

          Ted Yu added a comment -

          Maybe we can create subtasks for this JIRA, beginning with RPC and WALEdit changes.

          That way, 0.96 RC0 doesn't have to wait for the full implementation.

          Andrew Purtell added a comment -

          Maybe we can create subtasks for this JIRA, beginning with RPC and WALEdit changes.

          IMO, those changes don't make sense independent of each other.

          Who is doing the implementation? Are you volunteering?

          Ted Yu added a comment -

          I am currently looking into test failures first reported by Jon in HBASE-7290.

          I don't know how long this will take.

          Anoop Sam John added a comment -

          Andrew Purtell I will try this out over the weekend. Let me see how it goes.

          Lars Hofhansl added a comment -

          The discussions here all imply an RPC and WAL protocol/format change, so they are 0.96 only.
          Can we do something simpler in meantime for 0.94. For example:

          1. never retry an Increment/Append
          2. retry only if we get NSRE, to ride over a region-move/split

          ?
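
          A rough sketch of that policy, expressed at the application level for brevity (the real change would live inside the client's retry loop; the helper below is hypothetical and assumes the client's internal retries are configured off):

          import java.io.IOException;
          import org.apache.hadoop.hbase.NotServingRegionException;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Increment;
          import org.apache.hadoop.hbase.client.Result;

          // Retry a non-idempotent op only when the region was not being served
          // (move/split); any other failure is surfaced immediately, since the op
          // may already have been applied on the server.
          public final class NsreOnlyRetry {
            public static Result increment(HTable table, Increment inc, int maxRetries)
                throws IOException {
              for (int attempt = 0; ; attempt++) {
                try {
                  return table.increment(inc);
                } catch (IOException ioe) {
                  if (!causedByNsre(ioe) || attempt >= maxRetries) {
                    throw ioe;
                  }
                  // Region move/split: the op was rejected before being applied, safe to retry.
                }
              }
            }

            private static boolean causedByNsre(Throwable t) {
              for (Throwable cur = t; cur != null; cur = cur.getCause()) {
                if (cur instanceof NotServingRegionException) {
                  return true;
                }
              }
              return false;
            }
          }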

          Anoop Sam John added a comment -

          +1 on above suggestion.

          Anoop Sam John added a comment -

          Some questions on how it will behave:
          1. When we get a request from the client (a retry) and the old request is in progress, if we throw an exception how will the user know whether the operation was successful or not?
          2. When we get a request from the client (a retry) and the old request is already completed and the operation was successful, I think we need to make that request a successful one. Ultimately the operation was successful and the client has no knowledge about that.
          3. When we get a request from the client (a retry) and the old request is already completed and the operation failed, can we go ahead with this retry request? The failure would have been because of something like a WAL write failure, which can be retried anyway. So this way there is no need to keep the nonce for a grace period when the operation failed.
          When it was successful we need to keep it for a grace period and then expire it.

          Andrew Purtell added a comment -

          Hey, Anoop, great!

          When we get a request from client(retry) and old request is in progress, if we throw exception how the user will know whether the operation was success or not?

          I suggest something like OperationInProgressException and OperationAlreadyCompletedException. The latter at least should inherit from DoNotRetryException. In lieu of a different protocol for client<->server communication.

          When we get a request from client(retry) and old request is already completed and operation was successful, I think we need to make that request a successful one. Ultimately the operation was successful and the client has no knowledge abt that.

          See above. The client will have enough knowledge to return success to the application.

          Andrew Purtell added a comment -

          Can we do something simpler in meantime for 0.94. For example:
          1. never retry an Increment/Append
          2. retry only if we get NSRE, to ride over a region-move/split

          Sounds good: in 0.94, never retry a nonidempotent operation except in the case of NSRE.

          ramkrishna.s.vasudevan added a comment -

          @Anoop
          Can you explain why you feel the client does not know that the operation is successful in #2.
          Maybe you had some scenario in mind?
          Is it like before the expiry time the client retries without knowing that the previous operation was successful?
          The usage of a nonce should solve the problem, right?

          Enis Soztutar added a comment -

          +1 to doing it with nonce + WAL change + temp window for nonces to the region in trunk. One question that remains is whether we should expose nonces to the client, and whether we accept custom-defined nonces. The use case is something like this:

          • Client picks up events from a reliable event queue, processes it and increments the counter. After the increment, we release the event, so that the queue will not schedule this again.
          • In case the client fails after the increment but before releasing the event, the increment will happen, but the event queue will give the event to some other client after it detects that the client failed. Then for the same event, we will do another increment.

          For the above scenario, HBase is not at fault, but without allowing the client to provide the nonce, there is no easy solution to this.

          For 0.94, let's do the no-retry approach. Although this still won't solve the problem completely.

          Anoop Sam John added a comment -

          Can you explain why you feel client does not know that the operation is successful in #2.

          You can see that the incrementColumnValue operation returns the new value of the column and increment() returns a Result. So in this case how can we do this? The 1st trial timed out at the client side and a retry request is generated. At the server the 1st trial then becomes successful and we won't allow the second operation to happen but throw an OperationAlreadyCompletedException. Based on the exception that reaches the client this time, it can know the operation status but cannot know the result which it needs to return to the app. This was my doubting point. Am I making it clear now?

          Ted Yu added a comment -

          I think this implies that OperationAlreadyCompletedException needs to carry the result of the completed operation.

          Anoop Sam John added a comment -

          Another option would be to make the 2nd attempt a success and just return the Result, without doing the operation.

          Also, w.r.t. OperationInProgress, we can throw the exception, but what does the client need to do (the HBase client, not the app)? It is still not sure about the status of the operation, whether it will be a success or a failure.

          The whole point here is how to query, from the client, the result of an old operation that happened on the server.

          Enis Soztutar added a comment -

          The whole point here coming is how to query, from client, the result of an old operation happened in server.

          That is why I advocate above whether we should make nonces visible to the client, and enable accepting client-defined nonces.

          Ted Yu added a comment -

          enable accepting client-defined nonces

          +1 on above.

          We're expecting C++ based client(s) to be developed, right ?

          stack made changes -
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Anoop Sam John added a comment -

          That is why I advocate above whether we should make nonces visible to the client, and enable accepting client-defined nonces.

          You mean the client app right?

          The whole point here coming is how to query, from client, the result of an old operation happened in server.

          For this what do you say? Expose APIs? I got your point with the events queue use case.

          stack made changes -
          Fix Version/s 0.95.1 [ 12324288 ]
          Fix Version/s 0.95.0 [ 12324094 ]
          Sergey Shelukhin added a comment -

          One point: assuming the client doesn't retry across complete client failures (where client memory is lost), can we make the default nonce an atomically incrementing number?
          Fuzzy idea: the client sends this number.
          The server structure is: a lower bound of operations per client that succeeded; a list of operations that succeeded after that, with some gaps in the sequence (or the last nonce received from the client, plus the list of operations that failed and could be retried, and those that are in progress).
          The lower bound moves up as operations succeed. Gaps are recycled after some time by a chore. The WAL has the nonce + client ID, so the structure can be restored after restarts.
          The client has to change its ID on restarts.
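
          A sketch of the per-client server-side state this scheme implies (hypothetical class, not from the attached patches): a lower bound below which every nonce is known to have succeeded, plus the completed nonces above it; whatever is missing in between are the gaps.

          import java.util.TreeSet;

          // Hypothetical per-client state for the sequential-nonce idea above.
          public class ClientSequenceState {
            private long lowerBound = 0;                         // every nonce <= lowerBound succeeded
            private final TreeSet<Long> completedAbove = new TreeSet<Long>();

            public synchronized boolean isDuplicate(long nonce) {
              return nonce <= lowerBound || completedAbove.contains(nonce);
            }

            public synchronized void markSucceeded(long nonce) {
              completedAbove.add(nonce);
              // Slide the lower bound forward over any contiguous run of completed nonces.
              while (completedAbove.contains(lowerBound + 1)) {
                lowerBound++;
                completedAbove.remove(lowerBound);
              }
            }
          }

          A chore could additionally advance lowerBound past gaps that have aged out, which matches the recycling behavior described above.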

          Sergey Shelukhin added a comment -

          Other comments after discussion:
          1) Note that this scheme has different behavior in case of nonces collected by timeout chore - the hash one will make nonce available again, whereas this one will make nonce unavailable (which is probably better, especially when there's a possibility of very delayed "retry", such as during replication).
          2) Storage of gaps could be optimized by storing number ranges [from, to) instead of individual ops.
          3) With regard to what result to return - it depends a lot on whether we think an "atomic return value" is important. I.e. should increment strictly return the incremented value, like AtomicLong, or is it allowed to return some later value? Ditto for append. I'd argue that for the former case it would be overkill to store the return values of all append/increment operations in memory for minutes, so the server should respond something to the effect of "the operation succeeded, but I don't know what the result is". In the latter case the current value can be returned, so only the "get" part of increment/append is done. Perhaps the operation should have a flag that determines which semantic to use.

          Enis Soztutar added a comment -

          @Anoop,

          You mean the client app right?

          Yes, the client app.

          For this what do you say? Expose APIs? I got your point with the events queue use case.

          The client will be able to pass a custom nonce to the Mutate object.

          so the server should respond something to the effect of "the operation succeeded, but I don't know what the result is"

          I don't think we should bubble up this specific case to the client app unless we have a valid use case. As long as we document the semantics, it should be good.

          Sergey Shelukhin added a comment -

          I don't think we should bubble up this specific case to the client app unless we have a valid use case. As long as we document the semantics, it should be good.

          Currently, the book sayeth "See Increment in HTable.", with a link to javadoc, and javadoc says: "values of columns after the increment[/append] operation", which is subject to interpretation (but I'd read it as "after the operation and before any other operation").
          Granted, I cannot come up with non-stretch use case for AtomicLong-like return value semantics in HBase.
          I wonder what other people here think.
          Implementation would be simpler with no guarantees...

          Sergey Shelukhin made changes -
          Assignee Sergey Shelukhin [ sershe ]
          Sergey Shelukhin added a comment -

          Here's a partial patch for an example. The nonce manager on server side is not implemented but operations are clear.
          WAL replay is not there, and it's going to be tricky (for any nonce scheme). We only replay WAL that is not already in store, and nonces will not be in store, so the state of the nonce cannot be reliably restored with regular WAL replay. It is especially interesting for new remote request based faster log replay, haven't checked yet how it works. Not yet exactly sure what to do about it without incurring too much overhead.
          I don't think having the client not retry against a different server is a solution either. If we don't retry when this problem arises, what is the user supposed to do? How do they know if the increment happened? They will probably retry too and get into the same situation.
          I wonder if we should have some system CFs for nonces almost...

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-partial.patch [ 12578276 ]
          ramkrishna.s.vasudevan added a comment -

          I think if we use the WALEdit here, we should store the nonce and also the incremented value.
          If the RS goes down after adding this WALEdit, we should be able to just replay the edit and use the value from this edit. And I think before the client retries using the same nonce we should check whether the RS to which the previous nonce was issued was down.

          If that is the case, the client's retry can be ignored.

          stack added a comment -

          Patch looking good.

          Add a comment that says what the result looks like when:

          generateClientId

          happens.

          Why not just a UUID rather than all these gyrations? Or do you want to make it so that looking at id, you can tell what client it came from? It looks like you throw away all this info when you create the SecureRandom? Creating a SecureRandom for this one time use is expensive.

          Client id should be a long since it is uint64 in the proto?

          Does ClientNonceManager have to be in top-level? Can it not be in client package and be made package private?

          Does it make sense putting clientid together w/ nonce making? Could you have a class that does noncemaking and then another to hold the clientid? Is clientid tied to Connection? Can you get connectionid? Or make a connectionid? Connections are keyed by Configuration already? Would the Connection key do as a clientid?

          Would it be easier or make it so you could shut down access on ClientNonceManager by passing in the id only rather than the whole nonce when you do this:

          MutateRequest request = RequestConverter.buildMutateRequest(
          -    location.getRegionInfo().getRegionName(), append);
          +    location.getRegionInfo().getRegionName(), append, clientId, nonce);

          So, you decided to not pass nonce in here:

          + r = region.append(append, append.getWriteToWAL()/*, clientId2, nonce*/);

          I like the way this works over on the server side.

          You dup code in append and increment.

          Good stuff Sergey.

          Sergey Shelukhin added a comment -

          I think if we use WALEdit here, we should store the nonce and also the incremented value.

          If the RS goes down after adding this WALEdit, we should be able to just replay the edit and use the value from this Edit. And i think before the client retries using the same nonce we should ensure if that RS to which the previous nonce was issued was down.

          If this is the case the client's retry can be ignored.

          Under current (and good) logic if edit went to the store, we will not see the WAL edit (again especially when the log replay is remote w/o recovered edits).
          For standard nonces, it would mean that nonce would not be there on new server unless we widen WAL replay significantly and recover nonces for entries that are already in store.
          For incremental nonces, we could do the same, or we could play it safe and assume everything before the lowest nonce we see at replay is already in store, but I wonder how well this will actually work.
          It may happen that, because successful operations take less time than failed ones, any client with more than one thread (or async APIs that may exist later) would be almost guaranteed to fail all such retries (because non-failing threads may have successes after the failing thread,
          and any success with a higher nonce will invalidate the lower nonce during recovery). Perhaps that approach is feasible still; at least it won't produce dups.

          Why not just a UUID rather than all these gyrations? Or do you want to make it so that looking at id, you can tell what client it came from? It looks like you throw away all this info when you create the SecureRandom? Creating a SecureRandom for this one time use is expensive.

          Interesting question... Java UUIDs are just random numbers. To keep a reasonable size we could just generate a secure random long then; do you think that would be sufficient?
          I wanted the result to have a deterministic and a random part, so that for a random-number collision the additional condition would be either an unrelated hash collision, or generating the same secure number at the exact same IP, PID, TID and time.

          Client id should be long since in proto is uint64 in proto?

          It is... the generated part is actually base for client id, renamed it.

          Does ClientNonceManager have to be in top-level? Can it not be in client package and be made package private?

          It has a constant that is used all over the place. Let me see if I can put it elsewhere...

          Does it make sense putting clientid together w/ nonce making? Could you have a class that does noncemaking and then another to hold the clientid?

          Yeah, that could be done... nonce-making class then would be just wrapper around atomiclong though

          Is clientid tied to Connection?

          It is static on connectionmanager. The only reason to not have less clientIds than any given number is if it makes it hard to control the incrementing number, so there's one per process.

          Can you get connectionid? Or make a connectionid? Connections are keyed by Configuration already? Would the Connection key do as a clientid?

          I don't think so, they can be repeated (in fact probably will be) between restarts.

          So, you decided to not pass nonce in here:

          + r = region.append(append, append.getWriteToWAL()/*, clientId2, nonce*/);

          This is temporary, will be uncommented to write it into WAL.
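
          To illustrate the trade-off being discussed (all names hypothetical, not the patch code), a client id could mix a securely random part with a deterministic host/process/time part; the UUID-style alternative is simply a single secure random long:

          import java.lang.management.ManagementFactory;
          import java.security.SecureRandom;

          // Hypothetical client id generation mixing a random and a deterministic part.
          public final class ClientIdExample {
            public static long generate() {
              long randomPart = new SecureRandom().nextLong();
              // RuntimeMXBean name is typically "pid@hostname".
              String runtime = ManagementFactory.getRuntimeMXBean().getName();
              long deterministicPart = ((long) runtime.hashCode() << 32)
                  ^ System.currentTimeMillis();
              return randomPart ^ deterministicPart;
            }
          }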

          Sergey Shelukhin made changes -
          Link This issue depends upon HBASE-7413 [ HBASE-7413 ]
          Sergey Shelukhin added a comment -

          Need to make the WAL PB. Strictly speaking this doesn't block this issue, because given the current WALEdit readFields code, old-code/new-WAL compat will be broken by almost any change (other than the extremely hacky option of hiding new fields in the replication-scope byte array that normally holds the cf, and detecting that magically, because the version cannot be changed at all). If we are not going to do the hack and will break compat anyway, we might as well go to PB.

          Ted Yu added a comment -

          Need to make WAL PB

          +1.

          Anoop Sam John added a comment -

          How will an exception like OperationConflictException be handled on the client side?
          Have we concluded what we will do in the different situations (like the op already in progress at the RS, or the op already completed at the RS)? Sorry, I am not able to find any such conclusion in the comments above.

          Sergey Shelukhin added a comment -

          That is an interesting question. Ideally we want to be able to respond definitively whether the operation completed successfully or not, so that the client knows whether it can (or cannot) retry; but this is not achievable unless we store all nonces and also recover them between restarts.
          So there will be some ambiguity.
          It also depends, as mentioned above, on whether the client needs the answer to the increment or append exactly as of the op, or at any time after the op.
          So there can be multiple outcomes when client sends retry:
          0. Nonce is marked as last attempt failed, or doesn't exist on server - normal retry.
          1. Nonce has been recycled due to age, or operation succeeded:
          a) Client needs "consistent" answer - exception that operation succeeded but no answer can be provided, no retry.
          b) Client doesn't need "consistent" answer - request does a "get" and client receives the answer.
          2. Nonce has not been seen during latest recovery but is before latest nonce seen - unknown result, retry at client's discretion.
          By making WAL recovery include more records, or some separate processing for nonces inside WAL records already in store, we can alleviate that.
          3. Nonce is marked as operation in progress. Wait on some sort of waiting construct attached, then 0 or 1.
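
          For illustration only, a minimal sketch of how a client might act on the outcomes above; the enum and method names are hypothetical and exist only to make the decision table concrete, they are not part of any patch:

              // Hypothetical types, used only to make the retry decision table above concrete.
              public class RetryOutcomeSketch {
                enum NonceState { FAILED_OR_ABSENT, COMPLETED, UNKNOWN_AFTER_RECOVERY, IN_PROGRESS }

                /** Decide what the client should do when a retried increment/append comes back. */
                static String decide(NonceState state, boolean needsConsistentAnswer) {
                  switch (state) {
                    case FAILED_OR_ABSENT:       return "retry normally";                                   // case 0
                    case COMPLETED:              return needsConsistentAnswer
                        ? "fail: op succeeded but the result cannot be provided"                            // case 1a
                        : "issue a Get and return the current value";                                       // case 1b
                    case UNKNOWN_AFTER_RECOVERY: return "result unknown; retry at the caller's discretion"; // case 2
                    case IN_PROGRESS:            return "wait for the in-flight op, then re-evaluate";      // case 3
                    default:                     throw new AssertionError(state);
                  }
                }

                public static void main(String[] args) {
                  System.out.println(decide(NonceState.COMPLETED, false));
                }
              }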

          Sergey Shelukhin added a comment -

          With purely hash-based nonces, the "recycled due to age" part of case 1, as well as case 2, becomes "there's a duplicate operation"; otherwise it's more or less unchanged.

          Anoop Sam John added a comment -

          b) Client doesn't need "consistent" answer - request does a "get" and client receives the answer.

          This should be fine I guess.

          Sergey Shelukhin added a comment - - edited

          Well, I've spent today writing some code, and then we discussed it with Enis Soztutar. There is a huge number of special cases all over the place...

          First, on which nonces to use. Unfortunately my proposed approach is not as simple as it seems, due to special cases.
          In case of hash nonces:

          • you have an epic hashmap of all nonces over the past expiration interval (say, one hour)
          • on common successful operation path, for every op you do a putIfAbsent into this map, and later take an uncontested lock to check for waiting conflicts
          • cleanup iterates the map and removes expired entries
          • code is simple
          • retries over expiration time will result in duplicate operations
          • increasing expiration time is limited by total (map) size
          In case of sequential nonces:
          • you have a small map by client, some smaller nonce structures inside; easier to cleanup so probably smaller/much smaller in total
          • however, in common path, you do small hashmap lookup, one smaller structure (say another hashmap) putIfAbsent, take the same uncontested lock as in case 1, and do 2 interlocked ops
          • cleanup is still very simple
          • however, to make cleanup and main path simple, the code for starting/ending nonce operation has to have a lot of interlocked/volatile/etc. cleverness which will almost never execute, to handle special cases
          • retries over expiration time #1 will result in rejection - no dup (the importance of which depends on the WAL design below)
          • increasing expiration time #1 is limited by total size
          • retries over expiration time #2 will result in duplicate change
          • increasing expiration time #2 is only limited by number of clients (not nonces), so it's a much larger margin of safety (which may or may not be worth it)
          Finally, to make full use of range collapsing as suggested above, you need to factor the region into the clientId on the client (not on the server; to the server, clientId is opaque), so that a sequence of numbers following each other goes to the same region.
          That will allow one to use a structure better than a hashmap for the nonces above, reduce the memory footprint a lot, and increase the time for nonce expiration.
          TL;DR1 I am not sure the complexity of the sequential nonces is worth it, at least for the first cut.

          Then; there are some easy-to-handle special cases like splits and merges.

          The main problem for any approach is WAL recovery. First, we will have to read nonces from the entire WAL, not just the records we recover, because otherwise nonces won't work for records that got flushed (we don't recover the WAL below some watermark, because those records are already in the store).
          Second, even if we do read records for the entire WAL, the WAL can go away very quickly after all records make it to the FS, so we won't have nonces from it at all.
          One option is to keep the WAL around for the nonce recovery period.
          Alternatively, we can have a separate additional "log" file for nonces. It will just contain a bunch of numbers. Flushing it will not be on the main path - because the WAL itself also needs to contain nonces (for replication at least), we can flush the nonce log only before a memstore flush.
          So when we recover, for a given nonce we will either see it in the WAL (if it was never flushed to disk, it will be part of recovery), or we'll see it as part of the nonce log (or both occasionally, which doesn't matter).
          The nonce log can be rolled independently and nuked after a file is at least expiration-time old.

          A radically different solution that Enis Soztutar proposed is to output increments and appends as separate markers (like delete markers), containing nonces as Cell tags or shadow columns, and coalesce them on reads, and during compactions after some time.
          When we coalesce them to get final value we will throw away the extra ones.
          This way we get rid of all the above complexity because the nonce management is just part of normal KV management.
          However, we may introduce a lot of other special cases around out-of-order puts/deletes and the number of versions to keep (increments/appends will need special accounting to keep version semantics). Plus, coalescing the value for some often-incremented field may be expensive.
          It will also allow us to support out-of-order increments and appends! Just kidding.

          TL;DR2 My current plan is as follows. I will wait until Mon-Tue for feedback, doing other things (or until enough feedback accumulates). Then, I will stash my sequential nonces code and do the simplest hash nonces patch possible, including sending a summary of nonces to the server during WAL recovery from whatever WAL we are currently reading, including below the watermark, without actually replaying the KVs. It will not be bulletproof, but it is a first step if there are no objections.
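
          For illustration only, a minimal sketch of the "hash nonces" bookkeeping described above - an epic map of recently seen nonces, putIfAbsent on the write path, and a periodic sweep of expired entries. The class name and the expiration constant are assumptions, not the actual patch:

              import java.util.concurrent.ConcurrentHashMap;

              // Hypothetical sketch; not the patch attached to this issue.
              public class HashNonceTrackerSketch {
                private static final long EXPIRATION_MS = 60 * 60 * 1000L; // e.g. keep nonces for one hour

                // nonce -> time the operation was first seen
                private final ConcurrentHashMap<Long, Long> seenNonces = new ConcurrentHashMap<>();

                /** Returns true if the operation should proceed; false if this nonce was already seen. */
                public boolean startOperation(long nonce) {
                  // putIfAbsent is the single common-path step; a non-null result means a duplicate/retry.
                  return seenNonces.putIfAbsent(nonce, System.currentTimeMillis()) == null;
                }

                /** Called from a chore thread; drops nonces older than the expiration window. */
                public void cleanExpired() {
                  long cutoff = System.currentTimeMillis() - EXPIRATION_MS;
                  seenNonces.entrySet().removeIf(e -> e.getValue() < cutoff);
                }
              }
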
          Devaraj Das added a comment -

          Hmm.... I wonder how much mileage we would get by just doing a simple implementation of handling the following exceptions:

          ConnectException, NoRouteToHostException, UnknownHostException

          (similar to how Hadoop handles idempotent operations; look at RetryPolicies.FailoverOnNetworkExceptionRetry.shouldRetry in http://bit.ly/10mc4PG)

          In HDFS, methods are annotated with "@Idempotent" annotations, and client retries the operation when the above exceptions are encountered, and for other exceptions like RemoteException, the retry is conditional on the method being idempotent. In HBase, if we know that only a few methods are non-idempotent, we can further simplify the implementation by doing custom checks for only those methods (as opposed to introducing annotations).

          Maybe, this has been discussed before, but I have a feeling that this will solve the vast majority of the problem cases, without introducing a lot of complexity in the codebase.
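
          For illustration, a sketch of the simple check described here: retry a non-idempotent call only when the failure guarantees the request never reached the server. This is a hypothetical standalone helper, not Hadoop's RetryPolicies API:

              import java.net.ConnectException;
              import java.net.NoRouteToHostException;
              import java.net.UnknownHostException;

              // Hypothetical helper mirroring the idea above; not an existing HBase or Hadoop class.
              public final class NonIdempotentRetryCheck {
                private NonIdempotentRetryCheck() {}

                /**
                 * For a non-idempotent op (increment/append), only retry when the failure means the
                 * request could not have been executed on the region server.
                 */
                public static boolean isSafeToRetry(Throwable t) {
                  return t instanceof ConnectException
                      || t instanceof NoRouteToHostException
                      || t instanceof UnknownHostException;
                }
              }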

          Sergey Shelukhin added a comment -

          Yeah, we could also just do this. This won't solve the JIRA as stated though... Maybe we can do that for 0.95 and keep this jira as major for future effort.

          Enis Soztutar added a comment -

          So, Hadoop seems to be trying to solve the problem in some basic cases by inspecting the exceptions, but in the end, if it cannot reason about whether the operation has already succeeded it will throw the exception to the user. Basically, throwing ItsNotMyProblemItsYourProblemException™.
          Depending on how critical it is to have correct idempotent semantics (especially for distributed counters), I would be ok with going with the basic hash, or the not-retrying-across-servers solution, for 0.96, and implementing one of the proposals above (or some other solution) on a longer timeframe.

          Devaraj Das added a comment -

          Yeah, we could also just do this. This won't solve the JIRA as stated though... Maybe we can do that for 0.95 and keep this jira as major for future effort.

          I think this will address the vast majority of cases... I'd say let's do this and see if we still see this issue.

          Sergey Shelukhin added a comment -

          Ok, finally I got around to this.
          Preliminary patch; it's more or less ready but w/o any tests, I will add some once the general approach is agreed upon.
          I kept the nonce group-nonce distinction for future improvement if it were to come; however, with hashes, the group is never used and never serialized.
          The server-side code is forward-compatible with groups, though.

          Added it to WAL replay, and it's replayed without regard to max store seqnums, only writeTime is taken into account.
          Some changes will have to be made for improved remote-WAL-replay that is coming... some decoration on Increments and Appends to have them treated specially, so that some 2-hours-apart collision doesn't fail log replay.

          I will add tests tomorrow or later this week, feedback is appreciated.

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v0.patch [ 12581464 ]
          Sergey Shelukhin made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sergey Shelukhin added a comment -

          Added unit tests and a case in TestMultiParallel. Also made OperationConflict a DoNotRetryException.

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v1.patch [ 12581965 ]
          Sergey Shelukhin added a comment -

          r at https://reviews.apache.org/r/10965/

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12581965/HBASE-3787-v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:

          -1 core zombie tests. There are 1 zombie test(s):

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5562//console

          This message is automatically generated.

          Ted Yu added a comment -

          @Sergey:
          Can you give a high-level description of the role of the nonce group?

          Thanks

          Sergey Shelukhin added a comment -

          The nonce group is basically a client ID, where what a "client" is, is not very well defined. It can be used as a separate field as suggested in some comments above; or, for stuff like incrementing nonces, it can be an id per client, or per region (for the cutoff to work better).

          Sergey Shelukhin added a comment -

          Address r feedback

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v2.patch [ 12582127 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12582127/HBASE-3787-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5578//console

          This message is automatically generated.

          Anoop Sam John added a comment -

          When the retry request finds the op is already completed, the server throws OperationConflictException and the client layer reports it as an op error to the user? But the increment/append actually happened in the system. How can the user know this?

          Sergey Shelukhin added a comment -

          It reports a very specific exception, which means the op already completed. Do you think we should do a get instead?
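
          For illustration, a sketch of the "do a get instead" fallback being discussed, assuming OperationConflictException is what ultimately surfaces to application code (the method and the exception's import location are assumptions):

              import java.io.IOException;
              import org.apache.hadoop.hbase.client.Get;
              import org.apache.hadoop.hbase.client.HTableInterface;
              import org.apache.hadoop.hbase.client.Increment;
              import org.apache.hadoop.hbase.client.Result;
              import org.apache.hadoop.hbase.exceptions.OperationConflictException; // assumed location
              import org.apache.hadoop.hbase.util.Bytes;

              // Hypothetical application-side fallback; not part of the attached patch.
              public class IncrementFallbackSketch {
                public static long incrementOrRead(HTableInterface table, byte[] row, byte[] cf,
                    byte[] qual, long delta) throws IOException {
                  Increment inc = new Increment(row);
                  inc.addColumn(cf, qual, delta);
                  try {
                    Result r = table.increment(inc);
                    return Bytes.toLong(r.getValue(cf, qual));
                  } catch (OperationConflictException e) {
                    // The server saw this nonce before: the increment was applied, only the answer was lost.
                    Result r = table.get(new Get(row).addColumn(cf, qual));
                    return Bytes.toLong(r.getValue(cf, qual));
                  }
                }
              }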

          Sergey Shelukhin added a comment -

          There are some design questions on r. Perhaps we should flesh out the design before I make any major changes.

          1) Should we add actual usage of nonceGroup/client ID?
          We can do that. It depends also on (2). I will probably change the server manager to lump the nonce group and the nonce into an array wrapper and store these in the map,
          instead of using Pair. Pair is simpler but worse; right now I only added it for forward compat.
          A map of maps is a pain to clean up without tricks or an epic lock; I have added that for sequential nonces, but I wonder if it's worth it for simple nonces.
          Client ID, for now, will be produced from the IP, process id, and thread id. It will be hashed to 8 bytes and written into nonceGroup.

          2) Is 8 bytes enough to avoid collisions?
          The answer is "maybe". It depends on the number of requests overall in the cluster and for how long we store nonces.
          We can alleviate this by adding client ID I guess, which will make it 16 bytes, 8 unique per client and 8 random.

          3) What random should we use?
          Java uses SecureRandom to generate UUIDs. We can use some other Random, they claim to produce uniformly distributed numbers.

          4) Will too many nonces be stored?
          If we keep nonces for an hour, and do 10k increments per second per server, we will have stored 36,000,000 nonces on a server.
          With map overhead, 2 object overheads, 2 primitive longs and an enum value, it's probably in excess of 120 bytes per entry (without clientId). So yes, it's a lot of memory.
          The time to store nonces is configurable, though, and with the default retry settings as little as 5 minutes could provide sufficient safety.
          With 5 minutes we'd have something like ~400MB of RAM for the hash table, which is not totally horrible (especially for 10k QPS).
          Some solutions were proposed in the r, such as storing the mutation creation time and rejecting after a certain time.
          However, that relies on synchronized clocks, and it also doesn't solve the problem, in the sense that the client has no idea about the original problem - should it retry?
          What do you think?
          If you think it's realistic workload I can rework the sequential nonce patch instead, and there nonces would be collapsed. If clientId is used and incorporates the region,
          requests arriving for the same region will generally go to the same server for some time, and in sequential order so a lot can be collapsed.
          However it will add complexity.
          What do you think?
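
          For illustration, a sketch of the client-ID and nonce generation discussed in (1)-(3): hash host, process and thread into 8 bytes for the nonceGroup, and draw the per-operation nonce from a process-wide SecureRandom. The class name and the hash choice (FNV-1a) are assumptions:

              import java.lang.management.ManagementFactory;
              import java.net.InetAddress;
              import java.security.SecureRandom;

              // Hypothetical sketch of the client-ID + nonce scheme discussed above; not the patch.
              public class ClientNonceSketch {
                // Constructed once per process; SecureRandom construction is the lengthy part,
                // so it is deliberately not done per call.
                private static final SecureRandom RNG = new SecureRandom();
                private static final long CLIENT_ID = buildClientId();

                /** 8-byte client id derived from host, process and thread; hashed, so collisions are possible. */
                private static long buildClientId() {
                  String host;
                  try {
                    host = InetAddress.getLocalHost().getHostName();
                  } catch (java.net.UnknownHostException e) {
                    host = "unknown-host";
                  }
                  String process = ManagementFactory.getRuntimeMXBean().getName(); // usually "pid@host"
                  long threadId = Thread.currentThread().getId();
                  String raw = host + "|" + process + "|" + threadId;
                  long hash = 0xcbf29ce484222325L;          // 64-bit FNV-1a offset basis
                  for (int i = 0; i < raw.length(); i++) {
                    hash ^= raw.charAt(i);
                    hash *= 0x100000001b3L;                 // FNV-1a prime
                  }
                  return hash;
                }

                /** A fresh 8-byte nonce for one non-idempotent operation. */
                public static long newNonce() {
                  return RNG.nextLong();
                }

                public static long clientId() {
                  return CLIENT_ID;
                }
              }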

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v3.patch [ 12582229 ]
          Sergey Shelukhin added a comment -

          Here's the simpler CR feedback. Depending on feedback or the lack thereof, I will wait, or just make a patch deciding on the above, and update it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12582229/HBASE-3787-v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestZooKeeper

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5585//console

          This message is automatically generated.

          stack added a comment -

          There are some design questions on r. Perhaps we should flesh out the design before I make any major changes.

          Sounds good. Suggest doing it up in a one-page doc rather than inline in a comment field on the issue, because it will be hard to find otherwise.

          ...I have added that for sequential nonces but I wonder if it's worth it for simple nonces.

          What is the 'that' in above ('tricks or epic locks'?)

          Client ID, for now, will be produced from IP, process id, and thread id. It will be hashed to 8 bytes and written into nonceGroup.

          If opaque, just uuid it? You'd still have to make a long of it.

          Aside: Reading, tripped over this comment on securerandom "....initialize SecureRandom (this may be a lengthy operation)" – the lengthy operation part.

          There is also http://docs.oracle.com/javase/6/docs/api/java/rmi/server/UID.html which would be more lightweight than uuid. Still > 64bits though.

          With 5 minutes we'd have something like ~400Mb of RAM for hash table, which is not totally horrible (especially for 10k QPS ).

          Agree

          However that relies on synchronized clocks....

          Yeah. How bad is that though? Currently if a regionserver clock is out of sync w/ the master we'll reject it (IIRC). We'd do something similar for a client that is trying to do a nonce-operation? If its mutation is outside of our nonce-keeping window (because we dropped our server-side record), we can't be sure it's not a retry coming in outside of our bounds. Would have to be > a long GC though (smile).

          Is the attempt at sequential ids attached here?

          Good stuff Sergey.

          Sergey Shelukhin added a comment -

          ...I have added that for sequential nonces but I wonder if it's worth it for simple nonces.

          What is the 'that' in above ('tricks or epic locks'?)

          If you have a map, by client ID, of structures (say, each containing a nonce hashmap), you have to clean up the parent map when clients go away, presumably on a timer.
          However, what if some request arrives for the client you are about to remove and grabs it from the map to use?
          Either a global R/W lock is needed that will block requests during cleanup, or, for example, a DrainBarrier in each client record, so that in the request code an op will be started and ended (2 interlocked ops),
          and in cleanup it will "stop" the client before removal. If a request gets a client record and fails to start an op due to cleanup, it will just retry getting/creating the client record.

          Client ID, for now, will be produced from IP, process id, and thread id. It will be hashed to 8 bytes and written into nonceGroup.

          If opaque, just uuid it? You'd still have to make a long of it.

          Aside: Reading, tripped over this comment on securerandom "....initialize SecureRandom (this may be a lengthy operation)" – the lengthy operation part.

          That's only done once.

          There is also http://docs.oracle.com/javase/6/docs/api/java/rmi/server/UID.html which would be more lightweight than uuid. Still > 64bits though.

          That is actually a deterministic value which can easily be duplicated by moving the system clock back, as far as I can see.
          I will resurrect the client hash generation code.

          Yeah. How bad is that though? Currently if a regionserver clock is out of sync w/ master we'll reject it (IIRC). We'd do something similar for a client that is trying to do a nonce-operation? If its mutation is outside of our nonce-keeping window (because we dropped our server-side record, we can't be sure it not a retry coming in outside of our bounds). Would have to > long GC though (smile).

          Hmm... that is possible; however, it still leaves the client guessing with regard to the operation's success.

          Is the attempt at sequential ids attached here?

          No.
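
          For illustration, a sketch of the start-op/stop-client handshake described above, using a per-client counter instead of a real DrainBarrier; all class names are hypothetical:

              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.atomic.AtomicInteger;

              // Hypothetical sketch of the cleanup-vs-request race described above: requests "start" an op on
              // a per-client record, and cleanup closes the record before removing it, so a request that loses
              // the race simply re-creates the record and tries again.
              public class PerClientCleanupSketch {
                static class ClientRecord {
                  // >= 0: number of in-flight ops; -1: record is closed and about to be removed.
                  final AtomicInteger inFlight = new AtomicInteger(0);

                  boolean tryStartOp() {
                    while (true) {
                      int v = inFlight.get();
                      if (v < 0) return false;               // cleanup already closed this record
                      if (inFlight.compareAndSet(v, v + 1)) return true;
                    }
                  }

                  void endOp() { inFlight.decrementAndGet(); }

                  boolean tryClose() { return inFlight.compareAndSet(0, -1); } // only close an idle record
                }

                private final ConcurrentHashMap<Long, ClientRecord> clients = new ConcurrentHashMap<>();

                /** Called on the request path; retries if it raced with cleanup. */
                public ClientRecord startOp(long clientId) {
                  while (true) {
                    ClientRecord rec = clients.computeIfAbsent(clientId, k -> new ClientRecord());
                    if (rec.tryStartOp()) return rec;
                    clients.remove(clientId, rec);           // lost the race; drop the closed record and retry
                  }
                }

                /** Called from a chore; a real implementation would also check how long a record has been idle. */
                public void cleanup() {
                  clients.forEach((id, rec) -> {
                    if (rec.tryClose()) clients.remove(id, rec);
                  });
                }
              }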

          stack added a comment -

          Hmm... that is possible, however still leaves the client guessing with regard to operation's success.

          How? We reject on the way in, before any operation has happened, so the client knows its attempt failed?

          That is actually a deterministic value which can easily be duplicated by moving system clock back as far as I see

          Yes. Presumes non-malicious clients.

          Good on you Sergey.

          Sergey Shelukhin added a comment -

          added usage of clientId and using bytes instead of Pair as hashmap key; some CR feedback from r and some minor changes

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v4.patch [ 12582375 ]
          Sergey Shelukhin added a comment -

          How? We reject on the way in before any operation has happened so client knows its attempt failed?

          If the client keeps retrying without a response, and eventually the server says "your timestamp is too old", it is equivalent to the original problem: the client doesn't know whether it should retry more after retrying once without a response, just that there were more retries.

          Sergey Shelukhin added a comment -

          I actually just realized that this model doesn't appear to work for a normal region move (and none would, in fact), because there's no WAL replay in that case.
          The more I think about it, the more I wonder if we should have a separate nonce WAL with lazy sync for a proper solution. It only needs to be synced on memstore flush; if the memstore wasn't flushed, data recovery will happen and the nonces will be picked up from the regular WAL.
          This patch is better than nothing I guess, for errors against a single server or a failing server. I will address the little feedback remaining on r...

          Ted Yu added a comment -

          if we should have separate nonce WAL with lazy sync for proper solution.

          We should do the above.

          Sergey Shelukhin added a comment -

          Feedback from r, made configurable (on by default). About the nonce WAL - I do not have information about how important this JIRA is. It seems like a lot of moving parts for such a rare occurrence. I'd like a 2nd opinion (or more than one). stack, what do you think?

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v5.patch [ 12583058 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v5.patch [ 12583060 ]
          stack added a comment -

          Sergey Shelukhin Duh. I should have caught that in review if I was any good. Agree a nonce WAL is OTT. Been trying to think through this and chatted some w/ Elliott but it is a thorny issue.

          Nonce should be a primitive in our rpc – all you should have to do is say whether it is idempotent or not.

          Would a nonce having a 'mint' time – the time of creation – help? If the nonce is N ms old – the server could do relative time? – then we fail the operation.

          Does the nonce have to have a destination server stamped on it? It is only good for a particular server, and if it shows up on another server because the region moved, again we fail the op.

          Elliott was talking about a region not moving till outstanding requests are handled and no new ones are accepted in the meantime (We then went off into the weeds where the region that is closing could still take reads... and even more extreme... we could open the region in the new location and it could take writes... just not reads..... We stopped before going any further because we thought you'd just slap us).

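          As a rough illustration of the 'mint time' idea only (the clock-skew concern raised earlier still applies), a hypothetical server-side check could simply reject any nonce older than a configured window; the class below is invented for the example.

          public final class MintTimeCheckSketch {
            private final long maxAgeMillis;

            public MintTimeCheckSketch(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

            // Fail the op outright rather than risk applying a possibly-duplicated retry.
            public void check(long nonceMintTimeMillis) {
              long age = System.currentTimeMillis() - nonceMintTimeMillis;
              if (age > maxAgeMillis) {
                throw new IllegalStateException("Nonce is " + age + " ms old; rejecting the op");
              }
            }
          }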
          Sergey Shelukhin added a comment -

          The mint time I commented on above... we could actually do this via sequential nonces; the server would just gradually roll the sequence up. But they will need to be per-region to ensure gaps don't exist.
          The server name in the nonce is equivalent to "never retry on a different server". There will be many op failures with this approach, and user code will probably add retries.

          Sergey Shelukhin added a comment -

          Actually, with the new distributed log replay the situation is even worse, because nonces for things after the cutoff will not be sent to the remote server, so they won't be replayed. Perhaps a nonce WAL-like recorder is actually the simplest solution to both...

          Andrew Purtell added a comment -

          There are tough issues here and I haven't thought them through nearly as much as Sergey. My advice nonetheless is to be as simple as possible and focus on the core problem: we don't want clients to accidentally (from their perspective) submit and execute a non-idempotent op more than once. Having a nonce capability in our RPC that handles all of the corner cases would be awesome, but that wide scope strikes me as follow-on work after solving the immediate issue at hand. If the client gets a failure (even if more often, and even if a region moves normally) instead of an accidentally duplicated non-idempotent op, we have made progress.

          Sergey Shelukhin added a comment -

          We can do that... the problem with the approach is that region moves are a normal occurrence, so if users get these failures frequently they will probably retry in their own code instead.

          Andrew Purtell added a comment -

          the problem with the approach is that region moves are a normal occurrence, so if users get these failures frequently they will probably retry in their own code instead.

          Right, that's implied. The tradeoff I see is basically between pushing retries up to the client and taking on a lot of corner cases. Either is fine. The former seems simpler to do as the first step.

          Sergey Shelukhin added a comment -

          Sorry, got distracted and didn't post a continuation comment.
          Another concern is that if you receive something like a network timeout you don't know whether it's safe to retry even to the same server; the server could have processed the request after all. With memory-only nonces I guess we could retry to the same server and significantly reduce the potential for duplicates. Otherwise, on some errors we cannot retry at all.
          But it will be much simpler, with no WAL involved. For a proper implementation it seems we'd need a separate WAL-like thingie for nonces and all this stuff.
          As I said, it's hard for me to judge because I don't know how important or prevalent the problem is. It doesn't seem like it would be very frequent. Maybe then we could indeed just say "(almost) no retries for increment/append".

          By the way, I just had a random idea that we could use instead. Please shoot it down or expand it (a rough client-side sketch follows after the steps below).
          Keep the current increment as is. Add an idempotent (atomic? checked?) Increment for those who want it...
          First, send a request to the server that creates a nonce on the server side, stores it (in-memory only) and returns it.
          Then, send the increment with the nonce; server-side, that removes the nonce and does the increment.
          If the increment arrives and the nonce is not there, the server responds that the operation is done (or expired).
          If getting the nonce fails non-trivially, it can be retried with no harm; the old nonce will be expired by the server (see below).
          If the increment fails with a trivial error (region moved/not there etc.), the nonce is removed from the server and retries are unrestricted; the client has to go back to requesting a nonce.
          If the increment fails non-trivially, we cannot go to a different server unless we get some definite error from the current one later, after which retries are unrestricted again; or unless we preserve the in-flight nonces during recovery (which is less data than the nonces of successful operations and can be added later). However, we can still retry to the same server safely.
          Requested nonces are expired by time.
          Because the server issues sequential nonces per region, the old ones can be stored for longer; they are easy to roll up.

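          To make the flow above a bit more concrete, here is a rough client-side sketch of such a 'checked increment' under the assumptions listed in the steps. NonceService and both of its methods are imagined stand-ins for the illustration; they are not an existing HBase API.

          import java.io.IOException;

          // Step 1 acquires a server-generated nonce (safe to retry: an unused nonce simply
          // expires server-side); step 2 sends the increment with that nonce and only retries
          // against the same server, since only that server knows the nonce.
          interface NonceService {
            long acquireNonce(byte[] row) throws IOException;
            long incrementWithNonce(byte[] row, byte[] family, byte[] qualifier,
                                    long delta, long nonce) throws IOException;
          }

          final class CheckedIncrementClientSketch {
            private final NonceService service;

            CheckedIncrementClientSketch(NonceService service) { this.service = service; }

            long increment(byte[] row, byte[] family, byte[] qualifier, long delta,
                           int maxAttempts) throws IOException {
              long nonce = service.acquireNonce(row);
              IOException last = new IOException("increment not attempted");
              for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                  // If the nonce is already gone, the server reports the operation as done
                  // (or expired) instead of applying the increment a second time.
                  return service.incrementWithNonce(row, family, qualifier, delta, nonce);
                } catch (IOException e) {
                  // Retrying against the same server is assumed safe here; a definite
                  // "region moved" error would mean going back to acquireNonce().
                  last = e;
                }
              }
              throw last;
            }
          }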
          Anoop Sam John added a comment -

          Interesting Sergey..

          If increment arrives and nonce is not there the server responds that the operation is done (or expired).

          How can we distinguish between the done and expired cases? This will be needed back at the client/app side.

          Sergey Shelukhin added a comment -

          Without storing things forever there's no way to tell them apart in any scheme. So the question is about storing for some period of time efficiently.

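          As a back-of-the-envelope illustration of "storing for some period of time", an in-memory map of recently seen (clientId, nonce) pairs with time-based expiry might look roughly like the sketch below. The names and the eviction policy are invented for the example; this is not the nonce manager in the patch.

          import java.util.Iterator;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          public final class ExpiringNonceCacheSketch {
            private static final class Key {
              final long clientId, nonce;
              Key(long clientId, long nonce) { this.clientId = clientId; this.nonce = nonce; }
              @Override public boolean equals(Object o) {
                return o instanceof Key && ((Key) o).clientId == clientId && ((Key) o).nonce == nonce;
              }
              @Override public int hashCode() { long h = clientId * 31 + nonce; return (int) (h ^ (h >>> 32)); }
            }

            // Value is the time the (clientId, nonce) pair was first seen.
            private final ConcurrentHashMap<Key, Long> seen = new ConcurrentHashMap<Key, Long>();
            private final long retentionMillis;

            public ExpiringNonceCacheSketch(long retentionMillis) { this.retentionMillis = retentionMillis; }

            // Returns true only the first time a pair is seen (the operation may proceed);
            // a retry of the same operation returns false and can be rejected as a duplicate.
            public boolean startOperation(long clientId, long nonce) {
              return seen.putIfAbsent(new Key(clientId, nonce), System.currentTimeMillis()) == null;
            }

            // Drop entries older than the retention period; meant to be called periodically.
            public void expireOld() {
              long cutoff = System.currentTimeMillis() - retentionMillis;
              for (Iterator<Map.Entry<Key, Long>> it = seen.entrySet().iterator(); it.hasNext();) {
                if (it.next().getValue() < cutoff) {
                  it.remove();
                }
              }
            }
          }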
          Sergey Shelukhin added a comment -

          (By "any scheme" I mean any nonce or similar scheme, not, say, a design where operations are made intrinsically idempotent or where proper transactions are implemented...)

          Nick Dimiduk added a comment -

          Sergey Shelukhin I like the idea of giving users the option to sacrifice throughput in exchange for reliability.

          stack added a comment -

          HDFS is trying to solve a similar issue in HDFS-4849. Sergey, you are not the first to have this issue, it seems (smile). It even has a 'name' http://www.freesoft.org/CIE/RFC/1813/47.htm (via our Todd). Some "ok" suggested improvements in here: https://www.kernel.org/doc/ols/2009/ols2009-pages-95-100.pdf

          Sergey Shelukhin added a comment -

          The MRU XID list in the paper actually looks somewhat similar to the incrementing nonces above. We don't have the problem of clients with single vs. multiple threads etc. because we control the client; incrementing nonces would be per region per client to make them easier to roll forward.
          I'd say a response cache may not be feasible if the response contains KVs; that's too much to cache imho, especially given the nature of distributed retries (they can take a longer time between retries).
          The main problem here, as far as I see it, is doing things between servers. For that, somewhat complicated things will have to happen. I don't see design consensus here... if we think this is important, after HBaseCon, based on the current patch, I can implement one of the complex schemes to see what it looks like. What do you think it should be? Do incrementing nonces + the "nonce WAL" as described above sound good?

          stack added a comment -

          Yeah, can't cache KV.

          Can we have something for one server first?

          Sergey Shelukhin added a comment -

          Refer to the attached patch; I can remove the WAL part.

          stack made changes -
          Fix Version/s 0.95.2 [ 12320040 ]
          Fix Version/s 0.95.1 [ 12324288 ]
          Enis Soztutar added a comment -

          Sergey asked me to elaborate a bit more on my earlier candidate proposal. This is still light on details, and is just some food for thought to be considered for later.

          The idea in this proposal will only work with append and increment type operations, since it is operation-specific rather than a generic solution. It also relies on the assumptions that distributed counters are the main use case for the increment operation, and that these counters are mostly written to and less frequently read.

          We will introduce two new KeyValue.Type values: Put_Inc and Put_App, and rely on cell tags to keep nonces around. These sort before Puts. We can make the cell tag nonce a part of the sort order as well, if it is set (otherwise we can append the nonce to the row_key). With this we don't need any specific handling of nonces on the write side, since writes with the same nonce will sort the same and thus eclipse each other. Also, we do not have to keep anything in memory, and regions can be moved freely between servers. Put_Inc and Put_App will not count against versions, so that we keep those around until they expire.

          We can build a grouping KV scanner which collapses Put_Inc's with the underlying Puts. Since every get is already a scan, when the client wants to read the value back it is computed on the fly (until we see a base Put, the versions will not increase, so we will keep on scanning and buffering up). On compactions, we can also use this grouping to collapse nonces that have expired (a toy sketch of this collapse follows after the example below).

          The data might be sorted as:
          Put,r1,cf1:q1,ts3,val4
          Put_Inc,r1,cf1:q1,ts2,val3 (tag:nonce)
          Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce)
          Put_Inc,r1,cf1:q1,ts1,val2 (tag:nonce) => idempotent rpc, second try
          Put,r1,cf1:q1,ts1,val1

          Get -> will return val4.
          Get (ts <= ts2) will return val3 + val2 + val1

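          For illustration, a toy, self-contained version of the collapse such a grouping scanner would perform for one row/column, walking the cells newest-first as in the example above and counting each Put_Inc nonce only once. The types below are stand-ins for the sketch, not HBase's KeyValue.

          import java.util.Arrays;
          import java.util.HashSet;
          import java.util.List;
          import java.util.Set;

          public class PutIncCollapseSketch {
            enum Kind { PUT, PUT_INC }

            static final class Cell {
              final Kind kind; final long ts; final long value; final Long nonce; // nonce is null for base Puts
              Cell(Kind kind, long ts, long value, Long nonce) {
                this.kind = kind; this.ts = ts; this.value = value; this.nonce = nonce;
              }
            }

            // Cells must be ordered newest-first, as in the example above.
            static long readValue(List<Cell> cells) {
              long sum = 0;
              Set<Long> seenNonces = new HashSet<Long>();
              for (Cell c : cells) {
                if (c.kind == Kind.PUT) {
                  return sum + c.value;         // a base Put terminates the collapse
                }
                if (seenNonces.add(c.nonce)) {  // the same nonce means a retried RPC; count it once
                  sum += c.value;
                }
              }
              return sum;                       // no base Put seen: the counter started from zero
            }

            public static void main(String[] args) {
              List<Cell> cells = Arrays.asList(
                  new Cell(Kind.PUT_INC, 2, 3, 11L),
                  new Cell(Kind.PUT_INC, 1, 2, 10L),
                  new Cell(Kind.PUT_INC, 1, 2, 10L),  // second try of the same RPC, eclipsed
                  new Cell(Kind.PUT, 1, 1, null));
              System.out.println(readValue(cells));   // prints 6: 3 + 2 + 1, with the duplicate ignored
            }
          }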
          Sergey Shelukhin added a comment -

          My 2 concerns about this are as follows.
          First, if there are a lot of clients incrementing one counter, gets will have to read 100s-1000s (or more) of KVs and will be slower.
          Second, increments and appends are actually versioned currently, because internally they are just puts. That, plus the number-of-versions restriction, plus out-of-order puts and deletes, will make version management... interesting.

          However it does make operations inherently idempotent.

          Andrew Purtell added a comment -

          We can make the cell tag nonce a part of sort order as well,

          Cells can have arbitrary tags. Do we really want them to be part of the sort order of KeyValues? How does that work? Do we sort first by tag type, then by tag value? Really should avoid going this way if possible.

          Enis Soztutar added a comment -

          Cells can have arbitrary tags. Do we really want them to be part of the sort order of KeyValues?

          Agreed that we have to think carefully about this. Obviously, we do not want arbitrary tags to be part of sort order. AFAIK, accumulo's acl's are in sort order. We can have system level tags that are sortable vs user level tags, or a boolean per tag indicating whether the value is sortable.

          Andrew Purtell added a comment -

          accumulo's acl's are in sort order

          Accumulo has visibility expressions, not ACLs.

          Considering the visibility expression as part of the KV was the cause of the data inflation discussed in Himanshu's proposal on HBASE-6222.

          We can have system level tags that are sortable vs user level tags, or a boolean per tag indicating whether the value is sortable.

          How do we distinguish system level tags? I want my coprocessor to have system level tags. It will be registered dynamically on a per table basis.

          So if there's a boolean per tag, then the comparator is skipping a range, checking a byte, including a range, skipping another range, testing another byte, and so on.

          I don't think tags should be considered by KeyValue or Cell comparators at all.

          Enis Soztutar added a comment -

          I don't think tags should be considered by KeyValue or Cell comparators at all.

          Agreed. Thinking more about it, that would be very complex in terms of semantics and implementation.
          I guess we can alter my proposal above so that Incr records count towards versions, but the cells would need an unlimited number of versions for this to work.

          Sergey Shelukhin added a comment -

          FWIW: this paper (I don't remember where I obtained the pdf, frankly; I cannot find it with normal access) http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5697970&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F5685607%2F5697799%2F05697970.pdf%3Farnumber%3D5697970 relies on a well-defined return value for increments, as well as on tables with a very large volume of increments to the same value (transaction ids of various sorts). Granted, HBase is probably not the best tool for the job in this case...

          stack made changes -
          Priority Critical [ 2 ] Blocker [ 1 ]
          stack added a comment -

          What is the current thinking on this one?

          I gave it a reskim. It is straightforward (even where we write the WAL). It is off by default. Should we commit, or punt till later, till we have an interested user?

          stack added a comment -

          Moving out. Sergey Shelukhin has done a bunch of research and has a proposed patch up. Needs more review and testing. Won't make the 0.96 cut and can come in later anyway.

          stack made changes -
          Fix Version/s 0.95.2 [ 12320040 ]
          Sergey Shelukhin added a comment -

          What is the future for this patch? If we want this feature I can rebase and commit it to trunk.

          stack added a comment -

          I think we need this patch. Just needs review before commit.

          Sergey Shelukhin added a comment -

          Volunteers to review?

          Devaraj Das added a comment -

          I'll get to it...

          Devaraj Das added a comment -

          Sergey Shelukhin, please bear with me - (1) could you please upload a rebased patch? The patch is horribly out of date. (2) HDFS-4979 introduced a RetryCache in the NameNode. Not sure if it is reasonable - is it possible to use the classes from HDFS-4979 here?

          Sergey Shelukhin added a comment -

          there's a monumental number of conflicts... let me try to rebase this.

          Sergey Shelukhin added a comment -

          trying to grok rpc changes, sent some email to Nicolas. Will maybe update by EOW

          Sergey Shelukhin added a comment -

          The rebased patch.
          Given the new retry paths, the new batch RPC endpoints, and the new log replay, large parts of the patch were rewritten.
          The manager/generator obviously didn't change.

          There were some suggestions that may be addressed here or in a separate JIRA:
          1) Also use a nonce in checkAndPut.
          2) Look at the HDFS stuff. That applies mostly to the manager implementation, and is unlikely to affect the RPC/HTable/etc. plumbing. It appeared later than the original patch here, so there was nothing to look at when this was first submitted. I talked a little bit to the HDFS people here about it, and will take a look next week. But the model is very similar, so the plumbing can be reviewed regardless.
          3) Store the MVCC and, instead of throwing OpConflictEx, do a read with that MVCC. Will need to work with the low watermark to not discard MVCC below the earliest stored nonce. Definitely a separate JIRA.
          4) Probably the HTable test needs to be extended to at least one batch op.

          Let me also run some larger scale tests on it...

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v6.patch [ 12611722 ]
          Sergey Shelukhin added a comment -

          Sorry, one thing missing is special handling for the new WAL recovery.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12611722/HBASE-3787-v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 20 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7709//console

          This message is automatically generated.

          Ted Yu added a comment -

          In all the failed tests, I saw:

          2013-11-02 02:30:15,355 ERROR [RpcServer.handler=4,port=35527] ipc.RpcServer(2020): Unexpected throwable object 
          java.lang.AssertionError
          	at org.apache.hadoop.hbase.regionserver.HRegion$ReplayBatch.getMutationsForCoprocs(HRegion.java:1970)
          	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2244)
          	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2040)
          	at org.apache.hadoop.hbase.regionserver.HRegion.batchReplay(HRegion.java:2004)
          	at org.apache.hadoop.hbase.regionserver.HRegionServer.doReplayBatchOp(HRegionServer.java:4234)
          	at org.apache.hadoop.hbase.regionserver.HRegionServer.replay(HRegionServer.java:3915)
          	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:19809)
          	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
          	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
          	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
          	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
          

          which was due to this in ReplayBatch:

          +    public Mutation[] getMutationsForCoprocs() {
          +      assert false;
          +      throw new RuntimeException("Should not be called for replay batch");
          +    }
          
          Sergey Shelukhin added a comment -

          Yeah, the wrong value is passed from ReplayBatch. There is another issue: I need to look at the behavior of nonces for the new replay... for the old one, the patch has special time handling, so that after replay you don't get a bunch of replayed nonces stored as new with the full recovery period. Let me see how to do it for the new one.

          Sergey Shelukhin added a comment -

          Fix the bug; finish support for distributed log replay, add test to ensure nonces work across log replay

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v7.patch [ 12612055 ]
          Sergey Shelukhin added a comment -

          Woops, forgot rb feedback. I will respond and make the requisite fixes in v8

          Sergey Shelukhin added a comment -

          RB feedback, plus a little bit of space saving for nonces.
          I looked at the hadoop-common RetryCache; it seems to be specific to Hadoop RPC and depends on some static methods there.

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v8.patch [ 12612070 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12612055/HBASE-3787-v7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 23 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait
          org.apache.hadoop.hbase.regionserver.TestAtomicOperation

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7730//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12612070/HBASE-3787-v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 23 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestHRegion
          org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7731//console

          This message is automatically generated.

          Sergey Shelukhin made changes -
          Link This issue blocks HBASE-9899 [ HBASE-9899 ]
          Sergey Shelukhin added a comment -

          btw, the patch is ready to review. I filed HBASE-9899 for follow-up work

          Sergey Shelukhin added a comment -

          any takers for review? This is a huge pita to rebase when it becomes stale

          stack added a comment -

          Sergey Shelukhin Is latest on RB boss?

          Sergey Shelukhin added a comment -

          yeah, I updated https://reviews.apache.org/r/10965/ with latest patch

          Sergey Shelukhin added a comment -

          Nick Dimiduk asked me for some review guidelines (of course you can review in any order).
          1) The client nonce generator, as well as the additions to testMultiParallel and the log replay test, to get the idea of the feature.
          2) The test of the nonce manager, and the nonce manager itself, to see server-side nonce handling and how it works.
          3) Plumbing (most of the patch); unfortunately there isn't any good order to review plumbing... perhaps:
          a) protobuf and client changes.
          b) server and log replay changes.

          Sergey Shelukhin added a comment -

          Rebased the patch again, and addressed the current CR feedback

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v9.patch [ 12612969 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12612969/HBASE-3787-v9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 23 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7805//console

          This message is automatically generated.

          Sergey Shelukhin added a comment -

          Addressed some CR feedback and added some comments. I saw a question about the "holes" in the approach that were mentioned much earlier in the JIRA, which someone couldn't find, but I cannot find that question now.
          I put the details in comments in the code.

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v10.patch [ 12613314 ]
          Nick Dimiduk added a comment -

          RB is broken today. Let me attempt to provide context and comment here.

          From NonceGenerator.

          Random generation is just bunch of arithmetic and one CAS. Do you suspect there will be contention from many threads running the increments?
          Can you provide more details for impact from benchmark?

          I'm thinking less of contention and more about the CPU time necessary for the arithmetic. I haven't investigated the implementation to understand why, so perhaps my comment is ungrounded. Specifically, I'm referring to this comment in the PerfEval#generateData method:

            /*
             * This method takes some time and is done inline uploading data.  For
             * example, doing the mapfile test, generation of the key and value
             * consumes about 30% of CPU time.
             * @return Generated random value to insert into a table cell.
             */
          

          In that case, the system is generating ~1k random ints per loop iteration. You're generating far fewer, so maybe it's a non-issue?

          From TestMultiParallel.

          yes, they are. It's assertTrue(false) essentially

          Right. As I understand it, JUnit asserts are represented as thrown Exceptions. Because the source of the exception is within a running thread, those exceptions must be aggregated and re-thrown in order for the calling context to respect them.
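
          To make that aggregation concrete, here is a minimal sketch of the pattern (a hypothetical helper, not the actual TestMultiParallel code): each worker catches Throwable, records it, and the calling test thread re-throws after join(), so an assertion failure inside a worker still fails the test instead of dying silently with the thread.

            import java.util.ArrayList;
            import java.util.List;
            import java.util.concurrent.CopyOnWriteArrayList;

            // Hypothetical sketch of the failure-aggregation pattern, not the actual test code.
            public class ThreadedAssertSketch {

              static void runWorkersAndPropagateFailures(Runnable... workers) throws InterruptedException {
                final List<Throwable> failures = new CopyOnWriteArrayList<>();
                List<Thread> threads = new ArrayList<>();
                for (Runnable worker : workers) {
                  Thread t = new Thread(() -> {
                    try {
                      worker.run();
                    } catch (Throwable th) {   // an AssertionError from a JUnit assert lands here
                      failures.add(th);        // record it instead of letting the thread die silently
                    }
                  });
                  threads.add(t);
                  t.start();
                }
                for (Thread t : threads) {
                  t.join();
                }
                if (!failures.isEmpty()) {
                  // re-throw on the calling (test) thread so JUnit actually reports the failure
                  throw new AssertionError("worker thread failed", failures.get(0));
                }
              }

              public static void main(String[] args) throws InterruptedException {
                runWorkersAndPropagateFailures(
                    () -> { /* happy worker */ },
                    () -> { throw new AssertionError("increment result was wrong"); });
              }
            }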

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613314/HBASE-3787-v10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 23 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7824//console

          This message is automatically generated.

          Sergey Shelukhin added a comment -

          changed to interface to separate client id impl from the interface; fixed tests. One question about perf remains on the JIRA. I don't think some arithmetic and a CAS to generate 2 random ints (for a long) will have a significant impact. CASes can have contention, I guess, if there are many threads, but other than a threadlocal there's not much better way to do it.
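
          To put the thread-local remark in concrete terms, here is a tiny sketch (not from the patch) of the two options being weighed: a shared java.util.Random, whose nextLong() does a compareAndSet on the shared seed for each of its two 32-bit draws, versus ThreadLocalRandom, which keeps per-thread state and avoids the CAS entirely.

            import java.util.Random;
            import java.util.concurrent.ThreadLocalRandom;

            // Crude single-threaded micro-benchmark; illustrative only. Numbers vary by JVM and machine,
            // and the CAS contention being discussed only shows up with many concurrent threads.
            public class NonceGenCost {

              public static void main(String[] args) {
                final int n = 10_000_000;
                Random shared = new Random(); // nextLong() is two 32-bit draws, each a CAS on the shared seed

                long sink = 0;
                long t0 = System.nanoTime();
                for (int i = 0; i < n; i++) {
                  sink ^= shared.nextLong();
                }
                long sharedNanos = System.nanoTime() - t0;

                t0 = System.nanoTime();
                for (int i = 0; i < n; i++) {
                  sink ^= ThreadLocalRandom.current().nextLong(); // per-thread state, no CAS
                }
                long tlrNanos = System.nanoTime() - t0;

                System.out.printf("shared Random: %d ms, ThreadLocalRandom: %d ms (sink=%d)%n",
                    sharedNanos / 1_000_000, tlrNanos / 1_000_000, sink);
              }
            }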

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v11.patch [ 12613489 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613489/HBASE-3787-v11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 23 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7834//console

          This message is automatically generated.

          Sergey Shelukhin added a comment -

          I will be going off the grid next week... the patch is ready; if there's no other feedback, it's OK if someone else commits it then

          stack added a comment -

          What is this change?

          "...changed to interface to separate client id impl from the interface; "

          I skimmed the patch (RB is down). Either you fixed it or I was confused. I see a clientidnonce thingy now and it has mention of client id internal... that is fine (if that was how it always was, discount my RB comment; if you changed stuff, thanks).

          +1 on commit. Important fix. Thanks for the persistence Sergey. Work it out w/ Nick Dimiduk so you can commit this before you leave so it doesn't rot again.

          Nick Dimiduk added a comment -

          I agree, this is a very important feature. I'd rather have it and fix any perf impact later than let it get stale again.

          This review is based on `interdiff v10 v11`, so please pardon any confusion that comes about from the rebase.

          +    /** Dummy nonce generator for disabled nonces. */
          +    private static class NoNonceGenerator implements NonceGenerator {
          +      @Override
          +      public long getNonceGroup() {
          +        return HConstants.NO_NONCE;
          +      }
          +      @Override
          +      public long newNonce() {
          +        return HConstants.NO_NONCE;
          +      }
          +    }
          

          Love it. This is a great way to abstract this component.

          +import java.sql.Date;
          +import java.text.SimpleDateFormat;
          

          Did you mean {java.util.Date}?

          TestRunnable is excellent.

          +1
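
          A short self-contained sketch of why that abstraction is nice (the NonceGenerator shape and NO_NONCE come from the quoted diff; everything else, including the property-based toggle and the class names, is made up for illustration): call sites always go through the interface, and when nonces are disabled the no-op implementation hands back NO_NONCE, so the write path needs no special-case branching.

            // Sketch only; the selection mechanism here is hypothetical, not the committed HBase config.
            interface NonceGenerator {
              long getNonceGroup();
              long newNonce();
            }

            final class NoNonceGenerator implements NonceGenerator {
              static final long NO_NONCE = 0; // stands in for HConstants.NO_NONCE
              @Override public long getNonceGroup() { return NO_NONCE; }
              @Override public long newNonce() { return NO_NONCE; }
            }

            final class RandomNonceGenerator implements NonceGenerator {
              private final long group = java.util.concurrent.ThreadLocalRandom.current().nextLong();
              @Override public long getNonceGroup() { return group; }
              @Override public long newNonce() { return java.util.concurrent.ThreadLocalRandom.current().nextLong(); }
            }

            public class NonceGeneratorDemo {
              public static void main(String[] args) {
                boolean noncesEnabled = Boolean.parseBoolean(System.getProperty("nonces.enabled", "true"));
                NonceGenerator generator = noncesEnabled ? new RandomNonceGenerator() : new NoNonceGenerator();

                // Call sites never branch on the setting; a disabled generator simply yields NO_NONCE.
                System.out.println("group=" + generator.getNonceGroup() + " nonce=" + generator.newNonce());
              }
            }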

          Sergey Shelukhin added a comment -

          Rebased the patch yet again due to conflicts. Changed the date type. I assume the +1s stand unless there are objections, and will commit evening-ish

          Sergey Shelukhin made changes -
          Attachment HBASE-3787-v12.patch [ 12613942 ]
          Nick Dimiduk added a comment -

          Interdiff choked this time. Can we try RB once again?

          Anyway, if that's all you changed, I retain my +1.

          stack added a comment -

          My +1 stands across a rebase.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12613942/HBASE-3787-v12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 23 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7866//console

          This message is automatically generated.

          Ted Yu added a comment -

          PerClientRandomNonceGenerator.java needs license.

          Sergey Shelukhin added a comment -

          committed in 2 parts (added license)

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #838 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/838/)
          HBASE-3787 Increment is non-idempotent but client retries RPC ADDENDUM add licence (sershe: rev 1542169)

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/PerClientRandomNonceGenerator.java
            HBASE-3787 Increment is non-idempotent but client retries RPC (sershe: rev 1542168)
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Action.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientIdGenerator.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/NonceGenerator.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/PerClientRandomNonceGenerator.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/OperationConflictException.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Triple.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MultiRowMutationProtos.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RowProcessorProtos.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
          • /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
          • /hbase/trunk/hbase-protocol/src/main/protobuf/MultiRowMutation.proto
          • /hbase/trunk/hbase-protocol/src/main/protobuf/RowProcessor.proto
          • /hbase/trunk/hbase-protocol/src/main/protobuf/WAL.proto
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/CoprocessorHConnection.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRowProcessorEndpoint.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationEndpoint.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ServerNonceManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/MockRegionServerServices.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestServerNonceManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4681 (See https://builds.apache.org/job/HBase-TRUNK/4681/)
          HBASE-3787 Increment is non-idempotent but client retries RPC ADDENDUM add licence (sershe: rev 1542169)

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/PerClientRandomNonceGenerator.java
            HBASE-3787 Increment is non-idempotent but client retries RPC (sershe: rev 1542168)
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Action.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientIdGenerator.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/NonceGenerator.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/PerClientRandomNonceGenerator.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/OperationConflictException.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Triple.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MultiRowMutationProtos.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RowProcessorProtos.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
          • /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
          • /hbase/trunk/hbase-protocol/src/main/protobuf/MultiRowMutation.proto
          • /hbase/trunk/hbase-protocol/src/main/protobuf/RowProcessor.proto
          • /hbase/trunk/hbase-protocol/src/main/protobuf/WAL.proto
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/CoprocessorHConnection.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRowProcessorEndpoint.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/MultiRowMutationEndpoint.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ServerNonceManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/MockRegionServerServices.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestServerNonceManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
          Sergey Shelukhin added a comment -

          This was actually committed some time ago (before branching 0.98 I think)

          Sergey Shelukhin made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.98.0 [ 12323143 ]
          Fix Version/s 0.99.0 [ 12325675 ]
          Resolution Fixed [ 1 ]
          Sergey Shelukhin made changes -
          Link This issue breaks HBASE-10069 [ HBASE-10069 ]

            People

            • Assignee:
              Sergey Shelukhin
              Reporter:
              dhruba borthakur
            • Votes:
              0
              Watchers:
              22
