HBase
  1. HBase
  2. HBASE-10256

Handling of value argument being null for checkAndDelete

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Tags:
      Phoenix

      Description

      IMO, it would be more useful if when a null is supplied as the value for checkAndDelete, that it add the Delete marker if the row exists rather than if it doesn't exist (i.e. the opposite of checkAndPut). I think the most common use case for checkAndDelete is to delete a row if it exists and tell me that you deleted it. With the current implementation, it adds a Delete marker only if the row does not exist when you supply a null value which IMO is not very useful.

      My workaround is to add another "known" KeyValue to my row with a fixed value and then call checkAndDelete passing in this known value. Also, FWIW, if an empty byte array is passed through (a valid value), it treats it the same as if null was passed through, which is another separate bug.

        Activity

        Hide
        James Taylor added a comment -

        Lars Hofhansl - another one for you that came up in my sequence implementation for Phoenix. This one is more of a nice-to-have, as I have a reasonable workaround.

        Show
        James Taylor added a comment - Lars Hofhansl - another one for you that came up in my sequence implementation for Phoenix. This one is more of a nice-to-have, as I have a reasonable workaround.
        Hide
        Lars Hofhansl added a comment -

        Not sure I follow. If you can issue a checkAndDelete you do have a row-key, right? In that case how is a checkAndDelete with a null value different from just a Delete?
        In this case you have a row-key, but you do now know whether it actually exists?

        Show
        Lars Hofhansl added a comment - Not sure I follow. If you can issue a checkAndDelete you do have a row-key, right? In that case how is a checkAndDelete with a null value different from just a Delete? In this case you have a row-key, but you do now know whether it actually exists?
        Hide
        James Taylor added a comment -

        Here's my use case:

        • I have a row key
        • I want to delete the row
        • I want to know that the row existed when deleted so I can notify the client if it didn't.
        • I want to be able to do this with a single atomic call so I don't have a race condition.

        I thought that if I provide a null for the value argument of a checkAndDelete, it would do the above, but it doesn't. I think this would be a better default behavior, as with the current behavior specifying a null for the value argument is not very useful. This can probably not be solved with the current API as it would break b/w compat, but perhaps a new API could be added down the road.

        Show
        James Taylor added a comment - Here's my use case: I have a row key I want to delete the row I want to know that the row existed when deleted so I can notify the client if it didn't. I want to be able to do this with a single atomic call so I don't have a race condition. I thought that if I provide a null for the value argument of a checkAndDelete, it would do the above, but it doesn't. I think this would be a better default behavior, as with the current behavior specifying a null for the value argument is not very useful. This can probably not be solved with the current API as it would break b/w compat, but perhaps a new API could be added down the road.
        Hide
        Lars Hofhansl added a comment -

        I see. I think that could be accomplished if in addition to the value to be compared we'd pass the compareOp, and then use != as compareOp, right?

        Show
        Lars Hofhansl added a comment - I see. I think that could be accomplished if in addition to the value to be compared we'd pass the compareOp, and then use != as compareOp, right?
        Hide
        James Taylor added a comment -

        Yes, that would work too.

        Show
        James Taylor added a comment - Yes, that would work too.
        Hide
        Lars Hofhansl added a comment -

        I had started a long time ago on this in HBASE-5923. Then ran into problems with the API and lost interest

        Show
        Lars Hofhansl added a comment - I had started a long time ago on this in HBASE-5923 . Then ran into problems with the API and lost interest
        Hide
        James Taylor added a comment -

        Another deficiency of this API - there's no way to specify the TimeRange that is used for the Get. Any operation that does a Get underneath (checkAndDelete, checkAndPut, Increment, IncrementColumnValue), should allow for a TimeRange to be passed through (Increment already supports this).

        Show
        James Taylor added a comment - Another deficiency of this API - there's no way to specify the TimeRange that is used for the Get. Any operation that does a Get underneath (checkAndDelete, checkAndPut, Increment, IncrementColumnValue), should allow for a TimeRange to be passed through (Increment already supports this).
        Hide
        Lars Hofhansl added a comment -

        Maybe the right solution is to just hand over a Get request. That can be setup with timerange, Filter, etc. If the Get returns anything update that row otherwise don't.

        (And now I'm on my way to a new year's party )

        Show
        Lars Hofhansl added a comment - Maybe the right solution is to just hand over a Get request. That can be setup with timerange, Filter, etc. If the Get returns anything update that row otherwise don't. (And now I'm on my way to a new year's party )
        Hide
        James Taylor added a comment -

        One other change that would be useful. Instead of returning a boolean return a Long with null meaning that check failed and otherwise the Long represents the timestamp of the Delete (or Put). This conveys more information, as Phoenix caches based on the timestamp of the mutation. If we can get the actual timestamp (in the case where HConstants.LATEST_TIMESTAMP was used in the Delete/Put), then knowing the server timestamp is useful information. Without this, an extra round trip ends up being required later.

        FWIW, I've been able to work around all these issues with region observers which is really nice. It's just more work than it needs to be, though. Plus I'm not sure perf-wise if my Get/Put in the region observer will perform as well as the other methods (I suspect it's probably equivalent, though).

        Happy New Years!

        Show
        James Taylor added a comment - One other change that would be useful. Instead of returning a boolean return a Long with null meaning that check failed and otherwise the Long represents the timestamp of the Delete (or Put). This conveys more information, as Phoenix caches based on the timestamp of the mutation. If we can get the actual timestamp (in the case where HConstants.LATEST_TIMESTAMP was used in the Delete/Put), then knowing the server timestamp is useful information. Without this, an extra round trip ends up being required later. FWIW, I've been able to work around all these issues with region observers which is really nice. It's just more work than it needs to be, though. Plus I'm not sure perf-wise if my Get/Put in the region observer will perform as well as the other methods (I suspect it's probably equivalent, though). Happy New Years!

          People

          • Assignee:
            Unassigned
            Reporter:
            James Taylor
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:

              Development