HBase / HBASE-4999

Constraints - Enhance checkAndPut to do atomic arbitrary constraint checks

    Details

    • Tags:
      constraints, CAS

      Description

      Related work: HBASE-4605

      It would be great if checkAndPut (CAS) could be enhanced to not just use a value comparison as the gating factor for the put, but to support arbitrary constraint checks on the column value (the current comparator approach would then be one subset of the constraints that can be checked). Commonly used constraints (like comparisons) can be provided out of the box, and we should be able to accept custom constraints set by the client for the checkAndPut call.

      One use-case would be the ability to implement something like the below in HBase.
      Pseudo sql:
      update table-name
      set column-name = new-value
      where (column-value - new-value) > threshold-value

      ... where the mutation would go through only if the specified constraint in the where clause is true.
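A minimal sketch of the where clause above as a predicate, assuming long-valued columns. The class and method names are hypothetical and are not part of any HBase API; the subtraction is also left unguarded against overflow for brevity.

```java
// Hypothetical illustration only; not an HBase class.
final class ThresholdConstraint {
    private final long threshold;

    ThresholdConstraint(long threshold) {
        this.threshold = threshold;
    }

    // Mirrors the pseudo-SQL where clause:
    // (column-value - new-value) > threshold-value
    boolean allows(long currentValue, long newValue) {
        return (currentValue - newValue) > threshold;
    }
}
```

With a threshold of 10, a put that lowers a stored 100 to 50 would pass this check, while lowering it to 95 would not.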

      Current options include using a co-processor to do preCheckAndPut/postCheckAndPut constraint checks, but this is not atomic: the row lock needs to be released by the co-processor before the real checkAndPut call, so the atomicity requirement is not met.

      Everything above is still meant to be at row level (so, no cross-row constraint checking is implied here).

      An ideal end result would be that an HBase client can specify a set of constraints on multiple column qualifiers as part of the checkAndPut call. The call goes through if all the constraints are satisfied and is rejected if any constraint fails. This checkAndPut should be executed atomically (just like the current checkAndPut semantics).
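The all-or-nothing behaviour described here can be sketched with an in-memory stand-in for a single row. Everything below is hypothetical (the class names, the use of long values, and `synchronized` standing in for the row lock); it shows the intended semantics, not how HBase would implement them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// In-memory stand-in for one row; hypothetical, not HBase API.
final class ConstrainedRow {
    // One constraint on one column qualifier: test(currentValue, proposedValue).
    // Either argument may be null when the cell or the proposed value is absent.
    static final class QualifierConstraint {
        final String qualifier;
        final BiPredicate<Long, Long> predicate;

        QualifierConstraint(String qualifier, BiPredicate<Long, Long> predicate) {
            this.qualifier = qualifier;
            this.predicate = predicate;
        }
    }

    private final Map<String, Long> cells = new HashMap<>();

    // synchronized plays the role of the row lock: the checks and the write
    // happen as one step with respect to other callers of this row.
    synchronized boolean checkAndPut(List<QualifierConstraint> constraints,
                                     Map<String, Long> put) {
        for (QualifierConstraint c : constraints) {
            if (!c.predicate.test(cells.get(c.qualifier), put.get(c.qualifier))) {
                return false; // any failing constraint rejects the whole put
            }
        }
        cells.putAll(put);
        return true;
    }

    synchronized Long get(String qualifier) {
        return cells.get(qualifier);
    }
}
```

The constraints may cover only a subset of the qualifiers being written, and a rejected call leaves the row untouched.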

        Issue Links

          Activity

          Suraj Varma created issue -
          Jesse Yates made changes -
          Field Original Value New Value
          Link This issue is blocked by HBASE-4605 [ HBASE-4605 ]
          Jesse Yates added a comment -

          Looking through the architecture for checkAndPut (as well as following up the discussion here: http://search-hadoop.com/m/SgP0l1gb0TD), I think we can support this fairly easily.

          My thought would be that we essentially have a CheckCurrentConstraint that looks something like:

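          // Put and Result are org.apache.hadoop.hbase.client types; ConstraintException is assumed to come from the HBASE-4605 work.
          public abstract class CheckCurrentConstraint {
          	// Throws ConstraintException if the pending Put is not allowed given the current row.
          	public abstract void check(Put p, Result r) throws ConstraintException;
          }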
          

          So you have the current Put we want to make and then the actual row that we pull from the table in doing the check.

          This would be run in preCheckAndPut (or just preCheck). There may need to be a little jiggering in the HRegion around when this is actually run, to ensure that we actually obtain the row lock.

          However, since the row lock would be taken for the row we are checking, no other puts are going to interfere and since we can use the MVCC to get concurrent reads out of the DB, the most up-to-date version should be retrievable without a problem.
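A minimal sketch of that ordering, using in-memory stand-ins rather than HBase classes. The names `RowLockSketch`, `CurrentRowCheck`, and `ConstraintViolation` are hypothetical, and a `ReentrantLock` stands in for the region's row lock; MVCC is not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins; none of these types are HBase classes.
final class RowLockSketch {
    static final class ConstraintViolation extends Exception {
        ConstraintViolation(String message) {
            super(message);
        }
    }

    // Same shape as the proposed check(Put, Result): the pending write
    // plus the current contents of the row.
    interface CurrentRowCheck {
        void check(Map<String, Long> pendingPut, Map<String, Long> currentRow)
                throws ConstraintViolation;
    }

    private final ReentrantLock rowLock = new ReentrantLock();
    private final Map<String, Long> row = new HashMap<>();

    void put(Map<String, Long> pendingPut, CurrentRowCheck check)
            throws ConstraintViolation {
        rowLock.lock();
        try {
            // The read, the check, and the write all happen while the lock
            // is held, so no other put can change the row in between.
            check.check(pendingPut, new HashMap<>(row));
            row.putAll(pendingPut);
        } finally {
            rowLock.unlock();
        }
    }

    Long read(String qualifier) {
        rowLock.lock();
        try {
            return row.get(qualifier);
        } finally {
            rowLock.unlock();
        }
    }
}
```

If the check throws, the write is skipped and the lock is still released by the `finally` block.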

          I'm not a big fan of passing in the constraint on the client side and then running it on the server - that seems to break a lot of the intended functionality of constraints, which should essentially act as a safeguard on your table. They should be something that is always run to make sure bad things aren't put into your table. Right now, they are able to use the configuration to make them highly malleable to running on different CFs and CQs (or not), but these should be settings that hold over the lifetime of the table. I can see a use case where client-side specification might be useful occasionally, but IMHO the general case is that it should be far more common to just set up the constraints once on the table according to organization policy and then modify them as necessary.

          Common constraints should be added later when we actually figure out what the common use cases are for constraints - as a new feature we want to make sure we don't cowboy in and start adding excessive code willy-nilly. It tends to be a lot harder to remove code once it's in, rather than add it later.

          Suraj Varma added a comment -

          I see this being used similar to a Filter which is specified client side and executes on the server. Just like we have some out of the box filters (added over time as well), I see some being provided out of the box. The ability to add client specified custom constraints would be akin to custom server side filters.

          I think specifying this completely on the server side via configuration (e.g. a constraint like column_name > threshold-value) may work & be sufficient for some cases. I'm thinking of cases where the column names are dynamic (i.e. have ids or attribute values embedded in them) and cannot be specified purely server side via configuration. For these, we will need client specified constraints. Similarly, if I have multiple constraints on different column qualifiers that are a subset of all the qualifiers, the ability to specify this client side is much more flexible.

          So - again - this would work similar to how filters work, in my mind.
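As a sketch of the dynamic-column case, a client-supplied constraint could match qualifiers by prefix instead of naming them in server-side configuration. The class below is hypothetical (not HBase API) and assumes non-null long values.

```java
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical client-specified constraint, loosely analogous to a custom
// filter; not an HBase class.
final class PrefixConstraint {
    private final String qualifierPrefix;
    private final LongPredicate valueCheck;

    PrefixConstraint(String qualifierPrefix, LongPredicate valueCheck) {
        this.qualifierPrefix = qualifierPrefix;
        this.valueCheck = valueCheck;
    }

    // True only if every proposed cell whose qualifier starts with the
    // prefix passes the value check; other qualifiers are ignored.
    boolean allows(Map<String, Long> proposedCells) {
        for (Map.Entry<String, Long> cell : proposedCells.entrySet()) {
            if (cell.getKey().startsWith(qualifierPrefix)
                    && !valueCheck.test(cell.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

A constraint built as `new PrefixConstraint("score_", v -> v > 50)` would then apply to qualifiers such as `score_user42` without the server needing to know the embedded id in advance.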

          Jesse Yates added a comment -

          "this would work similar to how filters work, in my mind."

          Yup, I can see that. Maybe that is going to be a rather big piece - we should consider moving that to its own ticket.

          Suraj Varma made changes -
          Fix Version/s 0.94.0 [ 12316419 ]
          Affects Version/s 0.92.0 [ 12314223 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Suraj Varma
            • Votes:
              1
              Watchers:
              6
