Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Coprocessors
    • Labels:
      None

      Description

      Some features like TTLs or access control lists have use cases that call for per-value configurability.

      Currently in HBase TTLs are set per column family. This leads to potentially awkward "bucketing" of values into column families set up to accommodate the common desired TTLs for all values within – an unnecessarily wide schema, with resulting unnecessary reduction in I/O locality in access patterns, more store files than otherwise, and so on.

      Over in HBASE-1697 we're considering setting ACLs on column families. However, we are aware of other BT-like systems which support per-value ACLs. This allows for multitenancy in a single table as opposed to really requiring tables for each customer (or, at least column families). The scale out properties for a single table are better than alternatives. I think supporting per-row ACLs would be generally sufficient: customer ID could be part of the row key. We can still plan to maintain column-family level ACLs. We would therefore not have to bloat the store with per-row ACLs for the normal case – but it would be highly useful to support overrides for particular rows. So how to do that?

      I propose to introduce metacolumns.

      A metacolumn would be a column family intrinsic to every table, created by the system at table create time. It would be accessible like any other column family, but we expect a default ACL that only allows access by the system and operator principals, and would function like any other, except that administrative actions such as renaming or deletion would not be allowed. Into the metacolumn would be stored per-row overrides for such things as ACLs and TTLs. The metacolumn therefore would be as sparse as possible; no storage would be required for any overrides if a value is committed with defaults. A reasonably sparse metacolumn for a region may fit entirely within blockcache. It may be possible for all metacolumns on a RS to fit within blockcache without undue pressure on other users. We can aim design effort at this target.

      The scope of changes required to support this is:

      • Introduce metacolumn concept in the code and into the security model (default ACL): A flag in HCD, a default ACL, and a few additional checks for rejecting disallowed administrative actions.
      • Automatically create metacolumns at table create time.
      • Consult metacolumn as part of processing reads or mutations, perhaps using a bloom filter to shortcut lookups for rows with no metaentries, and apply configuration or security policy overrides if found.
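The third step, consulting the metacolumn with a bloom-filter shortcut for rows that have no metaentries, might look roughly like the following sketch. Everything here (the class, the toy two-hash bloom filter, the TTL-override map) is a hypothetical illustration, not actual HBase code:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed read-path check: before applying
// per-row overrides, a bloom filter is consulted so rows with no
// metaentries skip the metacolumn lookup entirely (the common, sparse case).
class MetacolumnLookup {
    // Toy bloom filter: a BitSet indexed by two hashes of the row key.
    private final BitSet bloom = new BitSet(1024);
    private final Map<String, Long> ttlOverrides = new HashMap<>();

    private int h1(String row) { return Math.abs(row.hashCode() % 1024); }
    private int h2(String row) { return Math.abs((row + "#").hashCode() % 1024); }

    void putTtlOverride(String row, long ttlMillis) {
        ttlOverrides.put(row, ttlMillis);
        bloom.set(h1(row));
        bloom.set(h2(row));
    }

    // Returns the per-row TTL override, or defaultTtl when the bloom
    // filter proves no metaentry exists.
    long effectiveTtl(String row, long defaultTtl) {
        if (!bloom.get(h1(row)) || !bloom.get(h2(row))) {
            return defaultTtl; // definite miss: no metacolumn read needed
        }
        Long override = ttlOverrides.get(row); // possible false positive
        return override != null ? override : defaultTtl;
    }
}
```

A bloom-filter miss is definitive, so the metacolumn read is only paid for rows that (probably) carry overrides.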

        Issue Links

          Activity

          Jonathan Gray added a comment -

          This sounds really interesting, Andy. I'm a little concerned that this would be rather disruptive to the code but used by a very small portion of users.

          So the default behavior would be to always create the metacolumn family and the read path will always have these checks in it? Maybe this feature itself should be a table-level setting and should try to get all the logic related to this into new classes with just a hook or two into the existing read-time checks.

          The current QueryMatcher/Tracker code paths are starting to get a little messy and I'm a little worried about adding a bunch of new checks to every KV for this or any other feature (there's some work going into some of the seek/reseek optimizations and it's hard to move it forward because adding another couple row checks can be significant if done on every kv).

          In addition, this would break with the pattern of each family able to be processed in isolation. Now, reading of each family will require an additional scanner against the metacolumn family. So, if reading from a 5 family table (+1 for meta), you'd end up reading the metacolumn 5 times, once for each user family? Things like the bloom filter check would have to happen during the read, so at a different level than it's currently done.

          Would this check be first, last, or scattered throughout the read checks? I would guess first but not sure if there are other things desired besides TTL and ACLs that might require some of the existing checks first. I'm not quite sure I understand the TTL use case, this seems like an extremely rare use case where you'd have TTLs applied at row granularity? I suppose this kind of fine-grained policy setting is desirable but I guess it's less clear why you couldn't break stuff up into separate tables for varied TTLs or multi-tenancy. Or if you have these very specific and fine-grained settings like variable TTL you would implement them in your application.

          When do you set this stuff? Would inserts be augmented? Would there be special types of KVs that you could write at the same time you insert the actual data? The description above addresses where it is stored and when it is looked up, but not how it is set. Would Put be extended with per-row setTTL, setACL methods now?

          Out of curiosity, which BT-like systems support per-value ACLs? I don't think I've seen this in any DBs I've worked with.

          Andrew Purtell added a comment -

          I'm a little concerned that this would be rather disruptive to the code but used by a very small portion of users.

          We anticipate that access control will be a widely used feature if available.

          Impetus for this issue started with considerations for implementing access control.

          However, the adjustable TTL case comes for free if metacolumns are implemented in a more general manner, and is something that would make life easier for some dev groups I am working with.

          One way to address concerns regarding disruption would be to build this, and therefore perhaps much of security (HBASE-1697 and subtasks), on top of coprocessor-style server-side extensions (HBASE-2000 and subtasks). I have been considering this approach. It is compelling to consider pulling up all of the functional and performance impact into an extension which can be dynamically loaded per table. The core code is only touched by coprocessor framework changes, and the user has full choice in the matter when taking on anything else. On the other hand, support is more challenging, perhaps a lot more. First question: "What extensions do you have loaded, in what combination?" So on balance I recommend that, if we agree HBASE-1697 is a core concern, then it and related changes such as this issue should be in core, not an extension.

          this would break with the pattern of each family able to be processed in isolation

          Ack.

          So, if reading from a 5 family table (+1 for meta), you'd end up reading the metacolumn 5 times, once for each user family?

          No, only one time; anything in the metacolumn for the row is retrieved in one read.

          Things like the bloom filter check would have to happen during the read, so at a different level than it's currently done.

          At the Region level, yes, for the metacolumn case. So an access to a row in the Region would trigger a read of the metacolumn and then caching of the result to be passed around. Exactly how this would be passed around is unsettled. One option is thread locals.
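The thread-local option mentioned above could look something like this sketch: a request-scoped holder for per-row overrides read once from the metacolumn, then consulted wherever a policy decision is made. All names are hypothetical, not actual HBase classes:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of passing metacolumn results via a thread local.
class RowOverrideContext {
    private static final ThreadLocal<Map<String, String>> OVERRIDES =
        ThreadLocal.withInitial(HashMap::new);

    // Called once per row access, after the single metacolumn read.
    static void set(Map<String, String> rowOverrides) {
        OVERRIDES.set(new HashMap<>(rowOverrides));
    }

    static String get(String key, String defaultValue) {
        return OVERRIDES.get().getOrDefault(key, defaultValue);
    }

    // Must be cleared when the request completes, to avoid leaking
    // state across pooled handler threads.
    static void clear() {
        OVERRIDES.remove();
    }
}
```

The explicit clear() is the main hazard of this design choice: server handler threads are pooled, so stale context would otherwise bleed into unrelated requests.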

          This is part of a larger issue related to the security work: that of creating a context (for access control) and then referencing it wherever an authoritative decision must be made. We have been debating whether to use JAAS or instead sprinkle access checks around by hand. The issue of building context and passing it around must be dealt with to implement security. If we have it, then passing around KVs read from metacolumns is straightforward.

          Or if you have these very specific and fine-grained settings like variable TTL you would implement them in your application.

          ... and then lose a feature – automatic TTL based expiration and garbage collection with single-table scale out properties – that makes use of HBase compelling as opposed to something else. (Not sure what, if anything, that something else would be.)

          I guess it's less clear why you couldn't break stuff up into separate tables for varied TTLs or multi-tenancy.

          Yes, that's the problem. Single table multitenancy has better scale out properties than per-user tables, and in the HBase case, 1M+ tables for 1M+ users is not tenable.

          For the variable TTL case, consider an event logging application designed to archive data for long periods of time, where the different event types have different lifetimes, and lifetimes may be adjusted over time (updated system design). With a bunch of tables, this requires a join, which HBase does not support. So what you would do is set up column families to each serve as a TTL bucket ("join" over column families). Events could only have the TTL of one of the buckets. The application would store into the appropriate column family according to TTL. But this then results in a wide schema, with resulting unnecessary reduction in I/O locality in access patterns, more store files than otherwise, and so on. Design changes require adding or modifying column families, taking the table offline, at least for now. Not necessarily a fatal problem if we can avoid ever taking the table offline after the master rewrite, but if we already have per-row overrides for ACLs then this straightforwardly extends to the TTL case (at least), and that's enough I think to make this problem go away.
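The "TTL bucket" workaround described above could be sketched as routing each event to whichever column family's fixed TTL is the smallest one that still covers the event's desired lifetime. The family names and TTL values here are made up for illustration:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of column families as TTL buckets: an event can only ever get
// the bucket's TTL, never its exact desired lifetime.
class TtlBuckets {
    // family TTL (seconds) -> family name
    static final NavigableMap<Long, String> BUCKETS = new TreeMap<>();
    static {
        BUCKETS.put(86400L, "day");
        BUCKETS.put(604800L, "week");
        BUCKETS.put(31536000L, "year");
    }

    // Pick the family whose TTL is the smallest value >= the desired lifetime.
    static String familyFor(long desiredTtlSeconds) {
        Map.Entry<Long, String> e = BUCKETS.ceilingEntry(desiredTtlSeconds);
        if (e == null) throw new IllegalArgumentException("lifetime too long for any bucket");
        return e.getValue();
    }
}
```

This makes the rounding-up problem concrete: an event that should live one hour is kept a full day, and changing the bucket set means altering the schema.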

          Would Put be extended with per-row setTTL, setACL methods now?

          I would recommend that, yes. The metacolumn is a column family like any other; to set stuff, put values as KVs into the Put to be stored directly. Convenience functions on Put are desirable so the user doesn't have to learn about the value formatting for various overrides.
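The convenience methods proposed above might look like this sketch. This is a hypothetical simplified Put, not the real HBase client class; the metacolumn family name and the value encodings are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of per-row override setters on Put: each setter just stores a
// KV into the (assumed) metacolumn family, hiding the value formatting.
class PutSketch {
    static final byte[] META_FAMILY = "_meta_".getBytes(StandardCharsets.UTF_8);

    static class Cell {
        final byte[] family, qualifier, value;
        Cell(byte[] f, byte[] q, byte[] v) { family = f; qualifier = q; value = v; }
    }

    final List<Cell> cells = new ArrayList<>();

    PutSketch add(byte[] family, byte[] qualifier, byte[] value) {
        cells.add(new Cell(family, qualifier, value));
        return this;
    }

    PutSketch setTTL(long ttlMillis) {
        return add(META_FAMILY, "ttl".getBytes(StandardCharsets.UTF_8),
                   Long.toString(ttlMillis).getBytes(StandardCharsets.UTF_8));
    }

    PutSketch setACL(String aclSpec) {
        return add(META_FAMILY, "acl".getBytes(StandardCharsets.UTF_8),
                   aclSpec.getBytes(StandardCharsets.UTF_8));
    }

    int size() { return cells.size(); }
}
```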

          So add to this:

          It would be accessible like any other column family, but we expect a default ACL that only allows access by the system and operator principals

          and any principal the table creator adds to the ACL.

          Out of curiosity, which BT-like systems support per-value ACLs?

          It's a rumor. I'll try to find out more.

          Jonathan Gray added a comment -

          I would be for trying to get this and stuff like it into a coprocessor-style implementation where we have a constant overhead moving forward to check for the existence of these things, but then any overhead introduced by these new features does not impact non-users.

          I'm sure DAC is something people will use, just like it's something people use in MySQL, but I imagine it will be used less than it is in that context (given how many applications are not multi-tenant or user-facing). And then among those users that do take advantage, I doubt they will have 1M users or will need this granularity of security. So while I definitely see lots of value in DAC, I don't necessarily see this specific feature as a requirement for most of that value to most users. Having said that, I think this is cool and worth exploring; it just seems significantly more disruptive to implement DAC via metacolumns than just through family meta data.

          Would the plan be to do the DAC/ACL stuff without this and then add it? Or would this be a required piece of any implementation?

          Andrew Purtell added a comment -

          I would be for trying to get this and stuff like it into a coprocessor-style implementation

          I do like that idea too, if HBASE-1697 is not a core concern. It sounds like that is your opinion, that HBASE-1697 is not, correct?

          It just seems significantly more disruptive to implement DAC via metacolumns than just through family meta data.

          We would be implementing DAC both via metacolumns and via family meta data, so the metacolumn can be as sparse as possible, empty in the normal case, at least providing this choice to the designer.

          Would the plan be to do the DAC/ACL stuff without this and then add it? Or would this be a required piece of any implementation?

          Not required if maintaining ACLs on column families only. Not required if maintaining current situation with per-column family TTLs. Something like this would be necessary for per-row granularity I think.

          I doubt they will have 1M users

          Not just 1M users, I can envision probable applications with 100M+ users, actually. Can't have 100M tables, can't have 100M column families.

          Jonathan Gray added a comment -

          HBASE-1697 is not a core concern for me personally. But I definitely want to see this stuff and I think it would be awesome if HBase could support DAC for 100M+ users... I just don't think these use cases are necessarily core concerns. I'm a bit worried about this touching a lot of code if done one-off.

          Coprocessors can give us the opportunity to create all of the necessary hooks for these types of applications and then not have to deal with mucking up core server code every time we want to add features. We might actually be able to re-introduce contrib modules if we had a versioned coprocessor API they could hook into.

          Similarly to this, right now on the read side of things, once we get the seek/reseek optimizations in place we are going to add a new method to the filter interface so that any filter can pass seeking hints in. We have a few very specific queries that we want to build specialized filters for but don't want to keep touching core code. I think this is a good pattern. I could see adding another hook to the filter to get the metacolumn data at the start of each row as well.

          The combination of persisted, row-level meta data and coprocessors is a pretty awesome one.

          I have a few other high-pri items I need to finish up but I'm hoping to get my hands dirty with coprocessors soon. These metacolumns could be used for all sorts of stuff and I'd definitely be interested in helping out on this.

          Todd Lipcon added a comment -

          I agree with Jonathan's sentiment that we should try to fit this kind of thing into a framework rather than core if possible.

          Regarding the use case of per-cell ACLs, it is a requirement for a lot of government users, where each piece of information may have a different security clearance, and clearance is very granularly controlled. I could see implementing this, though, by using a coprocessor which intercepts all reads/writes and for every column cf:foo first checks a cf:_acl_foo before returning results or passing through the write.
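The intercept pattern just described could be sketched in miniature like this. The in-memory store, the single-principal ACL cells, and the `_acl_` naming convention are all assumptions for illustration, not a real coprocessor:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a read interceptor: for each requested column cf:foo, a
// sibling cf:_acl_foo cell is consulted before the value is returned.
class AclInterceptor {
    private final Map<String, String> store = new HashMap<>(); // "cf:qual" -> value
    private final Map<String, String> acls = new HashMap<>();  // "cf:_acl_qual" -> allowed user

    void put(String cf, String qual, String value, String allowedUser) {
        store.put(cf + ":" + qual, value);
        acls.put(cf + ":_acl_" + qual, allowedUser);
    }

    // Returns the value only if the ACL cell permits this user;
    // denied cells are silently filtered from the result.
    String get(String user, String cf, String qual) {
        String allowed = acls.get(cf + ":_acl_" + qual);
        if (allowed != null && !allowed.equals(user)) {
            return null;
        }
        return store.get(cf + ":" + qual);
    }
}
```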

          Regarding the multitenancy use case, I imagine an infrastructure-as-a-service deployment of HBase would probably be going through some intermediary layer anyway to give users the illusion that they aren't on a shared deployment. E.g., any access would have "user_foo_" prepended to all row keys. Having security integration is important to authenticate the user, but per-row ACLs seem expensive for that use case.
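The key-prefixing intermediary described above could be as simple as this sketch; the exact prefix format is an assumption:

```java
// Sketch of tenant isolation by row-key prefixing: every access goes
// through an intermediary that qualifies keys with the tenant's name,
// so each user sees what looks like a private keyspace.
class TenantKeys {
    static String qualify(String user, String rowKey) {
        return "user_" + user + "_" + rowKey;
    }

    static String strip(String user, String qualifiedKey) {
        String prefix = "user_" + user + "_";
        if (!qualifiedKey.startsWith(prefix)) {
            throw new IllegalArgumentException("key not owned by " + user);
        }
        return qualifiedKey.substring(prefix.length());
    }
}
```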

          Andrew Purtell added a comment -

          Regarding the multitenancy use case, I imagine an infrastructure-as-a-service deployment of HBase would probably be going through some intermediary layer anyway to give users the illusion that they aren't on a shared deployment. EG any access would have "user_foo_" prepended to all row keys.

          Yes, this is what "multiuser mode" in Stargate (the version up on GitHub) does, for example.

          Looking at some hypothetical IaaS use case may be overly limiting.

          For example, we (as an "enterprise" HBase user) would like to consider multitenancy in our own private infrastructure with direct (Java) API access for performance – secure RPC, of course.

          Having security integration is important to authenticate the user, but per-row ACLs seems expensive for that use case.

          Per-row overrides. And not just of ACLs.

          I could see implementing this, though, by using a coprocessor which intercepts all reads/writes and for every column cf:foo first checks a cf:_acl_foo before returning results or passing through the write

          I wouldn't want to mix metadata with data in the same CF so as to not limit application keyspace, but that's just a personal design preference of mine.


            People

            • Assignee:
              Unassigned
            • Reporter:
              Andrew Purtell
            • Votes:
              0
            • Watchers:
              9
