HBase / HBASE-2000 Coprocessors / HBASE-1002

Coprocessors: Support small query language as filter on server side

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Filters
    • Labels:
      None

      Description

      Improve the usability of filters by making them specifiable or executable using a little query language.

      For example:

      col("entry:price") > 3 && (col("entry:name") = "ABC" || col("entry:name") = "XYZ")

      This can be implemented as a little-language compiler that takes a filter specification as input and emits Java code that builds the requisite hierarchy of filter API classes and actions.

      The compiler can be a utility class, something like:

      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("entry"));
      // ...
      scan.setFilter(Filter.compile("col(\"entry:price\") > 3 && "
          + "(col(\"entry:name\") = \"ABC\" || col(\"entry:name\") = \"XYZ\")"));
      // ...
      

      or even something like

      Scan scan = Filter.compileScan("col(\"entry:price\") > 3 && "
          + "(col(\"entry:name\") = \"ABC\" || col(\"entry:name\") = \"XYZ\")");
      // ...
      

      This could also be implemented with JRuby snippets sent to the regionserver for execution, but running uploaded code there has troublesome security implications.
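
      To make the compiler idea above concrete, here is a minimal, self-contained sketch of a recursive-descent parser for the example expression. It is an illustration under stated assumptions, not the proposed implementation: FilterSpec is a hypothetical class name, a row is modelled as a plain Map from column name to string value, and the output is a java.util.function.Predicate rather than a hierarchy of HBase filter classes. Only col("..."), the operators >, <, = (or ==), &&, ||, and parentheses are handled, and string literals do not support escapes.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for the proposed Filter.compile(); not an HBase class. */
final class FilterSpec {
    private final String src;
    private int pos = 0;

    private FilterSpec(String src) { this.src = src; }

    /** Compiles a spec into a predicate over a row modelled as column name -> value. */
    static Predicate<Map<String, String>> compile(String spec) {
        FilterSpec p = new FilterSpec(spec);
        Predicate<Map<String, String>> result = p.parseOr();
        p.skipSpaces();
        if (p.pos != p.src.length()) throw p.error("unexpected trailing input");
        return result;
    }

    // or := and ('||' and)*
    private Predicate<Map<String, String>> parseOr() {
        Predicate<Map<String, String>> left = parseAnd();
        while (accept("||")) left = left.or(parseAnd());
        return left;
    }

    // and := primary ('&&' primary)*   (binds tighter than '||')
    private Predicate<Map<String, String>> parseAnd() {
        Predicate<Map<String, String>> left = parsePrimary();
        while (accept("&&")) left = left.and(parsePrimary());
        return left;
    }

    // primary := '(' or ')' | 'col' '(' string ')' op literal
    private Predicate<Map<String, String>> parsePrimary() {
        if (accept("(")) {
            Predicate<Map<String, String>> inner = parseOr();
            expect(")");
            return inner;
        }
        expect("col");
        expect("(");
        String column = parseString();
        expect(")");
        if (accept("==") || accept("=")) {
            // Equality compares the stored value as a string.
            String expected = peek() == '"' ? parseString() : parseNumber();
            return row -> expected.equals(row.get(column));
        }
        boolean greater = accept(">");
        if (!greater) expect("<");
        double bound = Double.parseDouble(parseNumber());
        // Ordering comparisons are numeric; missing or non-numeric cells do not match.
        return row -> {
            String value = row.get(column);
            if (value == null) return false;
            try {
                double d = Double.parseDouble(value);
                return greater ? d > bound : d < bound;
            } catch (NumberFormatException e) {
                return false;
            }
        };
    }

    private String parseString() {
        expect("\"");
        int end = src.indexOf('"', pos);
        if (end < 0) throw error("unterminated string");
        String s = src.substring(pos, end);
        pos = end + 1;
        return s;
    }

    private String parseNumber() {
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw error("expected a number");
        return src.substring(start, pos);
    }

    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private boolean accept(String token) {
        skipSpaces();
        if (!src.startsWith(token, pos)) return false;
        pos += token.length();
        return true;
    }

    private void expect(String token) {
        if (!accept(token)) throw error("expected '" + token + "'");
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }
}
```

      Calling FilterSpec.compile(...) on the example expression yields a predicate that can be tested against a map such as {"entry:price": "5", "entry:name": "ABC"}. A real compiler along the lines described here would instead have each parse method return the corresponding filter API object, so that only existing, trusted filter classes run on the regionserver.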

          Activity
          Andrew Purtell added a comment -

          Here is a conversation about this on IRC:

          (10:41:16 AM) ffgeek200: are there any thoughts regarding a future query language for row filters?
          (10:42:43 AM) st^Ack_: ffgeek200: how do you mean?
          (10:44:06 AM) ffgeek200: st^Ack_: ie give me all rows where "(int(col("entry:price")) > 3 && col("entry:name")=="ABC" || col("entry:name")="XYZ"
          (10:45:28 AM) apurtell: ffgeek200: filter spec -> little language compiler -> specialized bytecode -> execution on regionserver during scanner traversal ?
          (10:46:11 AM) ffgeek200: apurtell: yes
          (10:46:56 AM) apurtell: ffgeek200: what about filter spec -> little language compiler -> code to build existing (maybe modified a little) filter class hierarchies -> send to regionserver in the current manner?
          (10:49:30 AM) ffgeek200: apurtell: it could be implemented in many ways yes. That is another way. What about something crazy like writing java code that implements a RowFilterInterface method "boolean isFiltered(Row row)", then serialize that class over the network... let Java deal with compilation since it does that well.
          (10:50:01 AM) st^Ack_: or apurtell, how about a jruby filter? You pass it jruby code, and it runs it on every row?
          (10:50:51 AM) ffgeek200: jruby would work. I remember reading about a similar database and they used server-side javascript for this purpose.
          (10:52:16 AM) apurtell: stack,ffgeek200: jruby snippet is good. was going to reply that java serialization only works if the classes are available at each endpoint (java serialization does not ship code afaik).
          (10:54:37 AM) ffgeek200: apurtell: true. I think that would be cleaner than how it is currently done, trying to munge your row filter to do what you want.
          (10:55:12 AM) st^Ack_: apurtell: yes that the classes would have to be on CLASSPATH on either end of the serialization. jruby script would be better (this jruby suggestion is just your filter spec -> little language compiler -> etc. suggestion generalized)
          (10:56:09 AM) apurtell: ffgeek200,stack: downside to jruby snippet is it is an untrusted code upload to regionserver. that's why i suggested using existing classes, which cause only restricted/controlled actions to happen in the regionserver. on the other hand jruby snippets can be managed when access control is added in a manner similar to how rdbms controls stored procedures.
          (10:56:52 AM) st^Ack_: apurtell: you are right
          (10:57:08 AM) st^Ack_: very hard preventing jruby snippet running riot
          (10:57:15 AM) apurtell: stack: indeed
          (10:58:13 AM) ffgeek200: apurtell: postgres allows for sprocs to be in pretty much all popular languages, but I'm not sure if it restricts the sprocs.
          (11:00:53 AM) ffgeek200: example of how they do it with ruby: http://www.april-child.com/blog/2007/05/10/running-ruby-in-postgresql-on-mac-os-x/
          (11:01:42 AM) apurtell: ffgeek200: stored procedure access control is rwx by user plus setuid typically, to use a fs metaphor.
          (11:02:02 AM) apurtell: ffgeek200: at least that was what i was referring to.
          (11:03:11 AM) ffgeek200: apurtell: I think it would definitely open a can of worms security-wise. For me it's fine since I'm in control of everything over here, but others may want restrictions on its usage, maybe they would choose to not compile it in.
          (11:04:25 AM) ffgeek200: no matter what security restrictions you impose, they can of course always sit in a while loop and burn CPU.
          (11:05:33 AM) jgray: ffgeek200: postgres has safe and unsafe integration with other languages for stored procedures
          (11:05:43 AM) apurtell: it does seem to me that a little language compiler that builds hierarchies of filters in the current form is a desirable feature. can be some kind of contrib. common query operators can be supported, and the class implementation server side maintains safety. and anything the "compiler" might do can be constructed by hand as desired (no api changes).
          (11:10:34 AM) ffgeek200: jgray: thanks I forgot about that. apurtell: sounds awesome. I'm biased re: postgres since I think it does a good job of this. What if that little language compiler was done for now, calling it something like hbaseql then later on other languages could be implemented, but the default one is hbaseql.

          Jeff Hammerbacher added a comment -

          Note that Google has something similar according to this 2008 article from the SIGMOD Record: http://turing.cs.washington.edu/papers/dataprojects-google-sigmodrecord08.pdf

          Lars George added a comment -

          BTW, I always wondered if a "RhinoFilter" would make sense, i.e. use the Rhino JavaScript engine to interpret the filter rule. I assumed JS is good as far as security is concerned, as the context can be limited to just what is needed. I had also looked into other Java-based solutions like Groovy or Janino. Are they viable options now?

          Jeff Hammerbacher added a comment -

          Groovy was used in Cassandra for this purpose. I don't think making people learn Groovy is worthwhile. JavaScript does seem more likely to be used by the web dev crowd.

          Andrew Purtell added a comment -

          This issue is subsumed by HBASE-2396 and others.


            People

            • Assignee:
              Unassigned
              Reporter:
              Andrew Purtell
            • Votes:
              0
              Watchers:
              7