HBASE-2396

Coprocessors: Server side dynamic scripting language execution

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Coprocessors
    • Labels:
    • Tags:
      beginner

      Description

      There are a lot of use cases where users want to perform some simple operations on the region server. For example, a row may represent a Set and users want append/search/remove style operations within the row without having to perform the work on the client side. One possible solution is to embed a small language something like PL/SQL (not necessarily in syntax) which restricts users to a safe set of operations.

          Activity

          stack added a comment -

          This looks related to HBASE-1002

          Andrew Purtell added a comment -

          In the context of filters (HBASE-1002) we mentioned shipping up JRuby scriptlets as an option. Given that HBase already carries the JRuby jar for the shell, this seems a natural choice for that and more. If so, it would be better if, after initially receiving a String to execute, an intermediate/compiled form could be cached to save work over repeated invocations, as in http://kenai.com/projects/jruby/pages/JRubyCompiler.
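A minimal sketch of the caching idea above, not a proposed implementation: compiled forms are memoized by script source so that repeated invocations skip compilation. `CompiledScriptCache` and its `compiler` parameter are hypothetical names. With the standard javax.script API, the compile step could wrap `((Compilable) engine).compile(source)` for engines that implement `Compilable`; a stand-in compiler is used here so the example runs without any script engine installed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: caches the compiled form of a script, keyed by its source text.
class CompiledScriptCache<C> {
    private final ConcurrentMap<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;

    CompiledScriptCache(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    // Compile on first sight of a source string; reuse the cached form afterwards.
    C get(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    // Number of distinct scripts currently cached.
    int size() {
        return cache.size();
    }
}

class CompiledScriptCacheDemo {
    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        // Stand-in "compiler" (an assumption for illustration): upper-cases the
        // source and counts how often it is invoked.
        CompiledScriptCache<String> cache = new CompiledScriptCache<>(src -> {
            compilations.incrementAndGet();
            return src.toUpperCase();
        });
        cache.get("row.append(x)");
        cache.get("row.append(x)"); // served from the cache, no recompilation
        cache.get("row.remove(x)");
        System.out.println("compilations = " + compilations.get());
    }
}
```

Keying on the full source text keeps the sketch simple; a real region server would probably also want an eviction policy so that one-off scripts do not accumulate.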

          Andrew Purtell added a comment -

          "One possible solution is to embed a small language something like PL/SQL (not necessarily in syntax)"

          I made a minor edit of summary to broaden scope.

          Andrew Purtell added a comment -

          As discussed at the hackathon, the first cut of this should be an Endpoint that accepts code to execute as a String and sends back the results. Perhaps as a Result?

          Later we can optimize through a facility for registering stored scripts and referencing them by name in RPCs.
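A rough sketch of what registering stored scripts and referencing them by name might look like. `StoredScriptRegistry`, `StoredScript`, `register`, and `lookup` are all hypothetical names chosen for illustration; nothing here is an existing HBase API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: scripts are stored once under a name and later
// referenced by that name instead of being shipped with every RPC.
class StoredScriptRegistry {
    // A stored script remembers which interpreter it targets and its source text.
    static final class StoredScript {
        final String language;
        final String source;

        StoredScript(String language, String source) {
            this.language = language;
            this.source = source;
        }
    }

    private final Map<String, StoredScript> scripts = new ConcurrentHashMap<>();

    // Register a script; returns false if the name is already taken.
    boolean register(String name, String language, String source) {
        return scripts.putIfAbsent(name, new StoredScript(language, source)) == null;
    }

    // Resolve a name carried in an RPC back to the stored script.
    StoredScript lookup(String name) {
        StoredScript s = scripts.get(name);
        if (s == null) {
            throw new IllegalArgumentException("no stored script named " + name);
        }
        return s;
    }
}
```

With something like this in place, a call would carry only the script name and its arguments, and the server would resolve the source (or a cached compiled form) locally.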

          stack added a comment -

          Talking in the hallway later, the thought was that we'd pass two Strings: the name of the interpreter and the String we want to run. The first cut at a coprocessor would use Java scripting (http://download.oracle.com/javase/6/docs/technotes/guides/scripting/programmer_guide/index.html) and do something like this:

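                  // uses javax.script.ScriptEngine and javax.script.ScriptEngineManager
                  ScriptEngineManager factory = new ScriptEngineManager();
                  ScriptEngine engine = factory.getEngineByName("JavaScript");
                  // evaluate JavaScript code from String
                  engine.eval("print('Hello, World')");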
          

          Yeah, I suppose we'd return a Result? Maybe we should return a String? JSON String?

          Andrew Purtell added a comment -

          Coprocessors provide two extension surfaces, Observers (triggers) and Endpoints (stored procedures). We can provide access to both in a first cut via a system coprocessor that manages scriptlet execution. Consider:

          • Ruby embedding by default, since the JRuby jar is already available.
          • JavaScript embedding, since this will be a very popular request if it is not available as an option, and since packaging Rhino into the scripting coprocessor artifact with Maven should be easy enough.
          • Support storing scriptlets for trigger-style execution at table, column[:qualifier], or row scope.
            • User should be able to specify if the scriptlet should run at read time or write time or both.
            • Store scriptlet state in a metacolumn, similar to HBASE-2893, but privately managed to punt on issues of cross coprocessor dependencies and API invocation.
            • The scriptlet execution host can wrap every Get or Scan with a custom filter that transforms or generates values according to entries in the metacolumn scanned internally at setup time. Implies that wherever the user specifies the location of a generator instead of a real value we must still store a placeholder.
            • We also need to consider how this wrapper will interact with the AccessController's RegionScanner wrapper: Because the AccessController is first in any CP chain by priority it will already be filtering out placeholders the current subject doesn't have read or write access to, but how to handle EXEC permission may need some thought.
          • Restrict scriptlets as observers to DML operations.
            • We can expose a callback interface in the scripting environment on region operations with a small and familiar Document Object Model. Set up the DOM in the scripting environment(s) when the scriptlet host initializes. Call up into the DOM from Observer hooks at the Java level. See JRuby embedding and Rhino embedding.
          • Provide the Endpoint interface Stack mentioned in the above comment.
            • The first cut Exec API could be String execute(String language, String script)
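As an illustration only, the first-cut signature above could be backed by the standard javax.script API roughly as follows. `ScriptExecutor` is a hypothetical class name, and returning `String.valueOf(result)` is an assumption, since the thread leaves the result format (Result, String, or JSON String) open.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical endpoint body for: String execute(String language, String script)
class ScriptExecutor {
    private final ScriptEngineManager manager = new ScriptEngineManager();

    String execute(String language, String script) throws ScriptException {
        // getEngineByName returns null when no engine is registered under that name.
        ScriptEngine engine = manager.getEngineByName(language);
        if (engine == null) {
            throw new IllegalArgumentException("no script engine named " + language);
        }
        // Evaluate the script text and hand the result back as a String.
        Object result = engine.eval(script);
        return String.valueOf(result);
    }
}
```

Which language names resolve depends on the script engines present on the server's classpath, so a caller should expect the unknown-language error for anything that has not been packaged with the coprocessor.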
          stack added a comment -

          This'd be a fun/crazy noob project.

          Andrew Purtell added a comment -

          I keep occasionally threatening my to-do list with this. If I do take it on, I'll assign it to myself and drop the noob tag.


            People

            • Assignee: Unassigned
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 18