[HIVE-24230] Integrate HPL/SQL into HiveServer2 - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: HiveServer2, hpl/sql
Labels:
- pull-request-available

Target Version/s: 4.0.0

Description

HPL/SQL is a standalone command-line program that can store and load scripts from text files or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive, not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities that are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run SELECT statements on the output of a stored procedure (or rather a function, in this case):

SELECT * from myStoredProcedure(1, 2);

HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not work with the current architecture.

Another important factor is performance. HPL/SQL currently sends declarative SQL commands to Hive via JDBC. The integration would make it possible to drop JDBC and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL, since it has its own separate CLI.

To make it easier to implement, we keep things separated internally at first by introducing a Hive session-level JDBC parameter:

jdbc:hive2://localhost:10000/default;hplsqlMode=true

The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user can create and call stored procedures. HPLSQL allows you to write any kind of procedural statement at the top level. This patch doesn't limit this, but it might be better to eventually restrict which statements are allowed outside of stored procedures.
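
To illustrate how a session-level parameter such as hplsqlMode travels in the connection string, here is a minimal Python sketch that reads the `;key=value` session variables of a `jdbc:hive2://` URL. The helper names `parse_session_vars` and `is_hplsql_mode` are hypothetical, and accepting "true" case-insensitively is an assumption; HiveServer2's real parsing lives in its Java code and may differ.

```python
def parse_session_vars(jdbc_url: str) -> dict:
    """Return the ';key=value' session variables of a jdbc:hive2 URL.

    Hypothetical helper for illustration only, not Hive's own parser.
    """
    prefix = "jdbc:hive2://"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a HiveServer2 JDBC URL: {jdbc_url!r}")
    rest = jdbc_url[len(prefix):]
    # Session variables end where the optional '?hive_conf_list'
    # or '#hive_var_list' sections begin.
    for separator in ("?", "#"):
        rest = rest.split(separator, 1)[0]
    # The first ';'-separated piece is 'host:port/database';
    # every following piece is one session variable.
    session_vars = {}
    for pair in rest.split(";")[1:]:
        if not pair:
            continue
        key, _, value = pair.partition("=")
        session_vars[key] = value
    return session_vars


def is_hplsql_mode(jdbc_url: str) -> bool:
    """True when the URL requests procedural SQL mode (assumed case-insensitive)."""
    return parse_session_vars(jdbc_url).get("hplsqlMode", "false").lower() == "true"


url = "jdbc:hive2://localhost:10000/default;hplsqlMode=true"
print(is_hplsql_mode(url))  # True
```

A URL without the parameter, such as `jdbc:hive2://localhost:10000/default`, is treated as a regular SQL session in this sketch.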

Since HPLSQL and Hive are running in the same process, there is no need to use the JDBC driver between them. The patch adds an abstraction with two different implementations: one that executes queries over JDBC (keeping the existing behaviour) and another that calls Hive's compiler directly. In HPLSQL mode the latter is used.
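
The shape of such an abstraction can be sketched in a few lines of Python. This is a language-neutral model, not the patch's Java code: the names `QueryExecutor`, `ConnectionQueryExecutor`, `InProcessQueryExecutor` and `choose_executor` are all hypothetical, an in-memory SQLite connection stands in for the JDBC connection, and a plain function stands in for Hive's compiler.

```python
import sqlite3
from abc import ABC, abstractmethod


class QueryExecutor(ABC):
    """What the interpreter needs: run one SQL statement, get rows back."""

    @abstractmethod
    def execute(self, sql):
        ...


class ConnectionQueryExecutor(QueryExecutor):
    """Existing behaviour: go through a database connection (JDBC in Hive).

    Any Python DB-API connection works here; SQLite is only a stand-in.
    """

    def __init__(self, connection):
        self.connection = connection

    def execute(self, sql):
        cursor = self.connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()


class InProcessQueryExecutor(QueryExecutor):
    """New behaviour: call the engine directly, with no driver in between."""

    def __init__(self, compile_and_run):
        # compile_and_run is a stand-in for Hive's internal compile/execute API.
        self.compile_and_run = compile_and_run

    def execute(self, sql):
        return self.compile_and_run(sql)


def choose_executor(hplsql_mode, connection, compile_and_run):
    """Pick the direct implementation when the session is in HPLSQL mode."""
    if hplsql_mode:
        return InProcessQueryExecutor(compile_and_run)
    return ConnectionQueryExecutor(connection)


conn = sqlite3.connect(":memory:")
fake_engine = lambda sql: [("ran in process", sql)]

print(choose_executor(False, conn, fake_engine).execute("SELECT 1 + 1"))  # [(2,)]
print(choose_executor(True, conn, fake_engine).execute("SELECT 1 + 1"))
```

Because the interpreter only sees the `execute` method, it does not need to know which of the two paths is active.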

Internally, a new operation (HplSqlOperation) and operation type (PROCEDURAL_SQL) were added. The new operation works similarly to SQLOperation, but it uses the hplsql interpreter to execute arbitrary scripts. This operation might spawn new SQLOperations.

For example consider the following statement:

FOR i IN 1..10 LOOP
  SELECT * FROM table;
END LOOP;

We send this to Beeline while we're in hplsql mode. Hive will create an hplsql interpreter and store it in the session state. A new HplSqlOperation is created to run the script on the interpreter.

HPLSQL knows how to execute the FOR loop, but it calls Hive to run the SELECT statement. The HplSqlOperation is notified each time the SELECT reads a row, and it accumulates the rows into a RowSet (memory consumption needs to be considered here), which the client can retrieve via Thrift.
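
The flow described above can be modelled with a small Python sketch. It is a toy, not Hive's implementation: `ToyInterpreter`, `OperationSketch`, `fake_hive_select` and the session dictionary are hypothetical stand-ins, the loop is passed in pre-parsed instead of being parsed from script text, and the fake SELECT always yields two rows.

```python
class RowSet:
    """Accumulates rows in memory until the client fetches them."""

    def __init__(self):
        self.rows = []

    def add(self, row):
        self.rows.append(row)


def fake_hive_select(sql, on_row):
    """Stand-in for Hive running a SELECT: reports each row to a listener."""
    for row in [("r1",), ("r2",)]:
        on_row(row)


class ToyInterpreter:
    """Knows how to run a FOR loop, but delegates SELECTs to the engine."""

    def __init__(self, run_select):
        self.run_select = run_select

    def run_for_loop(self, first, last, body_sql, on_row):
        for _ in range(first, last + 1):  # 1..10 is inclusive on both ends
            self.run_select(body_sql, on_row)


class OperationSketch:
    """Plays the role of HplSqlOperation: runs the script, collects rows."""

    def __init__(self, session_state):
        # The interpreter is created once and kept in the session state.
        self.interpreter = session_state.setdefault(
            "hplsql_interpreter", ToyInterpreter(fake_hive_select)
        )
        self.row_set = RowSet()

    def run(self, first, last, body_sql):
        # Every row read by a spawned SELECT is appended to the RowSet.
        self.interpreter.run_for_loop(first, last, body_sql, self.row_set.add)
        return self.row_set


session_state = {}
result = OperationSketch(session_state).run(1, 10, "SELECT * FROM table")
print(len(result.rows))  # 20
```

Ten iterations of a two-row SELECT leave twenty rows in the RowSet, which shows why memory use grows with both the loop count and the result size.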

Issue Links

links to

GitHub Pull Request #1633

People

Assignee: Attila Magyar

Reporter: Attila Magyar

Votes: 0

Watchers: 3

Dates

Created: 05/Oct/20 14:35

Updated: 17/Nov/22 08:48

Resolved: 03/Dec/20 14:44

Time Tracking

Estimated: Not Specified