[HIVE-15277] Teach Hive how to create/delete Druid segments - ASF JIRA


Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.0
Component/s: Druid integration
Labels:
- TODOC2.2

Description

We want to extend the DruidStorageHandler to support CTAS queries.
In this implementation, Hive generates the Druid segment files and inserts their metadata into Druid's metadata storage to signal the handoff to Druid.

The syntax will be as follows:

CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`, ...>;
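As a rough illustration of the syntax above, the following Python sketch renders the CTAS statement from lists of dimension and metric columns. The helper name build_druid_ctas, its parameters, and the source table name are hypothetical; only the STORED BY class, the druid.datasource property, and the `__time` alias come from this issue.

```python
def build_druid_ctas(table, datasource, time_column, dimensions, metrics, source_table):
    """Render a Druid CTAS statement following the syntax in this issue.

    Hypothetical helper: the function and its parameters are illustrative,
    not part of Hive. The FROM clause uses a placeholder source table.
    """
    # The time column is aliased to Druid's mandatory `__time` column.
    columns = [f"`{time_column}` AS `__time`"]
    columns += [f"`{c}`" for c in dimensions]
    columns += [f"`{c}`" for c in metrics]
    return (
        f"CREATE TABLE {table}\n"
        "STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'\n"
        f'TBLPROPERTIES ("druid.datasource" = "{datasource}")\n'
        f"AS SELECT {', '.join(columns)}\n"
        f"FROM {source_table};"
    )


print(build_druid_ctas("druid_table_1", "datasourcename", "timecolumn",
                       ["dimension1", "dimension2"], ["metric1", "metric2"],
                       "source_table"))
```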

This statement stores the results of the query <input_query> in a Druid datasource named 'datasourcename'. One of the query's columns must be the time dimension, which is mandatory in Druid. We follow the same convention as Druid: the result of the executed query must contain a column named '__time', which acts as the time dimension column in Druid. Currently, this column must be of type 'timestamp'.
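The time-column rule can be expressed as a small check. This is a minimal sketch, assuming the query's result schema is available as a dict mapping column names to lower-case Hive type names; check_time_column is a hypothetical helper, not part of DruidStorageHandler.

```python
def check_time_column(schema):
    """Check the Druid time-column convention on a query's result schema.

    `schema` maps column names to lower-case Hive type names (an assumption
    made for this example, not a Hive data structure). Returns a list of
    problems; an empty list means the convention is met.
    """
    problems = []
    if "__time" not in schema:
        problems.append("missing mandatory column '__time'")
    elif schema["__time"] != "timestamp":
        # Currently only 'timestamp' is accepted for the time dimension.
        problems.append(f"'__time' must be timestamp, got {schema['__time']}")
    return problems
```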
Metrics can be of type long, double, or float, while dimensions are strings. Keep in mind that Druid strictly separates dimensions from metrics: if a Hive column contains numbers but should be exposed as a dimension, use the CAST operator to convert it to a string.
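A sketch of how one entry of the SELECT list could respect the metric/dimension split, casting numeric dimension columns to strings. The helper select_expr is hypothetical, and treating Hive bigint as Druid's long type is an assumption of this example.

```python
# Hive type names accepted for metrics in this sketch; mapping Druid's
# "long" to Hive "bigint" is an assumption.
METRIC_TYPES = {"bigint", "double", "float"}


def select_expr(name, hive_type, as_dimension):
    """Return the SELECT expression for one non-time column (hypothetical helper)."""
    if as_dimension:
        if hive_type == "string":
            return f"`{name}`"
        # Druid dimensions are strings, so numeric dimensions are cast.
        return f"CAST(`{name}` AS STRING) AS `{name}`"
    if hive_type not in METRIC_TYPES:
        raise ValueError(f"metric {name!r} has unsupported type {hive_type}")
    return f"`{name}`"
```

For example, select_expr("zip", "int", True) yields a CAST expression, while a string column passed as a metric raises ValueError.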
This initial implementation interacts with the Druid metadata storage to add or remove the table in Druid, so users need to supply the metadata storage configuration, for example: --hiveconf hive.druid.metadata.password=XXX --hiveconf hive.druid.metadata.username=druid --hiveconf hive.druid.metadata.uri=jdbc:mysql://host/druid
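The three metadata settings can be assembled into command-line arguments. This is a minimal sketch, assuming the hive CLI is used; the helper name druid_metadata_hiveconf is hypothetical, and the values are the placeholders from the description.

```python
import shlex


def druid_metadata_hiveconf(uri, username, password):
    """Build --hiveconf arguments for the Druid metadata storage settings.

    Property names come from this issue; the helper itself is hypothetical.
    """
    settings = {
        "hive.druid.metadata.uri": uri,
        "hive.druid.metadata.username": username,
        "hive.druid.metadata.password": password,
    }
    args = []
    for key, value in settings.items():
        args += ["--hiveconf", f"{key}={value}"]
    return args


args = druid_metadata_hiveconf("jdbc:mysql://host/druid", "druid", "XXX")
# Quote each argument so values with shell metacharacters survive a shell.
print(" ".join(shlex.quote(a) for a in ["hive"] + args))
```

shlex.quote keeps a password containing spaces or shell metacharacters intact when the printed command is pasted into a shell.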

Attachments


- file.patch, 24/Nov/16 00:05, 177 kB, Slim Bouguerra
- HIVE-15277.2.patch, 28/Nov/16 19:41, 246 kB, Slim Bouguerra
- HIVE-15277.patch, 15/Dec/16 18:37, 259 kB, Slim Bouguerra
- HIVE-15277.patch, 15/Dec/16 13:46, 259 kB, Slim Bouguerra
- HIVE-15277.patch, 14/Dec/16 20:41, 258 kB, Slim Bouguerra
- HIVE-15277.patch, 14/Dec/16 19:30, 260 kB, Slim Bouguerra
- HIVE-15277.patch, 14/Dec/16 19:23, 259 kB, Slim Bouguerra
- HIVE-15277.patch, 14/Dec/16 02:35, 255 kB, Slim Bouguerra
- HIVE-15277.patch, 28/Nov/16 19:34, 177 kB, Slim Bouguerra

Issue Links

contains

HIVE-15303 Upgrade to Druid 0.9.2

Resolved

is related to

HIVE-15809 Typo in the PostgreSQL database name for druid service

Resolved

HIVE-18196 Druid Mini Cluster to run Qtests integrations tests.

Closed

supersedes

HIVE-14474 Create datasource in Druid from Hive

Resolved

links to

GitHub Pull Request #120


People

Assignee: Slim Bouguerra

Reporter: Slim Bouguerra

Votes: 0

Watchers: 4

Dates

Created: 24/Nov/16 00:02

Updated: 27/Feb/24 22:24

Resolved: 15/Dec/16 20:52