[HIVE-14474] Create datasource in Druid from Hive - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: Druid integration
Labels: None

Description

We want to extend the DruidStorageHandler to support CTAS queries.

In the initial implementation proposed in this issue, we will write the results of the query to HDFS (or to the location specified in the CTAS statement) and submit a HadoopIndexing task to the Druid overlord. The task will contain the path where the data was stored; it will read the data from that path and create the segments in Druid. Once this is done, the results are removed from Hive.

The syntax will be as follows:

CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
AS <input_query>;

This statement stores the results of query <input_query> in a Druid datasource named 'my_query_based_datasource'. One of the columns of the query needs to be the time dimension, which is mandatory in Druid. In particular, we follow the same convention that is used in Druid: the result of the executed query must contain a column named '__time', which will act as the time dimension column in Druid. Currently, the time dimension column needs to be of type 'timestamp'.
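As an illustration of the '__time' requirement, here is a minimal sketch of such a statement with a concrete input query. The source table web_events and its columns (event_ts, page, user_id, latency_ms) are hypothetical names introduced only for this example; the storage handler class and the "druid.datasource" property are the ones given above.

```sql
-- Sketch only: web_events and its columns are assumed, not part of this issue.
CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
AS
SELECT
  -- Druid requires a time dimension; expose it as a timestamp column named __time
  CAST(event_ts AS timestamp) AS `__time`,
  page,
  user_id,
  latency_ms
FROM web_events;
```

The explicit cast and alias are there because the query result must contain a 'timestamp' column named '__time'; a source column that is already a timestamp would only need the alias.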

This initial implementation interacts with the Druid API as it is currently exposed to the user. In a follow-up issue, we should propose an implementation that integrates more tightly with Druid. In particular, we would like to store segments directly in Druid from Hive, thus avoiding the overhead of writing Hive results to HDFS and then launching an MR job that basically reads them again to create the segments.

Attachments

HIVE-14474.patch, 13/Sep/16 15:03, 51 kB, jcamachorodriguez
HIVE-14474.01.patch, 14/Sep/16 11:26, 55 kB, jcamachorodriguez
HIVE-14474.02.patch, 14/Sep/16 13:44, 59 kB, jcamachorodriguez
HIVE-14474.03.patch, 04/Oct/16 17:41, 61 kB, jcamachorodriguez
HIVE-14474.04.patch, 05/Oct/16 12:24, 62 kB, jcamachorodriguez

Issue Links

is superseded by

HIVE-15277 Teach Hive how to create/delete Druid segments (Closed)

links to

GitHub Pull Request #107

People

Assignee: Jesús Camacho Rodríguez

Reporter: Jesús Camacho Rodríguez

Votes: 0

Watchers: 5

Dates

Created: 08/Aug/16 11:59

Updated: 27/Feb/24 22:24

Resolved: 15/Dec/16 20:54