[DRILL-3209] [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: Query Planning & Optimization, Storage - Hive
Labels: None

Description

All reads against Hive are currently done through the Hive SerDe interface. While this provides the most flexibility, the API is not optimized for reading data into Drill's native data structures. For tables backed by Parquet or text files, we can plan these reads as Drill native reads. Currently, native reads of these file types provide untyped data. Although Parquet files carry metadata, we do not currently use that type information during planning. For text files, each record is read as a list of varchars. In both cases, casts will need to be injected so that the native reads produce the same data types as reads through the SerDe interface.
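To illustrate the cast injection described above, here is a minimal sketch, not Drill's actual planner code. It assumes the text reader exposes each record as an array named `columns`, and it builds a projection that casts each element to the type declared in the Hive table schema. The `HIVE_TO_SQL` mapping and the `cast_projection` helper are hypothetical names introduced only for this example, and the mapping covers just a few types.

```python
# Hypothetical mapping from a few Hive type names to SQL cast targets.
# This is an assumption for illustration, not Drill's real type mapping.
HIVE_TO_SQL = {
    "int": "INT",
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "string": "VARCHAR",
}


def cast_projection(hive_schema):
    """Build a SELECT list that casts untyped text columns to Hive-declared types.

    hive_schema is a list of (column_name, hive_type) pairs in table order.
    """
    exprs = []
    for index, (name, hive_type) in enumerate(hive_schema):
        sql_type = HIVE_TO_SQL[hive_type.lower()]
        # Each untyped varchar element is wrapped in a cast to the declared type.
        exprs.append(f"CAST(columns[{index}] AS {sql_type}) AS {name}")
    return ", ".join(exprs)


# Example schema (column names chosen for illustration only).
schema = [("c_custkey", "int"), ("c_name", "string")]
projection = cast_projection(schema)
print(projection)
```

The same idea would apply to Parquet-backed tables, except that the casts would wrap named columns rather than positional array elements.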

Attachments

tpch13-native-scan-off.sys.drill, 01/Oct/15 19:13, 329 kB, Chun Chang
tpch13-native-scan-on.sys.drill, 01/Oct/15 19:13, 25 kB, Chun Chang

Issue Links

is depended upon by

DRILL-3678 Plan generating for Drill on Hive takes huge java heap size (Open)

links to

Review Board Link

Sub-Tasks

Enhance HiveDrillNativeParquetScan and related classes to support multiple formats. (Status: Open, Assignee: Unassigned)

People

Assignee: Venki Korukanti

Reporter: Jason Altekruse

Votes: 0

Watchers: 7

Dates

Created: 28/May/15 22:49

Updated: 01/Oct/15 23:27

Resolved: 01/Oct/15 23:27