[DRILL-3866] Parquet schema details not being utilised for metadata information - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.1.0, 1.2.0
Fix Version/s: None
Component/s: Metadata, Storage - Parquet
Labels:
- Tableau
- features
- metadata
- parquet
- performance
- schema
Environment:

CentOS release 6.3 (Final)
Java jdk1.7.0_79
apache-drill-1.1.0
apache-hadoop-2.7.1
apache-zookeeper-3.4.6

Description

To access Parquet files from Tableau, Drill must be configured with an individual view for each Parquet schema, with every column cast to a specific data type, before Tableau can read the data correctly or even see the list of available tables.
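
For illustration, the per-schema view workaround described above can be sketched as a small Python helper that generates the view definition. This is only a sketch: the helper name build_cast_view, the workspace dfs.tmp, the path /data/orders, and the column names and types are all hypothetical and do not come from our actual setup.

```python
def build_cast_view(view_name, source_path, columns):
    """Build a Drill CREATE VIEW statement that casts every column explicitly.

    columns is a list of (column_name, sql_type) pairs.
    """
    select_list = ",\n".join(
        f"  CAST(`{name}` AS {sql_type}) AS `{name}`"
        for name, sql_type in columns
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source_path}"
    )


# Hypothetical view name, path, and columns, for illustration only.
ddl = build_cast_view(
    "dfs.tmp.`orders_view`",
    "dfs.`/data/orders`",
    [("order_id", "BIGINT"), ("customer", "VARCHAR(64)")],
)
print(ddl)
```

One such view has to be written and maintained for every Parquet schema, which is the overhead this issue asks to remove.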

Understandably, this is necessary for file formats that do not persist schema information, since Drill does not know the data type of any field until the query is executed. But why is it required for Parquet files?

We define Avro schemas for each Parquet file in the AvroParquetWriter phase, and the Parquet files store the schema as part of the data. Couldn't Drill leverage the information from the footers and make it available to reporting tools?
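
As a rough sketch of where that information lives: a Parquet file ends with the footer metadata, followed by a 4-byte little-endian footer length and the magic bytes PAR1. The Python snippet below only locates the footer in a byte string. It does not decode the Thrift-encoded metadata (that would need a Parquet library), and the function name locate_footer and the sample bytes are made up for illustration.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def locate_footer(data: bytes):
    """Return (offset, length) of the footer metadata inside Parquet file bytes."""
    if len(data) < 12 or data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    offset = len(data) - 8 - footer_len
    if offset < 4:
        raise ValueError("corrupt footer length")
    return offset, footer_len


# Synthetic stand-in for a real file: magic, fake row groups, fake footer,
# footer length, magic. Not real Parquet metadata.
fake = PARQUET_MAGIC + b"row-groups" + b"METADATA" + struct.pack("<I", 8) + PARQUET_MAGIC
offset, length = locate_footer(fake)
print(offset, length)
```

The schema, including column names and types, is part of the metadata found at that offset, so it can be read without scanning the data pages.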

Also, as part of these investigations, some Parquet files were created using CTAS. The directory is created and the files contain the data, but the tables are not displayed when we run a SHOW TABLES command. Shouldn't the metadata also be available for these tables?

I understand that with the new REFRESH TABLE METADATA feature Drill collects all the information from the Parquet footers and stores it in a cache file, but even in this case Drill does not seem to leverage this information to provide metadata to reporting tools such as Tableau.

I know there have been discussions around this in the past but I could not find a Jira for this specific use-case.

My thanks to Rahul Challapalli of MapR Technologies for his help here.

People

Assignee: Unassigned

Reporter: Chris Mathews

Votes: 1

Watchers: 2

Dates

Created: 30/Sep/15 12:19

Updated: 30/Sep/15 12:19