That's an interesting idea. Do you mean that Tajo will use Parquet as the default storage format, or that all storage formats will deserialize into a representation that follows the Dremel model? Parquet doesn't really have its own in-memory representation; each of the Parquet packages deserializes into a given in-memory representation using its readers and writers. For example, parquet-avro deserializes into Avro GenericRecords (or SpecificRecords), parquet-pig deserializes into Pig Tuples, and my code deserializes into Tajo Tuples.
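To sketch that pattern, here is a minimal, hedged illustration of how each binding supplies its own materializer that turns the same low-level column values into its framework's record type. The interface and class names below are simplified stand-ins invented for this example, not the actual parquet-mr API:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for the pattern used by the parquet-* bindings:
// the same raw column values are materialized into different in-memory
// record types depending on the binding. Not the real parquet-mr API.
interface RecordMaterializer<T> {
    T materialize(List<Object> columnValues);
}

// A binding like parquet-pig materializes into Pig Tuples; here we
// fake a "tuple" as a plain List.
class ListTupleMaterializer implements RecordMaterializer<List<Object>> {
    public List<Object> materialize(List<Object> columnValues) {
        return columnValues;
    }
}

// A binding like parquet-avro materializes into GenericRecords; here
// we fake a record as a comma-joined string.
class StringRecordMaterializer implements RecordMaterializer<String> {
    public String materialize(List<Object> columnValues) {
        StringBuilder sb = new StringBuilder();
        for (Object v : columnValues) {
            if (sb.length() > 0) sb.append(",");
            sb.append(v);
        }
        return sb.toString();
    }
}

public class MaterializerDemo {
    public static void main(String[] args) {
        List<Object> raw = Arrays.asList((Object) 1, "tajo");
        System.out.println(new ListTupleMaterializer().materialize(raw));
        System.out.println(new StringRecordMaterializer().materialize(raw));
    }
}
```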
My changes are currently in the parquet branch in my fork on GitHub: https://github.com/davidzchen/tajo/tree/parquet
They are almost ready. During further testing, I found a few more issues, most of which I have now fixed. One thing I noticed was that when reading a projection, the resulting Tuple still has all the columns of the table schema, but the non-projected fields are simply null. What is the motivation for retaining all the columns in the Tuple rather than having the Tuple contain only the projected columns?
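To make the behavior concrete, here is a small sketch of the full-width projection described above; the schema, values, and method name are invented for this example and are not Tajo code:

```java
import java.util.Arrays;

// Illustrates the observed behavior: reading a projection yields a row
// that keeps the full schema width, with non-projected columns nulled
// out rather than dropped.
public class ProjectionDemo {
    // Keep full width; null out columns not in the projection mask.
    static Object[] projectFullWidth(Object[] row, boolean[] projected) {
        Object[] out = new Object[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = projected[i] ? row[i] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] row = {1, "alice", 9.5};       // schema: id, name, score
        boolean[] mask = {false, true, false};  // project only "name"
        System.out.println(Arrays.toString(projectFullWidth(row, mask)));
        // prints [null, alice, null]
    }
}
```

The alternative design would return a narrow tuple containing only the projected column (`[alice]`), which is the trade-off the question above is asking about.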
There is one last failing test, caused by the fact that I am not handling the NULL_TYPE data type when converting the Tajo schema to a Parquet schema on write. What is NULL_TYPE used for? I wasn't able to find much documentation on its use. I can always write this as a placeholder column or special-case it. Once I fix this, I will post a review request.
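For illustration, the placeholder/special-case idea could look like the sketch below. The enum, method, and type-name strings are all hypothetical stand-ins, not Tajo's or Parquet's actual API:

```java
// Hedged sketch of special-casing a NULL-typed column when converting
// a table schema to a storage-format schema. All names are invented
// for this example.
enum ColumnType { INT4, TEXT, NULL_TYPE }

public class SchemaConvertDemo {
    // Map a column type to a storage-format type name, special-casing
    // NULL_TYPE as a placeholder instead of failing on it.
    static String toStorageType(ColumnType t) {
        switch (t) {
            case INT4:      return "int32";
            case TEXT:      return "binary (UTF8)";
            case NULL_TYPE: return "binary (placeholder)"; // special case
            default:
                throw new IllegalArgumentException("unknown type: " + t);
        }
    }

    public static void main(String[] args) {
        System.out.println(toStorageType(ColumnType.NULL_TYPE));
        // prints binary (placeholder)
    }
}
```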
There are some follow-up work items that I plan to do, most likely as review changes:
- Add TableStats to ParquetAppender.
- Figure out if ParquetAppender.flush() is needed.
- Additional end-to-end testing.
- Add some documentation.
Edit: Update GitHub link.