Details
- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- None
- Release Note: Added support for 'STORED AS PARQUET' and for setting Parquet as the default storage engine.
Description
Problem Statement:
Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively, so we built the Parquet Hive integration and would now like to contribute that integration to Hive.
About Parquet:
Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration.
Change Details:
Parquet was built with dependency management in mind, so only a single Parquet jar will be added as a dependency.
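A minimal sketch of the syntax this change enables (the table and column names below are illustrative, not from the patch):

```sql
-- Create a table stored in the Parquet columnar format
CREATE TABLE users_parquet (
  id   BIGINT,
  name STRING
)
STORED AS PARQUET;

-- Make Parquet the default storage format for newly created tables
SET hive.default.fileformat=Parquet;
```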
Attachments
Issue Links
- blocks
  - HIVE-6367 Implement Decimal in ParquetSerde (Closed)
  - HIVE-6375 Fix CTAS for parquet (Resolved)
  - HIVE-6366 Refactor some items in Hive Parquet (Open)
  - HIVE-6384 Implement all Hive data types in Parquet (Resolved)
- is related to
  - HIVE-25296 Replace parquet-hadoop-bundle dependency with the actual parquet modules (Open)
  - HIVE-5976 Decouple input formats from STORED as keywords (Closed)
- relates to
  - HIVE-5998 Add vectorized reader for Parquet files (Closed)
  - HIVE-6368 Document parquet on hive wiki (Resolved)