[HIVE-8120] Umbrella JIRA tracking Parquet improvements - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Issue Links

blocks

HIVE-8128 Improve Parquet Vectorization

Patch Available

incorporates

HIVE-10666 Improvement for Parquet Predicate Push down on Hive

Open

is blocked by

HIVE-7685 Parquet memory manager

Resolved

HIVE-7858 Parquet compression should be configurable via table property

Resolved

HIVE-9442 Make sure all data types work for PARQUET

Resolved

HIVE-6384 Implement all Hive data types in Parquet

Resolved

is related to

HIVE-6914 parquet-hive cannot write nested map (map value is map)

Resolved

HIVE-7800 Parquet Column Index Access Schema Size Checking

Closed

HIVE-8838 Support Parquet through HCatalog

Closed

relates to

HIVE-5998 Add vectorized reader for Parquet files

Closed

HIVE-4329 HCatalog should use getHiveRecordWriter rather than getRecordWriter

Open

HIVE-11598 Document Configuration for Parquet Files

Open

Sub-Tasks

1.	Create micro-benchmarks for ParquetSerde and evaluate performance	Resolved	Sergio Peña
2.	Make use of SearchArgument classes for Parquet SERDE	Resolved	Ferdinand Xu
3.	Improve Parquet Vectorization	Patch Available	Dong Chen
4.	NanoTimeUtils performs some work needlessly	Closed	Sergio Peña
5.	Implement the bloom filter for the ParquetSerde	Open	Ferdinand Xu
6.	Warn user when parquet mm kicks in	Open	Dong Chen
7.	Move parquet serialize implementation to DataWritableWriter to improve write speeds	Closed	Sergio Peña
8.	Improve the speed of select count(*) statement for a parquet table with big input(~1GB)	Open	Ferdinand Xu
9.	Remove parquet nested objects from wrapper writable objects	Closed	Sergio Peña
10.	Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter	Patch Available	Sergio Peña
11.	Avoid reading file footers in ParquetRecordReaderWrapper	Open	Sergio Peña
12.	Allocate only parquet selected columns in HiveStructConverter class	Patch Available	Sergio Peña
13.	Turn on Parquet vectorization in parquet branch	Patch Available	Dong Chen
14.	Remove duplicated Hive table schema parsing in DataWritableReadSupport	Resolved	Dong Chen
15.	Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor	Resolved	Dong Chen
16.	Override new init API from ReadSupport instead of the deprecated one	Closed	Ferdinand Xu
17.	Clean up ETypeConverter since Parquet supports timestamp type already	Resolved	Ferdinand Xu
18.	Bump up parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6	Closed	Ferdinand Xu
19.	Use new ParquetInputSplit constructor API	Patch Available	Ferdinand Xu
20.	Enable parquet column index in HIVE	Resolved	Ferdinand Xu
21.	Merge master to parquet 05/20/2015 [Parquet branch]	Resolved	Sergio Peña
22.	Add parquet branch profile to jenkins-submit-build.sh	Closed	Sergio Peña
23.	Parquet: Bump the parquet version up to 1.8.1	Closed	Ferdinand Xu
24.	Get row information on DataWritableWriter once for better writing performance	Closed	Sergio Peña
25.	Merge master to parquet 06/16/2015 [Parquet branch]	Resolved	Ferdinand Xu
26.	Predicate pushing down doesn't work for float type for Parquet	Closed	Ferdinand Xu
27.	A bad performance regression issue with Parquet happens if Hive does not select any columns	Reopened	Ferdinand Xu
28.	Map instances with null keys are not properly handled for Parquet tables	Open	Unassigned
29.	Use * instead of sum(hash(*)) on Parquet predicate (PPD) integration tests	Closed	Sergio Peña
30.	Hive cannot read Parquet decimals backed by INT32 or INT64	Resolved	Unassigned

People

Assignee: Brock Noland

Reporter: Brock Noland

Votes: 4

Watchers: 18

Dates

Created: 16/Sep/14 04:53

Updated: 18/Aug/15 23:28