HIVE-8120: Umbrella JIRA tracking Parquet improvements

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

      Issue Links

        Summary | Type | Status | Assignee
        1. Create micro-benchmarks for ParquetSerde and evaluate performance | Sub-task | Resolved | Sergio Peña
        2. Make use of SearchArgument classes for Parquet SERDE | Sub-task | Resolved | Ferdinand Xu
        3. Improve Parquet Vectorization | Sub-task | Patch Available | Dong Chen
        4. NanoTimeUtils performs some work needlessly | Sub-task | Closed | Sergio Peña
        5. Implement the bloom filter for the ParquetSerde | Sub-task | Open | Ferdinand Xu
        6. Warn user when parquet mm kicks in | Sub-task | Open | Dong Chen
        7. Move parquet serialize implementation to DataWritableWriter to improve write speeds | Sub-task | Closed | Sergio Peña
        8. Improve the speed of select count(*) statement for a parquet table with big input(~1GB) | Sub-task | Open | Ferdinand Xu
        9. Remove parquet nested objects from wrapper writable objects | Sub-task | Closed | Sergio Peña
        10. Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter | Sub-task | Patch Available | Sergio Peña
        11. Avoid reading file footers in ParquetRecordReaderWrapper | Sub-task | Open | Sergio Peña
        12. Allocate only parquet selected columns in HiveStructConverter class | Sub-task | Patch Available | Sergio Peña
        13. Turn on Parquet vectorization in parquet branch | Sub-task | Patch Available | Dong Chen
        14. Remove duplicated Hive table schema parsing in DataWritableReadSupport | Sub-task | Resolved | Dong Chen
        15. Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor | Sub-task | Resolved | Dong Chen
        16. Override new init API from ReadSupport instead of the deprecated one | Sub-task | Closed | Ferdinand Xu
        17. Clean up ETypeConverter since Parquet supports timestamp type already | Sub-task | Resolved | Ferdinand Xu
        18. Bump up parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6 | Sub-task | Closed | Ferdinand Xu
        19. Use new ParquetInputSplit constructor API | Sub-task | Patch Available | Ferdinand Xu
        20. Enable parquet column index in HIVE | Sub-task | Resolved | Ferdinand Xu
        21. Merge master to parquet 05/20/2015 [Parquet branch] | Sub-task | Resolved | Sergio Peña
        22. Add parquet branch profile to jenkins-submit-build.sh | Sub-task | Closed | Sergio Peña
        23. Parquet: Bump the parquet version up to 1.8.1 | Sub-task | Closed | Ferdinand Xu
        24. Get row information on DataWritableWriter once for better writing performance | Sub-task | Closed | Sergio Peña
        25. Merge master to parquet 06/16/2015 [Parquet branch] | Sub-task | Resolved | Ferdinand Xu
        26. Predicate pushing down doesn't work for float type for Parquet | Sub-task | Closed | Ferdinand Xu
        27. A bad performance regression issue with Parquet happens if Hive does not select any columns | Sub-task | Reopened | Ferdinand Xu
        28. Map instances with null keys are not properly handled for Parquet tables | Sub-task | Open | Unassigned
        29. Use * instead of sum(hash(*)) on Parquet predicate (PPD) integration tests | Sub-task | Closed | Sergio Peña
        30. Hive cannot read Parquet decimals backed by INT32 or INT64 | Sub-task | Resolved | Unassigned

          People

            Assignee: Brock Noland (brocknoland)
            Reporter: Brock Noland (brocknoland)
            Votes: 4
            Watchers: 18