Hive / HIVE-14826

Support vectorization for Parquet

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      A Parquet vectorized reader can improve throughput and leverage Hive's existing vectorized execution engine. This is an umbrella ticket to track the feature.

        Sub-Tasks

          1. Implement Parquet vectorization reader for Primitive types | Sub-task | Resolved | Ferdinand Xu
          2. Micro benchmark for Parquet vectorized reader | Sub-task | Resolved | Colin
          3. Test the predicate pushing down support for Parquet vectorization read path | Sub-task | Patch Available | Ferdinand Xu
          4. Implement Parquet vectorization reader for Struct type | Sub-task | Resolved | Ferdinand Xu
          5. Support Nested Column Field Pruning for Parquet Vectorized Reader | Sub-task | Open | Chao Sun
          6. Fix the NullPointer problem caused by split phase | Sub-task | Resolved | Colin
          7. Parquet vectorization doesn't work for tables with partition info | Sub-task | Closed | Colin
          8. ParquetFileReader should be closed to avoid resource leak | Sub-task | Closed | Colin
          9. Measure Performance for Parquet Vectorization Reader | Sub-task | Open | Colin
          10. When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY. | Sub-task | Closed | Colin
          11. Add more q-tests for Hive-on-Spark with Parquet vectorized reader | Sub-task | Closed | Ferdinand Xu
          12. Add a config to turn off parquet vectorization | Sub-task | Closed | Vihang Karajgaonkar
          13. Vectorized reader does not seem to be pushing down projection columns in certain code paths | Sub-task | Closed | Ferdinand Xu
          14. Remove Parquet specific code from VectorizedColumnReader | Sub-task | Open | Unassigned
          15. Parquet vectorization fails on tables with complex columns when there are no projected columns | Sub-task | Closed | Vihang Karajgaonkar
          16. Vectorized reader does push down projection columns for index access schema | Sub-task | Resolved | Unassigned
          17. Implement Parquet vectorization reader for Array type | Sub-task | Closed | Colin
          18. Support column projection for index access when using Parquet Vectorization | Sub-task | Closed | Ferdinand Xu
          19. NPE during initialization of VectorizedParquetRecordReader when input split is null | Sub-task | Closed | Vihang Karajgaonkar
          20. Implement Parquet vectorization reader for Map type | Sub-task | Closed | Colin
          21. Fix API call in VectorizedListColumnReader to get value from BytesColumnVector | Sub-task | Closed | Colin
          22. Support to read multiple level definition for Map type in Parquet file | Sub-task | Closed | Colin
          23. Support vectorization for INTERVAL_DAY_TIME type | Sub-task | Open | Unassigned
          24. Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet | Sub-task | Closed | Vihang Karajgaonkar
          25. Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader | Sub-task | Closed | Colin
          26. Support schema evolution in Parquet Vectorization reader | Sub-task | Closed | Ferdinand Xu
          27. Support to read nested complex type with Parquet in vectorization mode | Sub-task | Open | Haifeng Chen

            People

              Assignee: Ferdinand Xu
              Reporter: Ferdinand Xu
              Votes: 0
              Watchers: 12

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m