  1. Hive
  2. HIVE-14826 Support vectorization for Parquet
  3. HIVE-17696

Vectorized reader does not seem to be pushing down projection columns in certain code paths

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.4.0
    • Component/s: None
    • Labels:
      None

      Description

      The following snippet is from VectorizedParquetRecordReader.java:

      MessageType tableSchema;
      if (indexAccess) {
        List<Integer> indexSequence = new ArrayList<>();

        // Generates a sequence list of indexes
        for (int i = 0; i < columnNamesList.size(); i++) {
          indexSequence.add(i);
        }

        tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
          indexSequence);
      } else {
        tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
          columnTypesList);
      }

      indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
      if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
        requestedSchema =
          DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
      } else {
        requestedSchema = fileSchema;
      }

      this.reader = new ParquetFileReader(
        configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());

      A couple of things to notice here:

      1. Most of this code is duplicated from the DataWritableReadSupport.init() method.
      2. The else branch assigns fileSchema to requestedSchema instead of tableSchema, which is what DataWritableReadSupport.init() uses. Does this cause projection columns to be missed when we read Parquet files? We should probably just reuse the ReadContext returned from DataWritableReadSupport.init() here.
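
      To make the concern about the fallback concrete, here is a minimal, self-contained sketch. It is a toy model, not Hive or Parquet code: a schema is reduced to a list of column names, and ProjectionSketch, selectByIndex, and requestedSchema are hypothetical names invented for illustration. It only shows how the choice of fallback schema changes which columns would be handed to the reader.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a "schema" is just an ordered list of column names.
// All names here are hypothetical and are NOT Hive or Parquet APIs.
final class ProjectionSketch {

    // Keep only the columns at the given positions, skipping out-of-range ids.
    static List<String> selectByIndex(List<String> schema, List<Integer> ids) {
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            if (id >= 0 && id < schema.size()) {
                out.add(schema.get(id));
            }
        }
        return out;
    }

    // Same shape as the if/else in the snippet: project when specific columns
    // are requested, otherwise return whichever fallback schema the caller chose.
    static List<String> requestedSchema(List<String> tableSchema,
                                        List<String> fallback,
                                        boolean readAllColumns,
                                        List<Integer> wantedIds) {
        if (!readAllColumns && !wantedIds.isEmpty()) {
            return selectByIndex(tableSchema, wantedIds);
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumed example data: the file carries one column the table does not declare.
        List<String> fileSchema = List.of("id", "name", "price", "legacy_col");
        List<String> tableSchema = List.of("id", "name", "price");
        List<Integer> none = List.of();

        // Projection path: only the wanted table columns are requested.
        System.out.println(requestedSchema(tableSchema, fileSchema, false, List.of(0, 2)));

        // Fallback as in the snippet: the whole file schema is requested.
        System.out.println(requestedSchema(tableSchema, fileSchema, false, none));

        // Fallback as in the suggestion: only the table schema is requested.
        System.out.println(requestedSchema(tableSchema, tableSchema, false, none));
    }
}
```

      In this toy setup, falling back to the file schema requests legacy_col even though the table never declares it, while falling back to the table schema does not. Whether the real reader behaves the same way depends on details this sketch does not model.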

        Attachments

        1. HIVE-17696-branch-2.patch
          7 kB
          Ferdinand Xu
        2. HIVE-17696.patch
          3 kB
          Ferdinand Xu
        3. HIVE-17696.2.patch
          7 kB
          Ferdinand Xu

            People

            • Assignee: Ferdinand Xu
            • Reporter: Vihang Karajgaonkar (vihangk1)