  1. Hive
  2. HIVE-14826 Support vectorization for Parquet
  3. HIVE-17696

Vectorized reader does not seem to be pushing down projection columns in certain code paths

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0, 3.0.0
    • Component/s: None
    • Labels: None

    Description

      This is the relevant code snippet from VectorizedParquetRecordReader.java:

      MessageType tableSchema;
      if (indexAccess) {
        List<Integer> indexSequence = new ArrayList<>();

        // Generates a sequence list of indexes
        for (int i = 0; i < columnNamesList.size(); i++) {
          indexSequence.add(i);
        }

        tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
          indexSequence);
      } else {
        tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
          columnTypesList);
      }

      indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
      if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
        requestedSchema =
          DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
      } else {
        requestedSchema = fileSchema;
      }

      this.reader = new ParquetFileReader(
        configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());

      A couple of things to notice here:

      1. Most of this code is duplicated from the DataWritableReadSupport.init() method.
      2. The else branch assigns fileSchema to requestedSchema instead of tableSchema, which is what DataWritableReadSupport.init() uses. Does this cause projection columns to be missed when we read Parquet files? We should probably just reuse the ReadContext returned from DataWritableReadSupport.init() here.
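
      To make the concern about the else branch concrete, here is a minimal toy model in plain Java. It does not use the Parquet or Hive APIs: a schema is modeled as an ordered list of column names, and ProjectionSketch, selectByIndex, requestedAsReported and requestedWithTableFallback are hypothetical names introduced only for illustration. It assumes a file that carries one more column than the table declares, and compares the fallback in the snippet with a fallback to the table schema.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a "schema" is an ordered list of column names.
// All names in this class are hypothetical and do not exist in Hive or Parquet.
class ProjectionSketch {

  // Keeps the columns at the wanted positions, skipping out-of-range ids.
  static List<String> selectByIndex(List<String> schema, List<Integer> wanted) {
    List<String> projected = new ArrayList<>();
    for (int index : wanted) {
      if (index >= 0 && index < schema.size()) {
        projected.add(schema.get(index));
      }
    }
    return projected;
  }

  // Mirrors the branch structure of the snippet: the fallback is the file schema.
  static List<String> requestedAsReported(List<String> fileSchema, List<String> tableSchema,
      boolean readAllColumns, List<Integer> wanted) {
    if (!readAllColumns && !wanted.isEmpty()) {
      return selectByIndex(tableSchema, wanted);
    }
    return fileSchema;
  }

  // Alternative suggested in the description: the fallback is the table schema.
  static List<String> requestedWithTableFallback(List<String> fileSchema, List<String> tableSchema,
      boolean readAllColumns, List<Integer> wanted) {
    if (!readAllColumns && !wanted.isEmpty()) {
      return selectByIndex(tableSchema, wanted);
    }
    return tableSchema;
  }

  public static void main(String[] args) {
    // Assumed scenario: the file has a column the table does not declare.
    List<String> fileSchema = List.of("id", "name", "extra");
    List<String> tableSchema = List.of("id", "name");
    List<Integer> noIds = List.of();

    // prints [id, name, extra]
    System.out.println(requestedAsReported(fileSchema, tableSchema, true, noIds));
    // prints [id, name]
    System.out.println(requestedWithTableFallback(fileSchema, tableSchema, true, noIds));
  }
}
```

      In this toy scenario the file-schema fallback asks the reader for every column in the file, including ones the table never declared, while the table-schema fallback stays within the table's columns. Whether the real reader behaves the same way depends on what getSchemaByName and getSchemaByIndex return for a given file, which this sketch does not model.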

      Attachments

        1. HIVE-17696.patch (3 kB, Ferdinand Xu)
        2. HIVE-17696.2.patch (7 kB, Ferdinand Xu)
        3. HIVE-17696-branch-2.patch (7 kB, Ferdinand Xu)

          People

            Assignee: Ferdinand Xu (Ferd)
            Reporter: Vihang Karajgaonkar (vihangk1)