IMPALA-11081

Partition key scan optimization may return incorrect results when a partition file has more than one block

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: Impala 4.1.2, Impala 4.3.0
    • Component/s: None
    • Labels: None

    Description

      The optimization introduced in https://issues.apache.org/jira/browse/IMPALA-8834 generates only one scan range per file for partition key scans, which can produce wrong results.

      The problem occurs when a file has more than one block:

      1. The planner transforms only the first block into a TScanRange, and that range does not include the footer.
      2. The backend cannot find the split that contains the footer, so it can neither parse the footer nor do the scan.

      As a result, the partition key scan returns incorrect results.

       

      See this snippet in HdfsScanNode.java:

       

      private Pair<Boolean, Long> transformBlocksToScanRanges(
          FeFsPartition partition, FileDescriptor fileDesc,
          boolean fsHasBlocks, long scanRangeBytesLimit,
          Analyzer analyzer) {
        // ... (code before the loop elided)
        for (int i = 0; i < fileDesc.getNumFileBlocks(); ++i) {
          // ... (scan range generation for block i elided)
          // Only generate one scan range for partition key scans.
          if (isPartitionKeyScan_) break;
        }
        // ... (rest of the method elided)
      }

      In the FE, when a partition key scan reads a file with more than one block, transformBlocksToScanRanges stops after the first block, so no generated scan range covers the footer.
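
      The effect of the early break can be reproduced with a small, self-contained model. This is only a sketch: the class PartitionKeyScanSketch, its Range type, and the methods blocksToRanges and coversFooter are hypothetical names made up for illustration and are not part of Impala. It assumes one scan range per block and that the footer occupies the last bytes of the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of turning file blocks into scan ranges.
// None of these names exist in Impala; they only illustrate the report.
class PartitionKeyScanSketch {
  // A byte range [offset, offset + length) within a file.
  static final class Range {
    final long offset;
    final long length;
    Range(long offset, long length) { this.offset = offset; this.length = length; }
    long end() { return offset + length; }
  }

  // Emits one range per block. For partition key scans it stops after the
  // first block, mirroring the early break in transformBlocksToScanRanges.
  static List<Range> blocksToRanges(long fileLength, long blockSize,
      boolean isPartitionKeyScan) {
    List<Range> ranges = new ArrayList<>();
    for (long offset = 0; offset < fileLength; offset += blockSize) {
      ranges.add(new Range(offset, Math.min(blockSize, fileLength - offset)));
      if (isPartitionKeyScan) break;
    }
    return ranges;
  }

  // Assumption: the footer is at the end of the file, so only a range that
  // ends exactly at fileLength can contain it.
  static boolean coversFooter(List<Range> ranges, long fileLength) {
    for (Range r : ranges) {
      if (r.end() == fileLength) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    long blockSize = 128;
    // 300 bytes span three blocks (128 + 128 + 44); 100 bytes fit in one block.
    System.out.println(coversFooter(blocksToRanges(300, blockSize, false), 300)); // true
    System.out.println(coversFooter(blocksToRanges(300, blockSize, true), 300));  // false
    System.out.println(coversFooter(blocksToRanges(100, blockSize, true), 100));  // true
  }
}
```

      In this model a single-block file is unaffected, because its only range already ends at the end of the file. That matches the report: only files with more than one block lose the footer range.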

       

      See this snippet in hdfs-scanner.cc:

       

      /// Issue just the footer range for each file. This function is only used
      /// in parquet and orc scanners. We'll then parse the footer and pick out
      /// the columns we want.
      Status HdfsScanner::IssueFooterRanges(HdfsScanNodeBase* scan_node,
          const THdfsFileFormat::type& file_type,
          const std::vector<HdfsFileDesc*>& files) {
        // ... (elided; the lines below run for each entry files[i])
        // Try to find the split with the footer.
        ScanRange* footer_split = FindFooterSplit(files[i]);
        // ... (elided)
      }

      In the BE, when no footer split is found for a file, no range is added for it, so the file is never scanned.
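
      The BE side can be modeled the same way. The real code is C++; the sketch below uses Java only to stay in one language with the planner snippet. All names in it (FooterSplitSketch, Split, FileDesc, findFooterSplit, issueFooterRanges) are hypothetical. It assumes that a split counts as the footer split when it ends exactly at the end of the file, which the report does not show, and that a file without a footer split is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Java model of the backend behavior described in the report.
// The actual implementation is C++ and may differ in detail.
class FooterSplitSketch {
  // One split (scan range) assigned to the backend for a file.
  static final class Split {
    final long offset;
    final long length;
    Split(long offset, long length) { this.offset = offset; this.length = length; }
  }

  // A file together with the splits the planner generated for it.
  static final class FileDesc {
    final String name;
    final long fileLength;
    final List<Split> splits;
    FileDesc(String name, long fileLength, List<Split> splits) {
      this.name = name;
      this.fileLength = fileLength;
      this.splits = splits;
    }
  }

  // Assumption: the footer split is the one that ends at the end of the file.
  // Returns null when the planner sent no such split.
  static Split findFooterSplit(FileDesc file) {
    for (Split s : file.splits) {
      if (s.offset + s.length == file.fileLength) return s;
    }
    return null;
  }

  // Returns the names of the files for which a footer range would be issued.
  static List<String> issueFooterRanges(List<FileDesc> files) {
    List<String> scanned = new ArrayList<>();
    for (FileDesc file : files) {
      if (findFooterSplit(file) == null) continue; // nothing issued: file is skipped
      scanned.add(file.name);
    }
    return scanned;
  }

  public static void main(String[] args) {
    FileDesc oneBlock = new FileDesc("one_block", 100, Arrays.asList(new Split(0, 100)));
    // Multi-block file for which the planner sent only the first block.
    FileDesc firstBlockOnly = new FileDesc("multi_block", 300, Arrays.asList(new Split(0, 128)));
    System.out.println(issueFooterRanges(Arrays.asList(oneBlock, firstBlockOnly)));
    // prints [one_block]
  }
}
```

      Under these assumptions the multi-block file silently drops out of the scan, so any partition whose files all have more than one block contributes no rows to the result.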

       

       


            People

              Assignee: zhangyifan27 YifanZhang
              Reporter: carolinchen carolinchen
              Votes: 0
              Watchers: 5
