ORC-673

PPD: LTE Point equality comparison is wrong when RG MIN==MAX

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.12, 1.6.6, 1.7.0
    • Component/s: None
    • Labels: None

    Description

      Currently, LESS_THAN_EQUALS PPD (predicate pushdown) evaluation does not properly handle the equality corner case where a row group (RG) holds a single repeating value, so that MIN == MAX.
      As part of the range-to-point comparison, the compare method returns MIN when the predicate literal equals that value:
      https://github.com/apache/orc/blob/2f98b1a555850051b5081105262c1744dcc14906/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java#L359

      Since the evaluatePredicateMinMax method does not explicitly account for that scenario, it returns YES_NO, and the row group ends up being selected even though it could be skipped:
      https://github.com/apache/orc/blob/2f98b1a555850051b5081105262c1744dcc14906/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java#L658
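
The corner case described above can be sketched in a few lines of Java. This is a simplified model, not the ORC source: the class name LtePointEquality and the two lessThanEquals* methods are hypothetical, values are reduced to long, and null handling is left out. The "after" variant shows one possible way to cover MIN == MAX and is an assumption, not necessarily the change that was committed.

```java
// Simplified sketch (assumption): mirrors the idea in RecordReaderImpl,
// but it is not the actual ORC code and it ignores nulls.
final class LtePointEquality {
  enum Location { BEFORE, MIN, MIDDLE, MAX, AFTER }
  enum TruthValue { YES, NO, YES_NO }

  // Position of a literal relative to the row group's [min, max] range.
  // Equality with min is tested first, so MIN is returned even when min == max.
  static Location compareToRange(long point, long min, long max) {
    if (point < min) return Location.BEFORE;
    if (point == min) return Location.MIN;
    if (point > max) return Location.AFTER;
    if (point == max) return Location.MAX;
    return Location.MIDDLE;
  }

  // Behaviour described in this issue: MIN always falls through to YES_NO.
  static TruthValue lessThanEqualsBefore(long literal, long min, long max) {
    Location loc = compareToRange(literal, min, max);
    if (loc == Location.AFTER || loc == Location.MAX) return TruthValue.YES;
    if (loc == Location.BEFORE) return TruthValue.NO;
    return TruthValue.YES_NO;
  }

  // One possible correction (assumption): MIN also means YES when min == max,
  // because every value in the row group then equals the literal.
  static TruthValue lessThanEqualsAfter(long literal, long min, long max) {
    Location loc = compareToRange(literal, min, max);
    if (loc == Location.AFTER || loc == Location.MAX
        || (loc == Location.MIN && min == max)) {
      return TruthValue.YES;
    }
    if (loc == Location.BEFORE) return TruthValue.NO;
    return TruthValue.YES_NO;
  }

  public static void main(String[] args) {
    // Row group with a single repeating value 1, predicate "col <= 1".
    System.out.println(lessThanEqualsBefore(1, 1, 1)); // prints YES_NO
    System.out.println(lessThanEqualsAfter(1, 1, 1));  // prints YES
  }
}
```

With MIN == MAX == 1 and the literal 1, every row satisfies the predicate, so YES is the tighter answer; YES_NO is still safe but loses the chance to skip the row group once the leaf is negated.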

      Steps to repro on Hive:

      create table tbl2 (fld int, fld1 int) stored as ORC tblproperties('transactional'='true');
      insert into tbl2 values (1,1);
      insert into tbl2 values (2,2);
      insert into tbl2 values (3,3);
      select * from tbl2 where fld > 1 and fld < 3;
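
To see why this query exposes the problem, the sketch below models the predicate evaluation for the three inserted values. It rests on two assumptions that are not stated in this report: each insert ends up in its own row group with MIN == MAX, and `fld > 1` reaches the reader as NOT (fld <= 1), since the search argument operators include LESS_THAN_EQUALS but no greater-than. The class ReproRowGroupSelection and all of its methods are hypothetical, and nulls are ignored.

```java
// Illustrative model only (assumption): not Hive or ORC source code.
final class ReproRowGroupSelection {
  enum TruthValue { YES, NO, YES_NO }

  // "fld <= literal" against a row group with the given min/max.
  // With handleMinEqualsMax == false, literal == min always yields YES_NO,
  // which is the behaviour this issue describes.
  static TruthValue lessThanEquals(long literal, long min, long max,
                                   boolean handleMinEqualsMax) {
    if (literal < min) return TruthValue.NO;
    if (literal == min) {
      return (handleMinEqualsMax && min == max) ? TruthValue.YES : TruthValue.YES_NO;
    }
    return literal >= max ? TruthValue.YES : TruthValue.YES_NO;
  }

  // "fld < literal" against a row group with the given min/max.
  static TruthValue lessThan(long literal, long min, long max) {
    if (literal > max) return TruthValue.YES;
    if (literal <= min) return TruthValue.NO;
    return TruthValue.YES_NO;
  }

  static TruthValue not(TruthValue v) {
    if (v == TruthValue.YES) return TruthValue.NO;
    if (v == TruthValue.NO) return TruthValue.YES;
    return TruthValue.YES_NO;
  }

  static TruthValue and(TruthValue a, TruthValue b) {
    if (a == TruthValue.NO || b == TruthValue.NO) return TruthValue.NO;
    if (a == TruthValue.YES && b == TruthValue.YES) return TruthValue.YES;
    return TruthValue.YES_NO;
  }

  // Evaluates "fld > 1 and fld < 3" for a row group holding only `value`.
  // The row group must be read unless the combined result is a definite NO.
  static boolean isSelected(long value, boolean handleMinEqualsMax) {
    TruthValue greaterThanOne = not(lessThanEquals(1, value, value, handleMinEqualsMax));
    TruthValue lessThanThree = lessThan(3, value, value);
    return and(greaterThanOne, lessThanThree) != TruthValue.NO;
  }

  public static void main(String[] args) {
    for (long value = 1; value <= 3; value++) {
      System.out.println("fld=" + value
          + " selected(before)=" + isSelected(value, false)
          + " selected(after)=" + isSelected(value, true));
    }
  }
}
```

Under these assumptions only the row group holding 1 changes: it is read when MIN == MAX is not handled and skipped when it is, while the row group holding 2 is read and the one holding 3 is skipped in both cases.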
      

          People

            Assignee: Panagiotis Garefalakis (pgaref)
            Reporter: Panagiotis Garefalakis (pgaref)