  1. ORC
  2. ORC-1553

Row group is read even when its statistics show 0 records for the SArg column


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.2
    • Fix Version/s: 2.0.0, 1.9.3
    • Component/s: None
    • Labels: None

    Description

      We created an .orc file using the Apache ORC library; I cannot provide a reproducible way to create such a file.
      Statistics are present for 100% of the row groups, as verified with orc dump.

      But when we search that file, we see very strange behavior:

      TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
      stringStatistics {
      }
      hasNull: false
      
      TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
      DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
      

      If the existing statistics report 0 values, there is obviously no need to read that row group.

      And yet we get a YES_NO_NULL decision, which forces inclusion of that row group in subsequent operations. This is meaningless and bad for performance.
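
      The expected behavior can be illustrated with a minimal, self-contained sketch. It does not use the ORC API and is not the actual fix: the class EmptyRowGroupCheck, its local TruthValue enum, and the methods evaluateEquals and isNeeded are hypothetical names that only mirror the values seen in the trace above. The assumption is that a row group whose statistics report numberOfValues: 0 and hasNull: false cannot match an EQUALS predicate, so the decision should be NO rather than YES_NO_NULL.

```java
public class EmptyRowGroupCheck {

    // Hypothetical local enum mirroring the truth values seen in the trace.
    enum TruthValue { YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL }

    // Simplified evaluation of "column EQUALS literal" against string statistics.
    // min/max are null when the statistics carry no bounds, as in the trace.
    static TruthValue evaluateEquals(long numberOfValues, boolean hasNull,
                                     String min, String max, String literal) {
        // Proposed short-circuit: no non-null values means nothing can match.
        if (numberOfValues == 0) {
            return hasNull ? TruthValue.NULL : TruthValue.NO;
        }
        // Without bounds nothing can be ruled out, so stay conservative.
        if (min == null || max == null) {
            return TruthValue.YES_NO_NULL;
        }
        // Literal outside [min, max]: no value in the row group can match.
        if (literal.compareTo(min) < 0 || literal.compareTo(max) > 0) {
            return hasNull ? TruthValue.NO_NULL : TruthValue.NO;
        }
        return hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
    }

    // A row group must be read only if the predicate might be true for some row.
    static boolean isNeeded(TruthValue v) {
        switch (v) {
            case NO:
            case NULL:
            case NO_NULL:
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        // Statistics from the trace: numberOfValues: 0, hasNull: false, no min/max.
        TruthValue decision = evaluateEquals(0, false, null, null, "71231231212");
        System.out.println(decision + " -> read row group: " + isNeeded(decision));
    }
}
```

      With the short-circuit in place, the statistics from the trace lead to a decision that excludes the row group; without it, the missing min/max falls through to the conservative YES_NO_NULL branch, which matches the behavior reported here.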

      Attachments

        1. MAJOR-2023-11-21.orc
          15.61 MB
          Alexander Petrossian (PAF)
        2. Снимок экрана 2023-12-21 в 10.00.23.png
          1.61 MB
          Alexander Petrossian (PAF)


            People

              Assignee: Yiqun Zhang (yqzhang)
              Reporter: Alexander Petrossian (PAF) (neopaf)
              Votes: 0
              Watchers: 3
