ORC / ORC-1553

Reading information from a row group that has 0 records for the SArg column


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.2
    • Fix Version/s: 2.0.0, 1.9.3
    • Component/s: None
    • Labels: None

    Description

      We created an .orc file using the Apache ORC library; I cannot provide a reproducible way to create such a file.
      Statistics are present for 100% of the row groups, as checked with orc dump.

      But when we search this file, we see very strange behavior:

      TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
      stringStatistics {
      }
      hasNull: false
      
      TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
      DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
      

      If the existing statistics report 0 values, there is obviously no need to read that row group.

      And yet the predicate gets a YES_NO_NULL decision, which forces that row group to be included in the subsequent read. This is pointless and bad for performance.
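      The reporter's argument can be illustrated with a minimal Java sketch (Java 16+ for the record). It is not ORC's RecordReaderImpl code and not the fix that shipped for this issue: the StringStats record, the evaluateEquals method, and the local TruthValue enum are hypothetical stand-ins. The enum's value names mirror ORC's SearchArgument.TruthValue, and its isNeeded() rule is an assumption modeled on how that type decides whether a row group must be read. The point shown: when a row group's statistics report numberOfValues 0 and hasNull false, an EQUALS predicate can be answered NO, so the row group could be skipped rather than marked YES_NO_NULL.

```java
/**
 * Hypothetical, simplified model of row-group predicate evaluation.
 * None of these types are the org.apache.orc API; names are illustrative only.
 */
public class RowGroupPruningSketch {

    // Local stand-in whose value names mirror ORC's SearchArgument.TruthValue.
    enum TruthValue {
        YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL;

        // Assumed semantics: a row group must be read unless the answer rules out a match.
        boolean isNeeded() {
            return this != NO && this != NULL && this != NO_NULL;
        }
    }

    // Hypothetical per-row-group statistics for a string column.
    // min/max are null when the writer recorded no string bounds.
    record StringStats(long numberOfValues, boolean hasNull, String min, String max) {}

    // Evaluates "column = literal" against one row group's statistics.
    static TruthValue evaluateEquals(StringStats stats, String literal) {
        if (stats == null) {
            // No statistics at all: nothing can be ruled out.
            return TruthValue.YES_NO_NULL;
        }
        if (stats.numberOfValues() == 0) {
            // No non-null values, so equality can never be true.
            // If nulls are present the comparison yields NULL; otherwise there is nothing to match.
            return stats.hasNull() ? TruthValue.NULL : TruthValue.NO;
        }
        if (stats.min() == null || stats.max() == null) {
            // Values exist but bounds are missing: stay conservative.
            return TruthValue.YES_NO_NULL;
        }
        // Simplified lexicographic comparison (String.compareTo, not ORC's byte order).
        boolean inRange = literal.compareTo(stats.min()) >= 0
                && literal.compareTo(stats.max()) <= 0;
        if (!inRange) {
            return stats.hasNull() ? TruthValue.NO_NULL : TruthValue.NO;
        }
        if (stats.min().equals(stats.max())) {
            // min == max == literal, so every non-null value matches.
            return stats.hasNull() ? TruthValue.YES_NULL : TruthValue.YES;
        }
        return stats.hasNull() ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
    }

    public static void main(String[] args) {
        // Statistics shaped like the TRACE log: numberOfValues 0, hasNull false, no bounds.
        StringStats empty = new StringStats(0, false, null, null);
        TruthValue result = evaluateEquals(empty, "71231231212");
        System.out.println(result + " needed=" + result.isNeeded()); // prints "NO needed=false"
    }
}
```

      Under this model, the row group for rows 340000 to 349999 would be skipped. This only illustrates the reasoning in the description; the actual change released in 2.0.0 and 1.9.3 may be implemented differently.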

      Attachments

        1. Снимок экрана 2023-12-21 в 10.00.23.png (screenshot taken 2023-12-21 at 10.00.23)
          1.61 MB
          Alexander Petrossian (PAF)
        2. MAJOR-2023-11-21.orc
          15.61 MB
          Alexander Petrossian (PAF)


          People

            Assignee: Yiqun Zhang (yqzhang)
            Reporter: Alexander Petrossian (PAF) (neopaf)
            Votes: 0
            Watchers: 3

