IMPALA-2328

Parquet scan should use min/max statistics to skip blocks based on predicate

Details

    Description

      Parquet stores min/max statistics that can be used to skip reading blocks whose values cannot satisfy a given predicate.

      The query below ends up scanning all rows, which is unnecessary when -1 falls outside the min/max range of l_orderkey in every block.

      select count(*) from tpch_parquet.lineitem where l_orderkey = -1;
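
      The idea can be illustrated with a minimal sketch. This is not Impala's scanner code: `BlockStats`, `can_skip_for_equality`, and `blocks_to_scan` are hypothetical names, and the statistics values are made up for illustration. It only shows how an equality predicate could be checked against a block's min/max range before any rows are read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStats:
    """Hypothetical min/max statistics for one column in one block."""
    min_value: int
    max_value: int
    num_rows: int


def can_skip_for_equality(stats: BlockStats, literal: int) -> bool:
    """Return True if no row in the block can equal `literal`.

    A value outside [min_value, max_value] cannot occur in the block,
    so the block does not need to be read for `col = literal`.
    """
    return literal < stats.min_value or literal > stats.max_value


def blocks_to_scan(blocks: List[BlockStats], literal: int) -> List[BlockStats]:
    """Keep only the blocks whose range might contain `literal`."""
    return [b for b in blocks if not can_skip_for_equality(b, literal)]


# Made-up statistics for an l_orderkey-like column (assumption, not real data).
blocks = [
    BlockStats(min_value=1, max_value=1_000_000, num_rows=500_000),
    BlockStats(min_value=1_000_001, max_value=6_000_000, num_rows=700_000),
]

# With the predicate l_orderkey = -1, every block can be skipped.
remaining = blocks_to_scan(blocks, -1)
print(len(remaining))
```

      Under these assumed statistics, no block survives the check for `-1`, so a count(*) could be answered without reading any rows. A literal inside a block's range only means the block might match, so that block still has to be scanned.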
      


          People

            lv Lars Volker
            mmokhtar Mostafa Mokhtar
            Votes: 3
            Watchers: 15