  1. Apache Drill
  2. DRILL-4449

Wrong results when using metadata cache with specific set of queries

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: Storage - Parquet
    • Labels: None

    Description

      We are still working on a reproduction, but the problem shows up with a query similar to this one:

      with q1 as (
      select a.field
      from `table` a
      where <some condition that causes the table to be pruned>
      group by a.field
      having ...
      )
      , q2 as (
      select a.field
      from `table` a
      where <some other pruning condition>
      group by a.field
      )
      select * from (
      select count(*) as cnt from q1
      union all
      select count(*) as cnt from q2
      );
      

      The table is partitioned, and both subqueries force Parquet partition pruning on it. Because the two scans share the same Parquet metadata object in ParquetGroupScan, the second subquery ends up "over-pruned" and we get wrong results.

      The plan doesn't show the problem.
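
      The sharing problem described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: TableMetadata, Scan, pruneInPlace, pruneCopy and the file names are hypothetical and are not Drill's actual classes or the actual fix. It shows how pruning a shared, mutable file list in place can leave a second scan with too few files, and how pruning a private copy avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for cached table metadata; not Drill's actual class.
final class TableMetadata {
    final List<String> files;
    TableMetadata(List<String> files) { this.files = files; }
}

// Hypothetical scan that prunes the files listed in the metadata.
final class Scan {
    private final TableMetadata metadata;
    Scan(TableMetadata metadata) { this.metadata = metadata; }

    // Problematic: mutates the shared list, so later scans start from fewer files.
    Scan pruneInPlace(Predicate<String> keep) {
        metadata.files.removeIf(keep.negate());
        return this;
    }

    // Safer: builds a private list and leaves the shared metadata untouched.
    Scan pruneCopy(Predicate<String> keep) {
        List<String> kept = new ArrayList<>();
        for (String f : metadata.files) {
            if (keep.test(f)) {
                kept.add(f);
            }
        }
        return new Scan(new TableMetadata(kept));
    }

    int fileCount() { return metadata.files.size(); }
}

class SharedMetadataDemo {
    // Made-up partitioned layout, one file per partition.
    static TableMetadata freshMetadata() {
        return new TableMetadata(new ArrayList<>(Arrays.asList(
            "year=2014/a.parquet", "year=2015/b.parquet", "year=2016/c.parquet")));
    }

    // Two scans over the same metadata object, each pruning in place.
    static int[] inPlaceCounts() {
        TableMetadata shared = freshMetadata();
        int q1 = new Scan(shared).pruneInPlace(f -> f.startsWith("year=2015")).fileCount();
        int q2 = new Scan(shared).pruneInPlace(f -> f.startsWith("year=2016")).fileCount();
        return new int[] {q1, q2};
    }

    // Same two scans, each pruning its own copy.
    static int[] copyCounts() {
        TableMetadata shared = freshMetadata();
        int q1 = new Scan(shared).pruneCopy(f -> f.startsWith("year=2015")).fileCount();
        int q2 = new Scan(shared).pruneCopy(f -> f.startsWith("year=2016")).fileCount();
        return new int[] {q1, q2};
    }

    public static void main(String[] args) {
        System.out.println("in place: " + Arrays.toString(inPlaceCounts()));
        System.out.println("copy:     " + Arrays.toString(copyCounts()));
    }
}
```

      In this sketch the in-place variant leaves the second scan with 0 files instead of 1, because the first scan already removed the year=2016 file from the shared list. The copying variant gives each scan 1 file.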

          People

            Assignee: Abdel Hakim Deneche (adeneche)
            Reporter: Abdel Hakim Deneche (adeneche)
            Reviewer: Rahul Kumar Challapalli
            Votes: 0
            Watchers: 4
