  Apache Drill / DRILL-4449

Wrong results when using metadata cache with specific set of queries


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: Storage - Parquet
    • Labels:
      None

      Description

      We are still working on a reproduction, but the problem appears when we run a query similar to this one:

      with q1 as (
      select a.field
      from `table` a
      where <some condition that causes the table to be pruned>
      group by a.field
      having ...
      )
      , q2 as (
      select a.field
      from `table` a
      where <some other pruning condition>
      group by a.field
      )
      select * from (
      select count(*) as cnt from q1
      union all
      select count(*) as cnt from q2
      );
      

      The table is partitioned, and both subqueries force Parquet partition pruning on the table. Because the Parquet metadata object is shared in ParquetGroupScan, the second query ends up being "over-pruned" and we get wrong results.

      The query plan does not show the problem.
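      To illustrate the failure mode, here is a minimal Java sketch of how sharing one mutable metadata object between two pruned scans can over-prune the second scan. Everything in it is hypothetical: `TableMetadata`, `GroupScanSketch`, the file paths, and the `year=` partition filters are illustrative stand-ins, not Drill's actual `ParquetGroupScan` code. The copy-before-pruning variant is only one possible remedy and is not necessarily the fix that shipped in 1.6.0. The sketch models only the file-list pruning step, so the numbers are surviving file counts, not query results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class OverPruningDemo {
    // Hypothetical stand-in for the cached Parquet metadata: the list of files in the table.
    static final class TableMetadata {
        final List<String> files;
        TableMetadata(List<String> files) { this.files = files; }
    }

    // Hypothetical, simplified group scan that keeps only the files matching a pruning filter.
    static final class GroupScanSketch {
        private final TableMetadata metadata;
        GroupScanSketch(TableMetadata metadata) { this.metadata = metadata; }

        // Buggy variant: mutates the shared metadata, so later scans only see the survivors.
        List<String> pruneInPlace(Predicate<String> keep) {
            metadata.files.removeIf(keep.negate());
            return new ArrayList<>(metadata.files);
        }

        // Safer variant: builds a new list and leaves the shared metadata untouched.
        List<String> pruneCopy(Predicate<String> keep) {
            List<String> result = new ArrayList<>();
            for (String f : metadata.files) {
                if (keep.test(f)) {
                    result.add(f);
                }
            }
            return result;
        }
    }

    // Hypothetical pruning conditions standing in for the q1 and q2 WHERE clauses.
    static final Predicate<String> Q1 = f -> f.startsWith("year=2015/");
    static final Predicate<String> Q2 = f -> f.startsWith("year=2016/");

    static TableMetadata freshMetadata() {
        return new TableMetadata(new ArrayList<>(Arrays.asList(
            "year=2015/a.parquet", "year=2016/b.parquet", "year=2016/c.parquet")));
    }

    // Both scans share one metadata object and prune it in place.
    static int[] buggyCounts() {
        TableMetadata shared = freshMetadata();
        int c1 = new GroupScanSketch(shared).pruneInPlace(Q1).size();
        int c2 = new GroupScanSketch(shared).pruneInPlace(Q2).size();
        return new int[] {c1, c2};
    }

    // Both scans share one metadata object, but each prunes its own copy.
    static int[] fixedCounts() {
        TableMetadata shared = freshMetadata();
        int c1 = new GroupScanSketch(shared).pruneCopy(Q1).size();
        int c2 = new GroupScanSketch(shared).pruneCopy(Q2).size();
        return new int[] {c1, c2};
    }

    public static void main(String[] args) {
        System.out.println("shared, in place: " + Arrays.toString(buggyCounts()));
        System.out.println("private copies:   " + Arrays.toString(fixedCounts()));
    }
}
```

      Running `main` prints `shared, in place: [1, 0]` and `private copies:   [1, 2]`. With in-place pruning, the q2 scan starts from the single file that survived q1's filter and keeps nothing, which matches the "over-pruned" symptom described above.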


            People

            • Assignee:
              adeneche Abdel Hakim Deneche
              Reporter:
              adeneche Abdel Hakim Deneche
              Reviewer:
              Rahul Kumar Challapalli
            • Votes:
              0
              Watchers:
              4
