Apache Drill / DRILL-4376

Wrong results when doing a count(*) on part of directories with metadata cache

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.7.0
    • Component/s: Metadata
    • Labels: None

    Description

      First, create some Parquet tables in multiple subfolders:

      create table dfs.tmp.`test/201501` as select employee_id, full_name from cp.`employee.json` limit 2;
      create table dfs.tmp.`test/201502` as select employee_id, full_name from cp.`employee.json` limit 2;
      create table dfs.tmp.`test/201601` as select employee_id, full_name from cp.`employee.json` limit 2;
      create table dfs.tmp.`test/201602` as select employee_id, full_name from cp.`employee.json` limit 2;
      

      The following query matches the two directories 201601 and 201602 (2 rows each) and returns the expected count of 4:

      select count(*) from dfs.tmp.`test/20160*`;
      +---------+
      | EXPR$0  |
      +---------+
      | 4       |
      +---------+
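
      As a cross-check, the per-directory row counts can be listed before the metadata cache is created. This is a minimal sketch that is not part of the original report; it assumes Drill's implicit `dir0` column resolves to the first-level subfolder name under `test`, and `row_count` is just an illustrative alias. Each of the four subfolders should report 2 rows, which is why the `20160*` pattern is expected to yield 4.

      ```sql
      -- Count rows per first-level subfolder of dfs.tmp.`test`.
      -- dir0 is assumed to hold the subfolder name (201501, 201502, ...).
      select dir0, count(*) as row_count
      from dfs.tmp.`test`
      group by dir0
      order by dir0;
      ```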
      

      After the metadata cache files are created with REFRESH TABLE METADATA, the same query returns 2 instead of the expected 4:

      refresh table metadata dfs.tmp.`test`;
      select count(*) from dfs.tmp.`test/20160*`;
      +---------+
      | EXPR$0  |
      +---------+
      | 2       |
      +---------+
      

            People

              Abdel Hakim Deneche (adeneche)
              Rahul Kumar Challapalli