IMPALA-11209

Inconsistent results querying tables with subdirectories


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Catalog, Frontend

    Description

      IMPALA-8454 introduced recursive listing of table and partition directories. This does not appear to be handled properly when the new behavior is intentionally disabled through the impala.disable.recursive.listing=true table property. Within the same session, each REFRESH statement on the table flips the behavior, so the query result alternates. See the reproduction steps below.

      CREATE EXTERNAL TABLE subdirtest (col1 string) partitioned by (p1 string) TBLPROPERTIES ('impala.disable.recursive.listing'='true');
      
      ALTER TABLE subdirtest ADD PARTITION (p1='A');
      

      Then ingest a file into a subdirectory of the partition:

      hdfs dfs -mkdir /warehouse/tablespace/external/hive/subdirtest/p1=A/00
      hdfs dfs -put testdata.parq /warehouse/tablespace/external/hive/subdirtest/p1=A/00/
      

      The file testdata.parq matches the table schema and contains two rows.
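
Before and after each refresh it may help to compare what is on HDFS with what Impala has loaded for the partition. The following is a minimal sketch and not part of the original report. It assumes the hdfs and impala-shell clients are on the PATH and that the coordinator address matches the transcript below. The helper names list_table_dir and show_loaded_files, and the variables HDFS_CMD, IMPALA_SHELL_CMD, IMPALAD and TABLE_DIR, are hypothetical.

```shell
#!/bin/sh
# Illustrative helpers; all names below are assumptions, not from the report.
HDFS_CMD="${HDFS_CMD:-hdfs}"
IMPALA_SHELL_CMD="${IMPALA_SHELL_CMD:-impala-shell}"
IMPALAD="${IMPALAD:-coordinator.example.com:21050}"
TABLE_DIR="/warehouse/tablespace/external/hive/subdirtest"

# Recursively list the table directory on HDFS to confirm the file layout.
list_table_dir() {
    "$HDFS_CMD" dfs -ls -R "$TABLE_DIR"
}

# Ask Impala which data files it currently has loaded for partition p1='A'.
show_loaded_files() {
    "$IMPALA_SHELL_CMD" -i "$IMPALAD" -q "show files in subdirtest partition (p1='A')"
}
```

Running list_table_dir once and show_loaded_files after each refresh would show whether the file under p1=A/00 enters and leaves Impala's file list. If the table property were honored consistently, that file would presumably never be listed.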

      [coordinator.example.com:21050] default> refresh subdirtest;
      ...
      [coordinator.example.com:21050] default> select count(*) from subdirtest;
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      
      [coordinator.example.com:21050] default> refresh subdirtest;
      ...
      [coordinator.example.com:21050] default> select count(*) from subdirtest;
      +----------+
      | count(*) |
      +----------+
      | 2        |
      +----------+
      
      [coordinator.example.com:21050] default> refresh subdirtest;
      ...
      [coordinator.example.com:21050] default> select count(*) from subdirtest;
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      
      [coordinator.example.com:21050] default> refresh subdirtest;
      ...
      [coordinator.example.com:21050] default> select count(*) from subdirtest;
      +----------+
      | count(*) |
      +----------+
      | 2        |
      +----------+
      

      This can be reproduced within a single impala-shell session, with no other coordinators or load balancing involved.
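
The manual steps above can be scripted to make any alternation easier to observe. This is a minimal sketch under stated assumptions, not part of the original report: it assumes impala-shell is on the PATH and uses its -i, -B, --quiet and -q options. The function name refresh_and_count and the variables IMPALA_SHELL_CMD and IMPALAD are hypothetical. Note that each impala-shell invocation here opens its own session, which differs from the single-session transcript above.

```shell
#!/bin/sh
# Sketch: run REFRESH followed by COUNT(*) several times and print each count.
# IMPALA_SHELL_CMD and IMPALAD are illustrative names, not from the report.
IMPALA_SHELL_CMD="${IMPALA_SHELL_CMD:-impala-shell}"
IMPALAD="${IMPALAD:-coordinator.example.com:21050}"

refresh_and_count() {
    # $1 = table name, $2 = number of rounds (defaults to 4)
    table="$1"
    rounds="${2:-4}"
    i=1
    while [ "$i" -le "$rounds" ]; do
        # Reload the table's file metadata; its output is not needed here.
        "$IMPALA_SHELL_CMD" -i "$IMPALAD" --quiet -q "refresh $table" >/dev/null 2>&1
        # -B prints plain delimited output, so stdout holds only the number.
        count=$("$IMPALA_SHELL_CMD" -i "$IMPALAD" -B --quiet \
            -q "select count(*) from $table" 2>/dev/null)
        echo "round $i: count=$count"
        i=$((i + 1))
    done
}
```

Calling refresh_and_count subdirtest 4 prints one line per round. If the problem reproduces in this setup, the counts would be expected to alternate between 0 and 2 as in the transcript.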

            People

              Assignee: Unassigned
              Reporter: Miklos Szurap (mszurap)