IMPALA-7559

Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps


Details

    Description

      UPDATE: the issue turned out to be different from what I first thought; see my last comment. I will update the description with more details later.

      If the min/max value of a timestamp column chunk falls within the hour of the summer-to-winter DST change (UTC+2 -> UTC+1 in CET), stat filtering can drop row groups that contain rows which would otherwise satisfy the predicate.
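
      As background on why this particular hour is special, here is a minimal Python sketch (not Impala code) showing that two different UTC instants map to the same local wall-clock time, 02:30, on the day of the change. The fixed-offset zones `CEST` and `CET` and the helper `local_wall_clock` are assumptions made for illustration only; the sketch shows just the ambiguity of the local time and makes no claim about the actual root cause, which per the update above turned out to be different.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two sides of the DST change (illustrative only).
CEST = timezone(timedelta(hours=2), "CEST")  # summer time, UTC+2
CET = timezone(timedelta(hours=1), "CET")    # winter time, UTC+1


def local_wall_clock(utc_instant, tz):
    """Return the naive local wall-clock time of a UTC instant in tz."""
    return utc_instant.astimezone(tz).replace(tzinfo=None)


# On 2017-10-29 the clocks go back at 01:00 UTC (03:00 CEST -> 02:00 CET).
before_switch = datetime(2017, 10, 29, 0, 30, tzinfo=timezone.utc)
after_switch = datetime(2017, 10, 29, 1, 30, tzinfo=timezone.utc)

local_before = local_wall_clock(before_switch, CEST)
local_after = local_wall_clock(after_switch, CET)

# Both instants show the same wall-clock time although they are one hour apart.
print(local_before, local_after)
```

      Because the local-to-UTC mapping is not one-to-one during this hour, any logic that compares converted timestamps against stored min/max values has to handle both candidate instants.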

      To reproduce (on current master branch):

      1. Assume that the timezone is CET and that the flag convert_legacy_hive_parquet_utc_timestamps is enabled:
      ( export TZ=CET; bin/start-impala-cluster.py --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
      2. Create a table in Hive and load the data with 3 separate inserts so that 3 files are created:
      create table t (i int, d timestamp) stored as parquet;
      insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
      insert into t values (3, "2018-10-28 02:30:00");
      insert into t values (4, "2017-10-29 02:30:00");
      3. Query from Impala:
      set num_nodes=1;
      select * from t; -- returns all 4 values (same as Hive) 
      select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive returns 1,4)
      select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive returns 2,3)
      profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been stat filtered)
      select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 in Impala (same as Hive), because the "or" part disabled stat filtering
      

            People

              csringhofer Csaba Ringhofer
              csringhofer Csaba Ringhofer