IMPALA-3316

convert_legacy_hive_parquet_utc_timestamps=true makes reading Parquet tables 30x slower

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: impala 2.3
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:
      None
    • Environment:
      CDH 5.5.2 / Impala 2.3
      Parquet table with a timestamp column
      Secure cluster
      convert_legacy_hive_parquet_utc_timestamps=true
      Timestamp column is not being filtered on

      Description

      Enabling convert_legacy_hive_parquet_utc_timestamps=true makes simple queries
      perform very poorly, even when they do not filter on a timestamp column.

      Parquet table, Impala 2.3 / CDH 5.5.2.

      With convert_legacy_hive_parquet_utc_timestamps=true, the following simple query runs about 30x slower (from 1.1 minutes to over 30 minutes):

      select * from parquet_table_with_a_timestamp_attribute where bigint_attribute=1000771658169

      Note that the query does not filter on the timestamp column at all.
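      Why would an unfiltered column matter? One plausible explanation, not confirmed in this report, is that select * materializes the timestamp column for every scanned row before the bigint_attribute predicate is applied, so every scanned value pays for the UTC-to-local conversion, not just the matching rows. The toy model below is a hypothetical sketch of that assumption only: utc_to_local is a stand-in that uses Python's datetime.astimezone() (not Impala's backend code), and scan is a made-up scanner that converts either before the predicate (eager) or after it (late).

```python
from datetime import datetime, timezone


def utc_to_local(ts_utc: datetime) -> datetime:
    # Hypothetical stand-in for the per-value UTC -> local-time conversion
    # that the flag enables; astimezone() with no argument uses the
    # process's local time zone.
    return ts_utc.astimezone()


def scan(rows, target_id, convert, eager=True):
    """Return (matching rows, number of timestamp conversions performed).

    eager=True models materializing every selected column before the WHERE
    predicate is evaluated, so every scanned row pays for the conversion.
    eager=False models converting only rows that pass the predicate.
    """
    conversions = 0
    out = []
    for row_id, ts in rows:
        if convert and eager:
            ts = utc_to_local(ts)
            conversions += 1
        if row_id != target_id:  # the bigint_attribute = ... predicate
            continue
        if convert and not eager:
            ts = utc_to_local(ts)
            conversions += 1
        out.append((row_id, ts))
    return out, conversions


rows = [(i, datetime(2016, 1, 1, tzinfo=timezone.utc)) for i in range(10_000)]
matches, eager_cost = scan(rows, 42, convert=True, eager=True)
_, late_cost = scan(rows, 42, convert=True, eager=False)
print(len(matches), eager_cost, late_cost)
```

      With 10,000 rows and a single match, the eager model performs 10,000 conversions and the late model performs one. If each conversion is expensive, that gap alone would let a column the query never filters on dominate scan time.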

      Ran multiple tests with and without the impalad startup flag convert_legacy_hive_parquet_utc_timestamps=true.

      Also, from a comment on https://issues.cloudera.org/browse/IMPALA-1658:

      Casey Ching added a comment - 15/Jun/15 5:12 PM
      Btw, a perf test showed enabling this flag was 10x slower.

        Attachments

        1. screenshot-1.png (9 kB), uploaded by Ruslan Dautkhanov
        2. screenshot-2.png (19 kB), uploaded by Boris Tyukin


              People

              • Assignee: Attila Jeges (attilaj)
              • Reporter: Ruslan Dautkhanov (tagar_impala_e3b3)
              • Votes: 0
              • Watchers: 9
