IMPALA / IMPALA-3316

convert_legacy_hive_parquet_utc_timestamps=true makes reading parquet tables 30x slower


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: impala 2.3
    • Fix Version/s: Impala 3.1.0
    • Component/s: Backend
    • Labels:
      None
    • Environment:
      CDH 5.5.2 / Impala 2.3
      Parquet table with a timestamp column
      Secure cluster
      convert_legacy_hive_parquet_utc_timestamps=true
      Timestamp column is not used in any filter

    Description

      Enabling convert_legacy_hive_parquet_utc_timestamps=true
      makes even simple queries that do not filter on a timestamp column perform very poorly.

      Setup: a Parquet table on Impala 2.3 / CDH 5.5.2.

      With convert_legacy_hive_parquet_utc_timestamps=true, the following simple query becomes about 30x slower (1.1 minutes -> over 30 minutes):

      select * from parquet_table_with_a_timestamp_attribute where bigint_attribute=1000771658169

      Notice I did not even filter on a timestamp attribute.

      I ran multiple tests with and without the impalad flag convert_legacy_hive_parquet_utc_timestamps=true.

      Also, from https://issues.cloudera.org/browse/IMPALA-1658:

      Casey Ching added a comment - 15/Jun/15 5:12 PM
      Btw, a perf test showed enabling this flag was 10x slower.
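      A toy Python model may help explain why a query that never filters on the timestamp column can still pay for the flag. It assumes (this is not confirmed in the issue) that the Parquet scanner materializes every selected column, including the timestamp converted from UTC to local time, before it evaluates the WHERE predicate. Under that assumption, with select * the conversion runs once per scanned row rather than once per matching row. The names Row, scan and LOCAL_TZ are hypothetical, the fixed UTC-8 offset stands in for a real time zone, and this is an illustration only, not Impala's C++ scanner code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the impalad host's local time zone (fixed UTC-8, no DST).
LOCAL_TZ = timezone(timedelta(hours=-8))


@dataclass
class Row:
    bigint_attribute: int
    ts: datetime  # naive value as written by Hive, i.e. UTC wall-clock time


def scan(rows, wanted, convert_legacy_utc):
    """Toy scan: materialize every column of each row, then apply the predicate.

    Returns (matching rows, number of UTC-to-local conversions performed).
    """
    out = []
    conversions = 0
    for r in rows:
        ts = r.ts
        if convert_legacy_utc:
            # Treat the stored value as UTC and convert it to local wall-clock time.
            ts = ts.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ).replace(tzinfo=None)
            conversions += 1
        # The WHERE bigint_attribute = wanted check runs only after materialization.
        if r.bigint_attribute == wanted:
            out.append((r.bigint_attribute, ts))
    return out, conversions


rows = [Row(i, datetime(2016, 1, 1, 12, 0)) for i in range(1000)]
matches, n = scan(rows, 5, convert_legacy_utc=True)
print(len(matches), n, matches[0][1])  # 1 1000 2016-01-01 04:00:00
```

      In this model, 1,000 scanned rows with a single match cost 1,000 conversions when the flag is on and none when it is off, even though the predicate only touches bigint_attribute.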

    People

    • Assignee: Attila Jeges (attilaj)
    • Reporter: Ruslan Dautkhanov (tagar_impala_e3b3)
