IMPALA-4055

Investigate and fix to_date() slowness

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

    Description

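      to_date() appears to pay a steep penalty when converting timestamps: on the same table, grouping by to_date(l_shipdate) takes roughly 10x longer than grouping by the raw column or by trunc(l_shipdate,'DD').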

      +-----------------------------------------------------------------------------------------+
      | version()                                                                               |
      +-----------------------------------------------------------------------------------------+
      | impalad version 2.6.0-cdh5.8.0 RELEASE (build 5464d1750381b40a7e7163b12b09f11b891b4de3) |
      | Built on Thu, 16 Jun 2016 12:43:48 PST                                                  |
      +-----------------------------------------------------------------------------------------+
      
      -- ts1 is a single-column TIMESTAMP Parquet table with 100,000,000 rows
      
      select 
        l_shipdate,
        count(*)
      from ts1
      group by 1;
      
      Fetched 2526 row(s) in 11.25s
      
      select 
        trunc(l_shipdate,'DD'),
        count(*)
      from ts1
      group by 1;
      
      Fetched 2526 row(s) in 10.74s
      
      select 
        to_date(l_shipdate),
        count(*)
      from ts1
      group by 1;
      
      Fetched 2526 row(s) in 102.36s  <<< ~10x slower
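
As a quick check on the "~10x slower" annotation, the slowdown can be computed from the three timings reported above. This is only a sketch: the numbers are single wall-clock runs copied from the query output, and the dictionary and helper names are illustrative, not part of the report.

```python
# Wall-clock timings in seconds, copied from the "Fetched ... in Ns" lines above.
# Each is a single run, so treat the ratios as rough indications only.
timings = {
    "l_shipdate": 11.25,
    "trunc(l_shipdate,'DD')": 10.74,
    "to_date(l_shipdate)": 102.36,
}


def slowdown(expr: str, baseline: str) -> float:
    """Return how many times longer `expr` took than `baseline`."""
    return timings[expr] / timings[baseline]


if __name__ == "__main__":
    for baseline in ("l_shipdate", "trunc(l_shipdate,'DD')"):
        ratio = slowdown("to_date(l_shipdate)", baseline)
        print(f"to_date() vs {baseline}: {ratio:.1f}x slower")
```

With these numbers to_date() comes out about 9.1x slower than grouping by the raw column and about 9.5x slower than trunc(), consistent with the rounded "~10x" figure.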
      

      Attachments

        1. repro_query_cdh57_codegen_off_1.zip
          942 kB
          Mostafa Mokhtar

            People

              Assignee: Alexander Behm (alex.behm)
              Reporter: Greg Rahn (grahn)
              Votes: 0
              Watchers: 6
