IMPALA-4055

Investigate and fix to_date() slowness

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

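      to_date() appears to pay a steep penalty when converting timestamps. On a 100,000,000-row table, grouping by to_date(l_shipdate) takes roughly 10x as long as grouping by the raw column or by trunc(l_shipdate,'DD'):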

      +-----------------------------------------------------------------------------------------+
      | version()                                                                               |
      +-----------------------------------------------------------------------------------------+
      | impalad version 2.6.0-cdh5.8.0 RELEASE (build 5464d1750381b40a7e7163b12b09f11b891b4de3) |
      | Built on Thu, 16 Jun 2016 12:43:48 PST                                                  |
      +-----------------------------------------------------------------------------------------+
      
      -- single column timestamp parquet table of 100,000,000 rows
      
      select 
        l_shipdate,
        count(*)
      from ts1
      group by 1;
      
      Fetched 2526 row(s) in 11.25s
      
      select 
        trunc(l_shipdate,'DD'),
        count(*)
      from ts1
      group by 1;
      
      Fetched 2526 row(s) in 10.74s
      
      select 
        to_date(l_shipdate),
        count(*)
      from ts1
      group by 1;
      
      Fetched 2526 row(s) in 102.36s  <<< ~10x slower
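
A possible interim workaround, sketched here as an untested assumption rather than anything proposed in this issue: aggregate on trunc(l_shipdate,'DD') first, then apply to_date() only to the per-day result rows (2526 in the runs above) instead of to all 100,000,000 input rows. The aliases day_ts, cnt, ship_day and t are hypothetical names introduced for illustration. The sketch relies on to_date() returning the same date string for a timestamp and for that timestamp truncated to the day.

```sql
-- Hypothetical workaround sketch, not measured on the reporter's data.
-- Inner query: group on the truncated timestamp, which ran at about the
-- same speed as grouping on the raw column in the timings above.
-- Outer query: convert each per-day group key to a date string, so
-- to_date() is evaluated once per group rather than once per input row.
select
  to_date(day_ts) as ship_day,
  cnt
from (
  select
    trunc(l_shipdate,'DD') as day_ts,
    count(*) as cnt
  from ts1
  group by 1
) t;
```

Whether this actually avoids the slowdown depends on the penalty being paid per to_date() call, which the timings suggest but do not prove.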
      


              People

              • Assignee: Alexander Behm (alex.behm)
              • Reporter: Greg Rahn (grahn)
