Hive / HIVE-18624

Parsing time is extremely high (~10 min) for queries with complex select expressions


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.3.2
    • Fix Version/s: 2.4.0, 4.0.0, 3.2.0, 3.1.2
    • Component/s: Hive, Parser
    • Labels: None

      Description

      EXPLAIN of the same query takes 0.1 to 3 seconds in Hive 2.1.0, but 10 to 15 minutes in Hive 2.3.2 and the latest master.

      Sample query:

      EXPLAIN
      SELECT DISTINCT
      
      
        IF(lower('a') <= lower('a')
        ,'a'
        ,IF(('a' IS NULL AND from_unixtime(UNIX_TIMESTAMP()) <= 'a')
        ,'a'
        ,IF(if('a' = 'a', TRUE, FALSE) = 1
        ,'a'
        ,IF(('a' = 1 and lower('a') NOT IN ('a', 'a')
             and lower(if('a' = 'a','a','a')) <= lower('a'))
            OR ('a' like 'a' OR 'a' like 'a')
            OR 'a' in ('a','a')
        ,'a'
        ,IF(if(lower('a') in ('a', 'a') and 'a'='a', TRUE, FALSE) = 1
        ,'a'
        ,IF('a'='a' and unix_timestamp(if('a' = 'a',cast('a' as string),coalesce('a',cast('a' as string),from_unixtime(unix_timestamp())))) <= unix_timestamp(concat_ws('a',cast(lower('a') as string),'00:00:00')) + 9*3600
        ,'a'
      
        ,If(lower('a') <= lower('a')
            and if(lower('a') in ('a', 'a') and 'a'<>'a', TRUE, FALSE) <> 1
        ,'a'
        ,IF('a'=1 AND 'a'=1
        ,'a'
        ,IF('a' = 1 and COALESCE(cast('a' as int),0) = 0
        ,'a'
        ,IF('a' = 'a'
        ,'a'
      
        ,If('a' = 'a' AND lower('a')>lower(if(lower('a')<1830,'a',cast(date_add('a',1) as timestamp)))
        ,'a'
      
      
      
        ,IF('a' = 1
      
        ,IF('a' in ('a', 'a') and ((unix_timestamp('a')-unix_timestamp('a')) / 60) > 30 and 'a' = 1
      
      
        ,'a', 'a')
      
      
        ,IF(if('a' = 'a', FALSE, TRUE ) = 1 AND 'a' IS NULL
        ,'a'
        ,IF('a' = 1 and 'a'>0
        , 'a'
      
        ,IF('a' = 1 AND 'a' ='a'
        ,'a'
        ,IF('a' is not null and 'a' is not null and 'a' > 'a'
        ,'a'
        ,IF('a' = 1
        ,'a'
      
        ,IF('a' = 'a'
        ,'a'
      
        ,If('a' = 1
        ,'a'
        ,IF('a' = 1
        ,'a'
        ,IF('a' = 1
        ,'a'
      
        ,IF('a' ='a' and 'a' ='a' and cast(unix_timestamp('a') as  int) + 93600 < cast(unix_timestamp()  as int)
        ,'a'
        ,IF('a' = 'a'
        ,'a'
        ,IF('a' = 'a' and 'a' in ('a','a','a')
        ,'a'
        ,IF('a' = 'a'
        ,'a','a'))
            )))))))))))))))))))))))
      AS test_comp_exp
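
      To find the nesting depth at which parse time starts to blow up, it can help to generate reduced variants of the query above, whose select expression is a chain of IF calls roughly two dozen levels deep. The Python sketch below builds an EXPLAIN query with IF calls nested to a chosen depth. It mirrors only the nesting shape of the report, not its individual conditions, and the helper names `nested_if_expression` and `explain_query` are hypothetical, not part of Hive. Whether this reduced shape alone reproduces the slowdown is an assumption to check against an affected build.

```python
def nested_if_expression(depth: int, leaf: str = "'a'") -> str:
    """Build IF(cond, leaf, IF(cond, leaf, ... leaf)) nested `depth` levels deep.

    Hypothetical helper: only reproduces the nesting shape of the reported query.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    expr = leaf  # innermost ELSE branch
    for _ in range(depth):
        # Each level wraps the previous expression as the ELSE branch.
        expr = f"IF({leaf} = {leaf}, {leaf}, {expr})"
    return expr


def explain_query(depth: int) -> str:
    """Wrap the nested expression in EXPLAIN SELECT DISTINCT, as in the report."""
    return f"EXPLAIN\nSELECT DISTINCT\n  {nested_if_expression(depth)}\nAS test_comp_exp"


if __name__ == "__main__":
    # Print a few variants to paste into beeline/CLI on an affected build.
    for d in (5, 10, 20, 25):
        print(f"-- depth {d}")
        print(explain_query(d))
```

      Bisecting on `depth` with the generated statements narrows down how deep the nesting has to be before EXPLAIN time becomes noticeable.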
      

       

      The attached thread_dump shows a very large call stack being built.

      Reverting HIVE-15578 (92f31d07aa988d4a460aac56e369bfa386361776) seems to speed up parsing.
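
      To compare a build that includes HIVE-15578 against one with the revert, one option is to time EXPLAIN across increasing nesting depths and look at how the time grows from one depth to the next. A minimal Python sketch, assuming you supply `parse_fn` (for example a hypothetical wrapper that sends the statement to a test HiveServer2 and waits for the result) and `build_query` (any function returning the query text for a given depth); neither is a Hive API, and `time_parse` and `growth_ratios` are illustrative names.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_parse(parse_fn: Callable[[str], object],
               build_query: Callable[[int], str],
               depths: Iterable[int]) -> Dict[int, float]:
    """Return wall-clock seconds spent in parse_fn for each nesting depth."""
    timings: Dict[int, float] = {}
    for depth in depths:
        sql = build_query(depth)
        start = time.perf_counter()
        parse_fn(sql)  # user-supplied; e.g. submits EXPLAIN and blocks until done
        timings[depth] = time.perf_counter() - start
    return timings


def growth_ratios(timings: Dict[int, float]) -> List[float]:
    """Ratio of each timing to the timing at the previous depth (depths sorted).

    When depths step by one level, ratios that stay roughly constant and well
    above 1 suggest exponential growth; ratios drifting toward 1 suggest
    polynomial growth. Pairs with a zero baseline are skipped.
    """
    depths = sorted(timings)
    return [timings[b] / timings[a]
            for a, b in zip(depths, depths[1:])
            if timings[a] > 0]
```

      Running this once per build with the same depths gives two ratio series to compare. Timing EXPLAIN end to end, as in the report, also includes compilation work after parsing, so the numbers are an upper bound on parse time alone.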

       

        Attachments

        1. HIVE-18624.01.patch (7 kB, Zoltan Haindrich)
        2. HIVE-18624.02.patch (9 kB, Zoltan Haindrich)
        3. thread_dump (77 kB, Amruth S)


              People

              • Assignee: Zoltan Haindrich (kgyrtkirk)
              • Reporter: Amruth S (amrk7)
              • Votes: 0
              • Watchers: 5
