Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21076

Hive optimizer is ignoring predicate condition provided in the query

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.2.1, 2.1.0
    • None
    • Hive
    • None
    • OS : Oracle Linux 6.x

      JDK: jdk1.8.0_192 

    • Hide
      look like the problem with the optimizer where eighter it tries to merge this predicate with other filter condition or somehow ignoring it.

      putting the whole condition like follows make it work
       transaction_history.get_rate_attribute_range(Table__13.RateAttributes.RateAttribute,cast(CAST('2018-10-30 00:00:00' AS timestamp) as string),'AGE_RANGE') in ('35 - 39','60 - 64')

      PS: get_rate_attribute_range is the udf.
      Show
      look like the problem with the optimizer where eighter it tries to merge this predicate with other filter condition or somehow ignoring it. putting the whole condition like follows make it work  transaction_history.get_rate_attribute_range(Table__13.RateAttributes.RateAttribute,cast(CAST('2018-10-30 00:00:00' AS timestamp) as string),'AGE_RANGE') in ('35 - 39','60 - 64') PS: get_rate_attribute_range is the udf.

    Description

      In big sql which contains a where condition, in the condition there is IN clause which hive is not honoring. SQL with and without IN CLAUSE is attached herewith along with hiveserver2 logs

       

      /good plan
      // in hive server logs
      2018-11-15T14:44:57,682  INFO [7984d196-64b5-4135-854b-136886f7ce8b HiveServer2-HttpHandler-Pool: Thread-2726946] ql.Driver: Compiling command(queryId=hive_20181115144457_81293879-57e8-4173-99e3-d4947a4c19a3): EXPLAIN
      
      // where condtition
      WHERE
        (
         ( ( concat(( from_unixtime(unix_timestamp(substr(Table__6.participationperiodstartdate, 0, 10), 'yyyy-MM-dd'), 'MM/dd/yyyy') ), ' - ', ( from_unixtime(unix_timestamp(substr(Table__6.participationperiodenddate, 0, 10), 'yyyy-MM-dd'), 'MM/dd/yyyy') )) ) in  ('07/01/2018 - 06/30/2019')  )
         AND
         ( transaction_history.get_rate_attribute_range(Table__13.RateAttributes.RateAttribute,cast(CAST('2018-10-30 00:00:00' AS timestamp) as string),'AGE_RANGE') in ('35 - 39','60 - 64') )
         AND
         ( Table__6.benefittype in ('LIFE','SPOUSELIFE','VOLLIFE')  )
        )
      
      //bad plan
      // in hiveserver2 logs
      2018-11-15T14:45:33,744  INFO [HiveServer2-Background-Pool: Thread-2728943] ql.Driver: Executing command(queryId=hive_20181115144457_81293879-57e8-4173-99e3-d4947a4c19a3): EXPLAIN
      // where condition which is ignored and hive tried to do full table scan 
        WHERE
          (
           ( ( concat(( from_unixtime(unix_timestamp(substr(Table__6.participationperiodstartdate, 0, 10), 'yyyy-MM-dd'), 'MM/dd/yyyy') ), ' - ', ( from_unixtime(unix_timestamp(substr(Table__6.participationperiodenddate, 0, 10), 'yyyy-MM-dd'), 'MM/dd/yyyy') )) ) in  ('07/01/2018 - 06/30/2019')  )
           AND
           (  Table__13.ageband in ('35 - 39')  )
           AND
           ( Table__6.benefittype in ('LIFE','SPOUSELIFE','VOLLIFE')  )
          )
      

      Please find attached the logs for hiveserver, table ddl involved in the query along with the good and bad query plan. 

       

      Attachments

        1. Explain Plan.zip
          53 kB
          Nithin Nagaraj

        Activity

          People

            Unassigned Unassigned
            nnagaraj Nithin Nagaraj
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: