Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15634

Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Not A Problem
    • 2.2.0
    • None
    • Druid integration

    Description

      SET hive.tez.java.opts=-Duser.timezone="UTC"; can be used to change timezone for Tez tasks. However, when Fetch optimizer kicks in because we can push the full query to Druid, we obtain different values for the timestamp than when jobs are executed. This probably has to do with the timezone on the client side. How should we handle this issue?

      For instance, this can be observed with the following query:

      set hive.fetch.task.conversion=more;
      SELECT DISTINCT `__time`
      FROM store_sales_sold_time_subset
      WHERE `__time` < '1999-11-10 00:00:00';
      OK
      1999-10-31 19:00:00
      1999-11-01 19:00:00
      1999-11-02 19:00:00
      1999-11-03 19:00:00
      1999-11-04 19:00:00
      1999-11-05 19:00:00
      1999-11-06 19:00:00
      1999-11-07 19:00:00
      1999-11-08 19:00:00
      
      set hive.fetch.task.conversion=none;
      SELECT DISTINCT `__time`
      FROM store_sales_sold_time_subset
      WHERE `__time` < '1999-11-10 00:00:00';
      OK
      1999-11-01 00:00:00
      1999-11-02 00:00:00
      1999-11-03 00:00:00
      1999-11-04 00:00:00
      1999-11-05 00:00:00
      1999-11-06 00:00:00
      1999-11-07 00:00:00
      1999-11-08 00:00:00
      1999-11-09 00:00:00
      

      Attachments

        Activity

          People

            bslim Slim Bouguerra
            jcamacho Jesús Camacho Rodríguez
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: