  1. Hive
  2. HIVE-23893

Extract deterministic conditions for PPD when the predicate contains a non-deterministic function


Details

    Description

      Take the following query as an example, assuming unix_timestamp is non-deterministic (as in versions before 1.3.0):
       
      SELECT
              from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,
              b.game_id AS game_id,
              b.game_name AS game_name,
              count(DISTINCT a.sha1_imei) uv
      FROM
              gamesdk_userprofile a
              JOIN game_info_all b ON a.appid = b.dev_app_id
      WHERE
              a.date = 20200704
              AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704
              AND b.date = 20200704
      GROUP BY
              from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),
              b.game_id,
              b.game_name
      ORDER BY
              uv DESC
      LIMIT 200;
       
      Because the WHERE clause contains a non-deterministic function, the predicates (a.date = 20200704, b.date = 20200704) cannot be pushed down to the join operator. As a result, the optimizer cannot prune partitions, which may result in a full scan of the tables gamesdk_userprofile and game_info_all.
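
The idea in the title can be illustrated with a minimal sketch: split the top-level AND conjuncts of the WHERE clause, keep the deterministic ones as push-down candidates, and leave the rest in place. This is not Hive's implementation; the `Conjunct` class, the `NON_DETERMINISTIC` set, and `split_conjuncts` are hypothetical names introduced only for illustration, and the set of non-deterministic functions is an assumption taken from the example above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption for this sketch: functions treated as non-deterministic.
NON_DETERMINISTIC = {"unix_timestamp", "rand"}


@dataclass(frozen=True)
class Conjunct:
    """One top-level AND term of a WHERE clause (hypothetical model)."""
    text: str                        # the predicate as written in the query
    functions: Tuple[str, ...] = ()  # names of the functions it calls


def is_deterministic(conjunct: Conjunct) -> bool:
    """A conjunct is deterministic if it calls no non-deterministic function."""
    return not any(f in NON_DETERMINISTIC for f in conjunct.functions)


def split_conjuncts(
    conjuncts: List[Conjunct],
) -> Tuple[List[Conjunct], List[Conjunct]]:
    """Return (pushable, residual): deterministic terms and the remainder."""
    pushable = [c for c in conjuncts if is_deterministic(c)]
    residual = [c for c in conjuncts if not is_deterministic(c)]
    return pushable, residual


# The WHERE clause of the example query, as three AND-ed conjuncts.
where = [
    Conjunct("a.date = 20200704"),
    Conjunct(
        "from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704",
        ("from_unixtime", "unix_timestamp"),
    ),
    Conjunct("b.date = 20200704"),
]

pushable, residual = split_conjuncts(where)
print("pushable:", [c.text for c in pushable])
print("residual:", [c.text for c in residual])
```

The split is only meaningful for terms joined by AND at the top level: each such term must hold on its own, so evaluating the deterministic ones earlier does not change which rows pass the full predicate.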

            People

              zhishui zhishui
              dengzh Zhihua Deng
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 10m