  1. Hive
  2. HIVE-2048 Expedite Partition Pruning
  3. HIVE-2050

batch processing partition pruning process

Details

    • Reviewed

    Description

      For partition predicates that cannot be pushed down to JDO filtering (HIVE-2049), we should fall back to the old approach: list all partition names first, then use Hive's expression evaluation engine to select the matching partitions. The partition pruner should then hand Hive the list of selected partition names and get back a list of Partition objects (this call should be added to the Hive API).
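
      The fallback step can be illustrated with a minimal sketch. It is not the code in the attached patches: the class PartitionNameFilter, its methods, and the use of java.util.function.Predicate as a stand-in for Hive's expression evaluator are all assumptions made for illustration. It also assumes partition names of the form key=value[/key=value...] and ignores any escaping of special characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper: filters partition names with an arbitrary predicate.
class PartitionNameFilter {

    // Parse a name such as "ds=2011-01-01/hr=12" into {ds=2011-01-01, hr=12}.
    static Map<String, String> parseSpec(String partitionName) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String keyValue : partitionName.split("/")) {
            int eq = keyValue.indexOf('=');
            spec.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
        }
        return spec;
    }

    // Keep only the names whose key/value spec satisfies the predicate.
    // The predicate stands in for Hive's expression evaluation engine.
    static List<String> selectNames(List<String> allNames,
                                    Predicate<Map<String, String>> predicate) {
        List<String> selected = new ArrayList<>();
        for (String name : allNames) {
            if (predicate.test(parseSpec(name))) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ts=01", "ts=02", "ts=11", "ts=20");
        // An arithmetic predicate, chosen only as an example of something
        // evaluated on the client side rather than in a metastore filter.
        List<String> selected =
            selectNames(names, spec -> Integer.parseInt(spec.get("ts")) % 10 == 1);
        System.out.println(selected);
    }
}
```

      The selected names would then be passed to the metastore in one batch to fetch the corresponding Partition objects, rather than fetching every partition up front.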

      A possible optimization is for the partition pruner to give Hive a set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]) and for the JDO query to be formulated as range queries. Range queries are possible because the first step lists all partition names in sorted order, so each range is easy to derive, and the JDO range queries are guaranteed to return the same partitions as a query with the explicit list of partition names.
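
      One way to derive such ranges is to walk the sorted name list once and collapse each maximal run of consecutive selected names into a [first, last] pair. The sketch below is an assumption about how this could look, not the patch's implementation; PartitionRanges and toRanges are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: turns selected partition names into inclusive ranges.
class PartitionRanges {

    // sortedNames: every partition name of the table, in sorted order.
    // selected:    the names that survived predicate evaluation.
    // Returns inclusive [first, last] pairs, one per run of adjacent selected names.
    static List<String[]> toRanges(List<String> sortedNames, Set<String> selected) {
        List<String[]> ranges = new ArrayList<>();
        String start = null;
        String end = null;
        for (String name : sortedNames) {
            if (selected.contains(name)) {
                if (start == null) {
                    start = name;      // open a new run
                }
                end = name;            // extend the current run
            } else if (start != null) {
                ranges.add(new String[] {start, end});  // an unselected name closes the run
                start = null;
            }
        }
        if (start != null) {
            ranges.add(new String[] {start, end});      // close a run that reaches the end
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        Set<String> selected = new HashSet<>();
        for (int i = 1; i <= 24; i++) {
            String name = String.format("ts=%02d", i);
            all.add(name);
            if (i <= 11 || i >= 20) {
                selected.add(name);
            }
        }
        for (String[] r : toRanges(all, selected)) {
            System.out.println("[" + r[0] + ", " + r[1] + "]");
        }
    }
}
```

      Because no unselected name falls between the endpoints of a range, a query for names between first and last returns exactly the selected partitions in that run, provided the store compares names in the same order used to sort the list.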

      Attachments

        1. HIVE-2050.patch
          161 kB
          Ning Zhang
        2. HIVE-2050.4.patch
          166 kB
          Ning Zhang
        3. HIVE-2050.3.patch
          166 kB
          Ning Zhang
        4. HIVE-2050.2.patch
          170 kB
          Ning Zhang

          People

            nzhang Ning Zhang
