Hive / HIVE-1660

Change get_partitions_ps to pass partition filter to database

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: Metastore
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HIVE-1609 added support for partition pruning by passing the partition filter to the database. Changing get_partitions_ps to use this could improve performance for tables with a large number of partitions. A listPartitionNamesByFilter API might be required to implement this for use from Hive.
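
To illustrate the idea, here is a minimal Java sketch of turning a partial partition spec (the kind of input get_partitions_ps receives) into a filter string that could be handed to the database. The class and method names are hypothetical, and the filter syntax shown (`key = "value"` clauses joined with `and`) is an assumption about what the HIVE-1609 filter API accepts, not a confirmed grammar. An empty value is treated as "match any value" for that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialSpecFilter {
    // Hypothetical helper: builds a filter expression from a partial partition spec.
    // An empty or null value means "any value" for that partition key.
    // Assumption: the filter grammar accepts key = "value" clauses joined by "and".
    // Values containing double quotes are not escaped in this sketch.
    public static String buildFilter(List<String> partKeys, List<String> partVals) {
        if (partVals.size() > partKeys.size()) {
            throw new IllegalArgumentException("more values than partition keys");
        }
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < partVals.size(); i++) {
            String val = partVals.get(i);
            if (val == null || val.isEmpty()) {
                continue; // unspecified key: no constraint
            }
            clauses.add(partKeys.get(i) + " = \"" + val + "\"");
        }
        return String.join(" and ", clauses);
    }

    public static void main(String[] args) {
        // Table partitioned on ds and hr; only ds is specified.
        String filter = buildFilter(Arrays.asList("ds", "hr"),
                                    Arrays.asList("2010-10-01", ""));
        System.out.println(filter);
    }
}
```

With the database evaluating such a filter, only the matching partitions would need to be fetched, rather than listing every partition and filtering on the metastore side.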

      1. HIVE-1660.1.patch
        28 kB
        Paul Yang
      2. HIVE-1660_regex.patch
        31 kB
        Paul Yang
      3. HIVE-1660.2.patch
        28 kB
        Paul Yang
      4. HIVE-1660.3.patch
        36 kB
        Paul Yang
      5. HIVE-1660.4.patch
        35 kB
        Paul Yang

        Activity

        John Sichi added a comment -

        Amendment is in HIVE-1874.

        John Sichi added a comment -

        Namit, HIVE-1660.4.patch contains no reference to hbase_pushdown.q.out. However, your commit did change that file:

        http://svn.apache.org/viewvc?view=revision&revision=1024341

        I am guessing that the commit for HIVE-1638 (close in time) broke this test, and then you updated the log file as part of this commit.

        Namit Jain added a comment -

        Committed. Thanks Paul

        Paul Yang added a comment -

        Also, HIVE-1660.4.patch is the refreshed version.

        Paul Yang added a comment -

        Full suite passed on my end

        Namit Jain added a comment -

        Paul, the patch does not apply cleanly. Can you refresh?

        Paul Yang added a comment -
        • Fixed causes of test failures
        • Fixed bugs with the filter optimization
        • Added additional test cases

        I'll submit the patch once the full test suite completes.

        Namit Jain added a comment -

        The following tests failed:

        bucket3.q
        stats10.q
        stats2.q
        stats8.q
        union22.q

        reduce_deduplicate.q (in minimr)

        input2.q, input3.q (TestParse)

        Namit Jain added a comment -

        I will take a look

        Paul Yang added a comment -

        HIVE-1660.1.patch is the main patch. It creates a listPartitionNamesByFilter() method and fixes get_partitions_ps() and get_partition_names_ps() to use the new filter APIs. In addition, the patch adds an optimization that uses a partition name regex for filtering in cases of equality comparisons.

        HIVE-1660_regex.patch was a small experiment to test the potential speedup from filtering on a more complete regex of the partition name. For example, for a table partitioned on ds and hr, this patch uses a regex like 'ds=2010-10-01/hr=.*' to find all partitions with ds='2010-10-01'. For a table with ~5 million partitions and ~15K partitions a day, getting the partitions for a single day took ~1s with this regex patch vs ~10s for the filter patch. Since a table with 5 million partitions is a very unusual case, I didn't think the speedup was worth the additional complexity.
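
The regex idea described above can be sketched roughly as follows. This is an illustrative Java example, not the code from either patch: the class and method names are hypothetical, it uses java.util.regex rather than whatever matching the metastore's datastore performs, and it ignores any escaping Hive applies to special characters in partition values. Specified values are quoted literally and unspecified ones become `.*`, which for ds='2010-10-01' yields a pattern equivalent to 'ds=2010-10-01/hr=.*'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PartitionNameRegex {
    // Hypothetical helper: builds a regex over full partition names
    // (key1=val1/key2=val2/...) from a partial partition spec.
    // An empty, null, or missing value matches any value for that key.
    public static String build(List<String> partKeys, List<String> partVals) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < partKeys.size(); i++) {
            String val = i < partVals.size() ? partVals.get(i) : "";
            if (val == null || val.isEmpty()) {
                // Unspecified key: literal "key=" followed by a wildcard.
                parts.add(Pattern.quote(partKeys.get(i) + "=") + ".*");
            } else {
                // Specified key: match "key=value" literally.
                parts.add(Pattern.quote(partKeys.get(i) + "=" + val));
            }
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        String regex = build(Arrays.asList("ds", "hr"),
                             Arrays.asList("2010-10-01", ""));
        // Partition from the requested day matches; another day does not.
        System.out.println(Pattern.matches(regex, "ds=2010-10-01/hr=12"));
        System.out.println(Pattern.matches(regex, "ds=2010-10-02/hr=12"));
    }
}
```

Because the pattern constrains the whole partition name in a single comparison, the lookup reduces to one string match per partition name, which is one plausible reason for the speedup reported above.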


          People

          • Assignee:
            Paul Yang
            Reporter:
            Ajay Kidave
          • Votes:
            0
            Watchers:
            0
