Hive
  1. Hive
  2. HIVE-1783

CommonJoinOperator optimize the case of 1:1 join

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      CommonJoinOperator.genObject() is expensive. It does a recursive and keeps lots of states because it has to:
      1. handle null cases for outer joins
      2. handle the case of duplicated keys from one join party
      We can do a minor optimization to detect a 1:1 join (which is quite common) before calling CommonJoinOperator.genObject() and forward columns in a simple for-loop if we are sure neither of 1 or 2 will happen.

      1. HIVE-1783.1.patch
        4 kB
        Siying Dong
      2. HIVE-1783.2.patch
        32 kB
        Siying Dong
      3. HIVE-1783.3.patch
        32 kB
        Siying Dong
      4. HIVE-1783.4.patch
        54 kB
        Siying Dong

        Activity

        Hide
        Namit Jain added a comment -

        +1

        running tests

        Show
        Namit Jain added a comment - +1 running tests
        Hide
        He Yongqiang added a comment -

        namit, can you hold this for Liyin's patch?

        Show
        He Yongqiang added a comment - namit, can you hold this for Liyin's patch?
        Hide
        Namit Jain added a comment -

        sure - let me know when you are done

        Show
        Namit Jain added a comment - sure - let me know when you are done
        Hide
        Namit Jain added a comment -

        Actually, on second thoughts, Siying, can you add more tests ?

        set hive.join.emit.interval to a very small value (say 1), and add couple
        of small data files to check all variants of join - single key followed by null followed
        by single key. Try to get the coverage on all those 'if' conditions

        Show
        Namit Jain added a comment - Actually, on second thoughts, Siying, can you add more tests ? set hive.join.emit.interval to a very small value (say 1), and add couple of small data files to check all variants of join - single key followed by null followed by single key. Try to get the coverage on all those 'if' conditions
        Hide
        Siying Dong added a comment -

        Added a unit test.
        I ran the test with and without the patch applied and the test results are identical.
        (the result doesn't seem to right though. Even it they are wrong, it is a totally separated issue)

        Show
        Siying Dong added a comment - Added a unit test. I ran the test with and without the patch applied and the test results are identical. (the result doesn't seem to right though. Even it they are wrong, it is a totally separated issue)
        Hide
        Namit Jain added a comment -

        Let us get it after HIVE-1642

        Show
        Namit Jain added a comment - Let us get it after HIVE-1642
        Hide
        Namit Jain added a comment -

        Can you refresh the patch ? HIVE-1642 has been committed, so this is good to go

        Show
        Namit Jain added a comment - Can you refresh the patch ? HIVE-1642 has been committed, so this is good to go
        Hide
        Siying Dong added a comment -

        after previous patches.

        Show
        Siying Dong added a comment - after previous patches.
        Hide
        Namit Jain added a comment -

        <property>
        <name>hive.outerjoin.supports.filters</name>
        <value>false</value>
        <description> Whether hive should correctly not push the filters for outer joins </description>
        </property>

        Can you set the above property to false, and run the same tests as above - I mean double the tests

        Show
        Namit Jain added a comment - <property> <name>hive.outerjoin.supports.filters</name> <value>false</value> <description> Whether hive should correctly not push the filters for outer joins </description> </property> Can you set the above property to false, and run the same tests as above - I mean double the tests
        Hide
        Siying Dong added a comment -

        The problem is, with hive.outerjoin.supports.filters=false. Some code paths generating different kind of empty rows are not covered by the test.
        With hive.outerjoin.supports.filters=true, although results don't seem quite right, we are sure with the patch we don't change any behavior.

        Show
        Siying Dong added a comment - The problem is, with hive.outerjoin.supports.filters=false. Some code paths generating different kind of empty rows are not covered by the test. With hive.outerjoin.supports.filters=true, although results don't seem quite right, we are sure with the patch we don't change any behavior.
        Hide
        Namit Jain added a comment -

        can we have the test with both ?

        Show
        Namit Jain added a comment - can we have the test with both ?
        Hide
        Siying Dong added a comment -

        with hive.outerjoin.supports.filters=true and false;

        Show
        Siying Dong added a comment - with hive.outerjoin.supports.filters=true and false;
        Hide
        Namit Jain added a comment -

        +1

        running tests

        Show
        Namit Jain added a comment - +1 running tests
        Hide
        Namit Jain added a comment -

        Committed. Thanks Siying

        Show
        Namit Jain added a comment - Committed. Thanks Siying

          People

          • Assignee:
            Siying Dong
            Reporter:
            Siying Dong
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development