Hive
  1. Hive
  2. HIVE-4689

For outerjoins, joinEmitInterval might make wrong result

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Alias filter tag is calculated for each group and used for outer joins. But if joinEmitInterval is smaller than the group size, pre-matured alias filter tag would be used and might introduce different(wrong) result.

      It can be observed in join_1to1.q test but I cannot imagine proper solution which does not override intention of joinEmitInterval. Should it be disabled for outer joins?

        Issue Links

          Activity

          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-4689 [jira] For outerjoins, joinEmitInterval might make wrong result".

          Reviewers: JIRA

          HIVE-4689 For outerjoins, joinEmitInterval might make wrong result

          Alias filter tag is calculated for each group and used for outer joins. But if joinEmitInterval is smaller than the group size, pre-matured alias filter tag would be used and might introduce different(wrong) result.

          It can be observed in join_1to1.q test but I cannot imagine proper solution which does not override intention of joinEmitInterval. Should it be disabled for outer joins?

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D11211

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
          ql/src/test/results/clientpositive/join_1to1.q.out

          MANAGE HERALD RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/26691/

          To: JIRA, navis

          Show
          Phabricator added a comment - navis requested code review of " HIVE-4689 [jira] For outerjoins, joinEmitInterval might make wrong result". Reviewers: JIRA HIVE-4689 For outerjoins, joinEmitInterval might make wrong result Alias filter tag is calculated for each group and used for outer joins. But if joinEmitInterval is smaller than the group size, pre-matured alias filter tag would be used and might introduce different(wrong) result. It can be observed in join_1to1.q test but I cannot imagine proper solution which does not override intention of joinEmitInterval. Should it be disabled for outer joins? TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11211 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java ql/src/test/results/clientpositive/join_1to1.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/26691/ To: JIRA, navis
          Hide
          Brock Noland added a comment -

          Clicking "Submit Patch" so this shows up in the patch review queue.

          Show
          Brock Noland added a comment - Clicking "Submit Patch" so this shows up in the patch review queue.
          Hide
          Ashutosh Chauhan added a comment -

          I am not super aware of this config. What exactly does it govern? It seems like it for buffering rows pulled from reducer iterator in memory. So, we are trading speed with memory utilization with config. Is that correct?
          In that case think it makes sense to disable this altogether for outer join case.

          Show
          Ashutosh Chauhan added a comment - I am not super aware of this config. What exactly does it govern? It seems like it for buffering rows pulled from reducer iterator in memory. So, we are trading speed with memory utilization with config. Is that correct? In that case think it makes sense to disable this altogether for outer join case.
          Hide
          Ashutosh Chauhan added a comment -

          Navis I think we should disable joinEmitInterval altogether for outer joins.

          Show
          Ashutosh Chauhan added a comment - Navis I think we should disable joinEmitInterval altogether for outer joins.
          Hide
          Navis added a comment -

          Sorry, I was busy for these days.

          I think disabling joinEmitInterval for all outer joins seemed too much. This problem occurs when alias-filter is needed, which is used only for outer joins with condition on 'on' clause. Current patch is for that.

          Show
          Navis added a comment - Sorry, I was busy for these days. I think disabling joinEmitInterval for all outer joins seemed too much. This problem occurs when alias-filter is needed, which is used only for outer joins with condition on 'on' clause. Current patch is for that.
          Hide
          Yin Huai added a comment -

          I agree with Navis. Here is my understanding. Right now, if we do not have alias-filter, a non-matching row will be generated only when either left or right table does not have a matching row (assuming we have a 2-way join). In this case, either the container of the left table or that of the right table will be empty, so joinEmitInterval will not affect the correctness of results.

          Show
          Yin Huai added a comment - I agree with Navis. Here is my understanding. Right now, if we do not have alias-filter, a non-matching row will be generated only when either left or right table does not have a matching row (assuming we have a 2-way join). In this case, either the container of the left table or that of the right table will be empty, so joinEmitInterval will not affect the correctness of results.
          Hide
          Ashutosh Chauhan added a comment -

          Okies. Lets go ahead with current patch. +1

          Show
          Ashutosh Chauhan added a comment - Okies. Lets go ahead with current patch. +1
          Hide
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Navis!

          Show
          Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Navis!
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #2177 (See https://builds.apache.org/job/Hive-trunk-h0.21/2177/)
          HIVE-4689 : For outerjoins, joinEmitInterval might make wrong result (Navis via Ashutosh Chauhan) (Revision 1499453)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499453
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
          • /hive/trunk/ql/src/test/results/clientpositive/join_1to1.q.out
          Show
          Hudson added a comment - Integrated in Hive-trunk-h0.21 #2177 (See https://builds.apache.org/job/Hive-trunk-h0.21/2177/ ) HIVE-4689 : For outerjoins, joinEmitInterval might make wrong result (Navis via Ashutosh Chauhan) (Revision 1499453) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499453 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java /hive/trunk/ql/src/test/results/clientpositive/join_1to1.q.out
          Hide
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #270 (See https://builds.apache.org/job/Hive-trunk-hadoop2/270/)
          HIVE-4689 : For outerjoins, joinEmitInterval might make wrong result (Navis via Ashutosh Chauhan) (Revision 1499453)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499453
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
          • /hive/trunk/ql/src/test/results/clientpositive/join_1to1.q.out
          Show
          Hudson added a comment - Integrated in Hive-trunk-hadoop2 #270 (See https://builds.apache.org/job/Hive-trunk-hadoop2/270/ ) HIVE-4689 : For outerjoins, joinEmitInterval might make wrong result (Navis via Ashutosh Chauhan) (Revision 1499453) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1499453 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java /hive/trunk/ql/src/test/results/clientpositive/join_1to1.q.out
          Hide
          Ashutosh Chauhan added a comment -

          This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

          Show
          Ashutosh Chauhan added a comment - This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

            People

            • Assignee:
              Navis
              Reporter:
              Navis
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development