Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In a join, null keys must not compare as equal. The comparator needs to change to handle that. The e2e tests Join_9, Join_10, and Join_11 are manifestations of this issue.
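
      To illustrate the comparator idea, here is a minimal sketch. It is not the committed patch: `Key`, `index`, and `JOIN_ORDER` are hypothetical names, and breaking ties between null keys by input index is an assumption about one way to keep nulls from different inputs in separate groups.

```java
import java.util.Comparator;

final class NullJoinKeys {
    /** Hypothetical stand-in for a join key tagged with the index of its input relation. */
    static final class Key {
        final String value; // null models a null join key
        final int index;    // which input relation the tuple came from

        Key(String value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    /**
     * Orders keys by value. Two null keys compare as equal only when they
     * carry the same input index, so nulls from different inputs never
     * land in the same group.
     */
    static final Comparator<Key> JOIN_ORDER = (a, b) -> {
        if (a.value != null && b.value != null) {
            return a.value.compareTo(b.value);
        }
        if (a.value == null && b.value == null) {
            return Integer.compare(a.index, b.index);
        }
        return a.value == null ? -1 : 1; // nulls sort before non-nulls
    };
}
```

      Non-null keys still compare on value alone, so matching keys from both inputs continue to group together.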

      1. PIG-3761-1.patch
        11 kB
        Daniel Dai
      2. PIG-3761-2.patch
        2 kB
        Daniel Dai

        Activity

        Daniel Dai added a comment -

        Patch committed to tez. Thanks Cheolsoo for review!

        Cheolsoo Park added a comment -

        +1. LGTM.

        Daniel Dai added a comment -

        Changed JoinPackager handling to deal with the issue.

        Daniel Dai added a comment -

        We do set index correctly in tez. But what you suggest should work. Let me try.

        Mark Wagner added a comment -

        What are your thoughts on handling this as part of the JoinPackager instead of in the Comparator? It seems like that might make the fact that we're handling null specially more explicit, which could help maintainability in the future.

        Are there any cases where we'll have the same index from two different relations? We haven't really been using the indices explicitly in the Tez branch, so I'm not sure how reliable they are.
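
        A rough sketch of what handling null in the packager step could look like. The class and method names are hypothetical and this is not Pig's actual JoinPackager; it assumes the two bags have already been grouped under one key and simply emits nothing when that key is null.

```java
import java.util.ArrayList;
import java.util.List;

final class NullSkippingPackager {
    /**
     * Joins two bags that were grouped under the same key.
     * A null key never joins, so the result is empty in that case.
     */
    static List<String[]> join(String key, List<String> left, List<String> right) {
        List<String[]> out = new ArrayList<>();
        if (key == null) {
            return out; // explicit special case: null keys produce no output
        }
        for (String l : left) {
            for (String r : right) {
                out.add(new String[] {key, l, r}); // cross product of the two bags
            }
        }
        return out;
    }
}
```

        Keeping the null check in one visible place is the maintainability argument made above; the comparator can then stay unchanged.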

        Daniel Dai added a comment -

        In MR, we only use PigXXXRawComparator for sorting, not for cogroup/join. Those use PigWritableComparator, which has the same null comparison logic.

        Rohini Palaniswamy added a comment -

        How does it work with MR?


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Daniel Dai
          • Votes:
            0
            Watchers:
            4
