Pig
  1. Pig
  2. PIG-3568

Define the semantics of POStatus.STATUS_NULL

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The meaning of POStatus.STATUS_NULL is not well documented and there are conflicting view points on its interpretation. The two interpretations are:

      1. POStatus.STATUS_NULL indicates that the pulled output IS null. This is mostly found in expression operators, particularly comparison operators.

      2. POStatus.STATUS_NULL indicates that the pull did not produce any output. This is backed up by its usage in POPackage (not JoinPackage) for flattening an empty bag, and PigGenericMapBase where pulls on the operator pipeline that result in STATUS_NULL are discarded.

      I propose that 2 should be the official definition going forward. The first meaning is easily indicated by (null, STATUS_OK) and all the relational operators already seem to follow 2. I'd like to hear others' opinions as well though.

      1. PIG-3568.2.patch
        37 kB
        Mark Wagner
      2. PIG-3568.1.patch
        36 kB
        Mark Wagner

        Issue Links

          Activity

          Hide
          Daniel Dai added a comment -

          Agree with #1. For #2, seems that's not the intention for the current code. Though we can see it in POPackage, but the code path is not in use. But since we plan to clean up STATUS_NULL -> (null, STATUS_OK), we are free to redefine the meaning for STATUS_NULL (a different name might be better).

          Show
          Daniel Dai added a comment - Agree with #1. For #2, seems that's not the intention for the current code. Though we can see it in POPackage, but the code path is not in use. But since we plan to clean up STATUS_NULL -> (null, STATUS_OK), we are free to redefine the meaning for STATUS_NULL (a different name might be better).
          Hide
          Daniel Dai added a comment -

          I think #2 is also fine. We do use it to signal a tuple need to be skipped in POForEach (which wrongly changed by PIG-3060, PIG-3570 roll it back).

          However, when we use STATUS_NULL to signal a skip tuple, STATUS_NULL need to go through the rest of pipeline. We might consider to contain STATUS_NULL in the operator in the future.

          Show
          Daniel Dai added a comment - I think #2 is also fine. We do use it to signal a tuple need to be skipped in POForEach (which wrongly changed by PIG-3060 , PIG-3570 roll it back). However, when we use STATUS_NULL to signal a skip tuple, STATUS_NULL need to go through the rest of pipeline. We might consider to contain STATUS_NULL in the operator in the future.
          Hide
          Mark Wagner added a comment -

          Here's a patch for option #2. I've also created a RB: https://reviews.apache.org/r/15524/

          Show
          Mark Wagner added a comment - Here's a patch for option #2. I've also created a RB: https://reviews.apache.org/r/15524/
          Hide
          Cheolsoo Park added a comment -

          Thank you for cleaning up this mess!

          The patch looks good to me. I am running unit and e2e tests now.

          Show
          Cheolsoo Park added a comment - Thank you for cleaning up this mess! The patch looks good to me. I am running unit and e2e tests now.
          Hide
          Mark Wagner added a comment -

          Updated in response to Daniel's comments on RB.

          Show
          Mark Wagner added a comment - Updated in response to Daniel's comments on RB.
          Hide
          Cheolsoo Park added a comment -

          +1. Will commit to trunk and merge down to tez branch.

          Show
          Cheolsoo Park added a comment - +1. Will commit to trunk and merge down to tez branch.
          Hide
          Cheolsoo Park added a comment -

          Committed.

          Show
          Cheolsoo Park added a comment - Committed.

            People

            • Assignee:
              Mark Wagner
              Reporter:
              Mark Wagner
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development