Pig
  1. Pig
  2. PIG-3568

Define the semantics of POStatus.STATUS_NULL

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The meaning of POStatus.STATUS_NULL is not well documented and there are conflicting view points on its interpretation. The two interpretations are:

      1. POStatus.STATUS_NULL indicates that the pulled output IS null. This is mostly found in expression operators, particularly comparison operators.

      2. POStatus.STATUS_NULL indicates that the pull did not produce any output. This is backed up by its usage in POPackage (not JoinPackage) for flattening an empty bag, and PigGenericMapBase where pulls on the operator pipeline that result in STATUS_NULL are discarded.

      I propose that 2 should be the official definition going forward. The first meaning is easily indicated by (null, STATUS_OK) and all the relational operators already seem to follow 2. I'd like to hear others' opinions as well though.

      1. PIG-3568.1.patch
        36 kB
        Mark Wagner
      2. PIG-3568.2.patch
        37 kB
        Mark Wagner

        Issue Links

          Activity

          Mark Wagner created issue -
          Mark Wagner made changes -
          Field Original Value New Value
          Description The meaning of POStatus.STATUS_NULL is not well documented and there are conflicting view points on its interpretation. The two interpretations are:

          1. POStatus.STATUS_NULL indicates that the pulled output IS null. This is mostly found in expression operators, particularly comparison operators.

          2. POStatus.STATUS_NULL indicates that the pull did not produce any output. This is backed up by its usage in POPackage (not JoinPackage) for flattening an empty bag, and PigGenericMapBase where pulls on the operator pipeline that result in STATUS_NULL are discarded.

          I propose that 2 should be the official definition going forward. The first meaning is easily indicated by (null, STATUS_OK) and all the relational operators already seem to follow 2. I'd like others opinions as well though.
          The meaning of POStatus.STATUS_NULL is not well documented and there are conflicting view points on its interpretation. The two interpretations are:

          1. POStatus.STATUS_NULL indicates that the pulled output IS null. This is mostly found in expression operators, particularly comparison operators.

          2. POStatus.STATUS_NULL indicates that the pull did not produce any output. This is backed up by its usage in POPackage (not JoinPackage) for flattening an empty bag, and PigGenericMapBase where pulls on the operator pipeline that result in STATUS_NULL are discarded.

          I propose that 2 should be the official definition going forward. The first meaning is easily indicated by (null, STATUS_OK) and all the relational operators already seem to follow 2. I'd like to hear others' opinions as well though.
          Hide
          Daniel Dai added a comment -

          Agree with #1. For #2, seems that's not the intention for the current code. Though we can see it in POPackage, but the code path is not in use. But since we plan to clean up STATUS_NULL -> (null, STATUS_OK), we are free to redefine the meaning for STATUS_NULL (a different name might be better).

          Show
          Daniel Dai added a comment - Agree with #1. For #2, seems that's not the intention for the current code. Though we can see it in POPackage, but the code path is not in use. But since we plan to clean up STATUS_NULL -> (null, STATUS_OK), we are free to redefine the meaning for STATUS_NULL (a different name might be better).
          Hide
          Daniel Dai added a comment -

          I think #2 is also fine. We do use it to signal a tuple need to be skipped in POForEach (which wrongly changed by PIG-3060, PIG-3570 roll it back).

          However, when we use STATUS_NULL to signal a skip tuple, STATUS_NULL need to go through the rest of pipeline. We might consider to contain STATUS_NULL in the operator in the future.

          Show
          Daniel Dai added a comment - I think #2 is also fine. We do use it to signal a tuple need to be skipped in POForEach (which wrongly changed by PIG-3060 , PIG-3570 roll it back). However, when we use STATUS_NULL to signal a skip tuple, STATUS_NULL need to go through the rest of pipeline. We might consider to contain STATUS_NULL in the operator in the future.
          Hide
          Mark Wagner added a comment -

          Here's a patch for option #2. I've also created a RB: https://reviews.apache.org/r/15524/

          Show
          Mark Wagner added a comment - Here's a patch for option #2. I've also created a RB: https://reviews.apache.org/r/15524/
          Mark Wagner made changes -
          Attachment PIG-3568.1.patch [ 12613891 ]
          Mark Wagner made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Cheolsoo Park added a comment -

          Thank you for cleaning up this mess!

          The patch looks good to me. I am running unit and e2e tests now.

          Show
          Cheolsoo Park added a comment - Thank you for cleaning up this mess! The patch looks good to me. I am running unit and e2e tests now.
          Hide
          Mark Wagner added a comment -

          Updated in response to Daniel's comments on RB.

          Show
          Mark Wagner added a comment - Updated in response to Daniel's comments on RB.
          Mark Wagner made changes -
          Attachment PIG-3568.2.patch [ 12614159 ]
          Hide
          Cheolsoo Park added a comment -

          +1. Will commit to trunk and merge down to tez branch.

          Show
          Cheolsoo Park added a comment - +1. Will commit to trunk and merge down to tez branch.
          Hide
          Cheolsoo Park added a comment -

          Committed.

          Show
          Cheolsoo Park added a comment - Committed.
          Cheolsoo Park made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.13.0 [ 12324971 ]
          Resolution Fixed [ 1 ]
          Cheolsoo Park made changes -
          Link This issue breaks PIG-3679 [ PIG-3679 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          1d 17h 17m 1 Mark Wagner 14/Nov/13 18:56
          Patch Available Patch Available Resolved Resolved
          1d 5h 31m 1 Cheolsoo Park 16/Nov/13 00:27
          Resolved Resolved Closed Closed
          233d 17h 40m 1 Daniel Dai 07/Jul/14 19:08

            People

            • Assignee:
              Mark Wagner
              Reporter:
              Mark Wagner
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development