Pig
  1. Pig
  2. PIG-2929

Improve documentation around AVG, CONCAT, MIN, MAX

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1, 0.9.2
    • Fix Version/s: 0.11
    • Labels:
      None

      Description

      Just stumbled upon this issue over the weekend. It appears the built-in CONCAT function will return null in the advent of 1 item in the Tuple being null. If this is as intended it would be nice if the docs made note of this. Otherwise it would be nice if nulls where just skipped and a proper string of what was left was returned. If all items are null then just return null in that case.

      1. PIG-2929-2.patch
        3 kB
        Cheolsoo Park
      2. PIG-2929.patch
        2 kB
        Cheolsoo Park

        Activity

        Hide
        Bill Graham added a comment -

        Looks good, thanks Cheolsoo! Renaming JIRA and committing.

        Show
        Bill Graham added a comment - Looks good, thanks Cheolsoo! Renaming JIRA and committing.
        Hide
        Cheolsoo Park added a comment -

        Thank you for reviewing my patch Bill!

        I removed "now". Please let me know if you want to edit anything more.

        Show
        Cheolsoo Park added a comment - Thank you for reviewing my patch Bill! I removed "now". Please let me know if you want to edit anything more.
        Hide
        Nicholas Verbeck added a comment -

        This all makes since. The addition of comments was more the main goal I was looking for. Just to prevent anyone else from having the assumptions I had.

        However I did managed to solve the issue without a UDF by leveraging the ternary operations. ie. CONCAT(domain is null?"":domain, path is null?"":path). The case I had was concating parts of a URL back together. Where the Query String might be null if there wasn't one.

        Show
        Nicholas Verbeck added a comment - This all makes since. The addition of comments was more the main goal I was looking for. Just to prevent anyone else from having the assumptions I had. However I did managed to solve the issue without a UDF by leveraging the ternary operations. ie. CONCAT(domain is null?"":domain, path is null?"":path). The case I had was concating parts of a URL back together. Where the Query String might be null if there wasn't one.
        Hide
        Bill Graham added a comment -

        +1 to Cheolsoo's comments. A null value passed to a concat could be seen as an invalid record where a null result it warranted. Plus there's the backward compatibility issue.

        If you want a concat with the semantics you describe, I think a custom UDF is the way go.

        @Chealsoo, thanks for the documentation patch. Would you please make one minor edit, which is to remove the now from the text. Comments should be non-temporal.

        The MIN function now ignores NULL values.
        
        Show
        Bill Graham added a comment - +1 to Cheolsoo's comments. A null value passed to a concat could be seen as an invalid record where a null result it warranted. Plus there's the backward compatibility issue. If you want a concat with the semantics you describe, I think a custom UDF is the way go. @Chealsoo, thanks for the documentation patch. Would you please make one minor edit, which is to remove the now from the text. Comments should be non-temporal. The MIN function now ignores NULL values.
        Hide
        Cheolsoo Park added a comment -

        Thank you very much for reporting an issue Nicholas!

        In fact, this is documented here although it is not very visible.

        I am attaching a patch that updats the built-in function doc (including CONCAT, AVG, MIN, MAX, and SIZE) regarding nulls.

        Regarding your suggestion of changing the behavior of CONCAT, even though it is total valid, I'm afraid that that could introduce backward incompatibilities. That is, some people may rely on that CONCAT returns nulls in their scripts. Given that this can be easily achieved by writing your own UDF, I prefer to keep the current behavior.

        Please let me know if anyone thinks otherwise. Thanks!

        Show
        Cheolsoo Park added a comment - Thank you very much for reporting an issue Nicholas! In fact, this is documented here although it is not very visible. I am attaching a patch that updats the built-in function doc (including CONCAT, AVG, MIN, MAX, and SIZE) regarding nulls. Regarding your suggestion of changing the behavior of CONCAT, even though it is total valid, I'm afraid that that could introduce backward incompatibilities. That is, some people may rely on that CONCAT returns nulls in their scripts. Given that this can be easily achieved by writing your own UDF, I prefer to keep the current behavior. Please let me know if anyone thinks otherwise. Thanks!

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Nicholas Verbeck
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development