Pig / PIG-885

New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:
      None

      Description

      A bunch of UDFs:
      1. Bin – Converts a continuous value into discrete values
      2. Decode – Converts a given attribute or expression into another string value, based on the value of the source attribute
      3. LookupInFiles – Checks for the existence of an expression in a series of text files
      4. RegexExtract and RegexMatch – Similar to Perl regular expressions
      5. HashFNV – An implementation of the FNV hash
      6. DiffDate – Calculates the number of days between two dates
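The one-line descriptions of Bin and Decode leave the argument layout open. The plain-Java sketch below shows one plausible reading of each. It does not use the Pig EvalFunc API, and both argument layouts are assumptions rather than the piggybank signatures:
- Bin takes ascending boundaries and exactly one more label than boundaries.
- Decode takes SQL-DECODE-style search/result pairs with an optional trailing default.

```java
import java.util.Objects;

final class BinDecodeSketch {
    // Assumed layout: labels.length == boundaries.length + 1, boundaries ascending.
    // value < boundaries[0] -> labels[0]; value in [boundaries[i-1], boundaries[i]) -> labels[i];
    // value >= last boundary -> last label.
    static String bin(double value, double[] boundaries, String[] labels) {
        if (labels.length != boundaries.length + 1) {
            throw new IllegalArgumentException("need exactly one more label than boundaries");
        }
        for (int i = 0; i < boundaries.length; i++) {
            if (value < boundaries[i]) {
                return labels[i];
            }
        }
        return labels[labels.length - 1];
    }

    // Assumed layout modeled on SQL DECODE: (search, result) pairs, then an optional default.
    static String decode(Object value, Object... searchResultDefault) {
        int n = searchResultDefault.length;
        for (int i = 0; i + 1 < n; i += 2) {
            if (Objects.equals(value, searchResultDefault[i])) {
                return String.valueOf(searchResultDefault[i + 1]);
            }
        }
        // An odd trailing argument is the default; with no default, a miss yields null.
        return (n % 2 == 1) ? String.valueOf(searchResultDefault[n - 1]) : null;
    }

    public static void main(String[] args) {
        double[] bounds = {0.0, 100.0};
        String[] labels = {"negative", "small", "large"};
        System.out.println(bin(42.0, bounds, labels));              // small
        System.out.println(decode(2, 1, "one", 2, "two", "other")); // two
    }
}
```

Using half-open intervals means a value equal to a boundary falls into the higher bin. Whether the piggybank Bin makes the same choice is not stated in this issue.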

      1. PIG-885.patch
        36 kB
        Daniel Dai
      2. PIG-885-2.patch
        36 kB
        Daniel Dai
      3. PIG-885-3.patch
        47 kB
        Daniel Dai
      4. PIG-885-4.patch
        42 kB
        Daniel Dai
      5. PIG-885-5.patch
        44 kB
        Daniel Dai
      6. PIG-885-6.patch
        45 kB
        Daniel Dai
      7. PIG-885-7.patch
        46 kB
        Daniel Dai
      8. PIG-885-8.patch
        50 kB
        Daniel Dai

        Activity

        Amr Awadallah added a comment -

        very nice collection, reminds me of myna

        – amr

        Daniel Dai added a comment -

        Some misspellings in the function names.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12413351/PIG-885.patch
        against trunk revision 793660.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 176 release audit warnings (more than the trunk's current 163 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/126/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/126/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/126/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/126/console

        This message is automatically generated.

        Daniel Dai added a comment -

        Attaching the patch again to resolve the release audit warnings.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12414031/PIG-885-3.patch
        against trunk revision 795931.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/136/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/136/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/136/console

        This message is automatically generated.

        Daniel Dai added a comment -

        Reworked the error handling.

        Olga Natkovich added a comment -

        The code looks good.

        Comments:

        (1) LookupInFile - I think it would make sense to require that files are provided in a constructor (via define) rather than checking on every exec.
        (2) In LookupInFile.exec - you get first element of the tuple without checking that it exists. I think you need to check for that and give an error.
        (3) LookupInFile.init - There are also some comments there that seem unrelated to the code; please remove them.
        (4) RegexpExtract.exec, RegexpMatch.exec - you refer to elements in the tuple without checking that they exist. We should give meaningful errors when we don't get all the expected parameters.
        (5) HashFNV.exec - needs to check the size of the tuple.
        (6) HashFNV - needs the mapping function so that Pig inserts implicit casts
        (7) DiffDate.exec - needs to check input tuple size before getting fields out
        (8) DiffDate - needs mapping function so that Pig inserts casts
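Review points (2), (4), (5) and (7) all ask for the same pattern: check the tuple size before reading any field, and fail with a meaningful message. The sketch below applies that pattern to DiffDate. It makes several assumptions that this issue does not confirm:
- Both dates arrive as yyyyMMdd strings.
- The result is date1 minus date2 in days.
- A java.util.List stands in for Pig's Tuple, so the example runs without Pig on the classpath.

In a real EvalFunc, exec is declared to throw IOException, so the arity error would more likely be raised that way.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

final class DiffDateSketch {
    // Assumed input format; the actual UDF's format is not stated in this issue.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // 'input' stands in for a Pig Tuple (hypothetical simplification).
    static Integer exec(List<?> input) {
        // Point (7): check the arity before touching any field.
        if (input == null || input.size() != 2) {
            throw new IllegalArgumentException(
                "DiffDate expects 2 arguments (date1, date2), got "
                + (input == null ? 0 : input.size()));
        }
        Object a = input.get(0);
        Object b = input.get(1);
        // Null in, null out (an assumed convention for this sketch).
        if (a == null || b == null) {
            return null;
        }
        LocalDate d1 = LocalDate.parse(a.toString(), FMT);
        LocalDate d2 = LocalDate.parse(b.toString(), FMT);
        // Days from date2 to date1; positive when date1 is later.
        return (int) ChronoUnit.DAYS.between(d2, d1);
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.asList("20090301", "20090201"))); // 28
    }
}
```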

        Daniel Dai added a comment -

        The new patch addresses most of the problems in the comments except for these two:
        (1) LookupInFile takes an arbitrary number of input files, so they cannot be put into a define. There is already a single-file version called INSETFROMFILE in the internal piggybank; it takes its file through the constructor via define.
        (6) The second input parameter of HashFNV is optional, so we cannot specify the input schema using the existing mechanism.
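Point (6) is easier to see in code. With an optional second argument, the function has to branch on the tuple size at run time instead of declaring one fixed input schema.

The sketch below uses the standard 32-bit FNV-1a hash. This thread does not say whether the piggybank HashFNV uses FNV-1 or FNV-1a. Treating the optional second argument as a modulus (bucket count) is a hypothetical choice for illustration, and List<?> again stands in for Pig's Tuple.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class HashFnvSketch {
    // Standard 32-bit FNV-1a offset basis and prime.
    private static final int OFFSET_BASIS = 0x811C9DC5;
    private static final int PRIME = 0x01000193;

    static long fnv1a32(String s) {
        int h = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF); // FNV-1a: xor the byte first...
            h *= PRIME;      // ...then multiply; int overflow gives the mod 2^32 wrap
        }
        return Integer.toUnsignedLong(h);
    }

    // The second field is optional, so the arity is checked at run time.
    static Long exec(List<?> input) {
        if (input == null || input.isEmpty() || input.size() > 2) {
            throw new IllegalArgumentException("HashFNV expects 1 or 2 arguments");
        }
        Object value = input.get(0);
        if (value == null) {
            return null;
        }
        long hash = fnv1a32(value.toString());
        if (input.size() == 1) {
            return hash;
        }
        // Hypothetical: the optional second argument is a positive bucket count.
        Object mod = input.get(1);
        if (!(mod instanceof Integer) || (Integer) mod <= 0) {
            throw new IllegalArgumentException("second argument must be a positive int");
        }
        return hash % (Integer) mod;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(fnv1a32("a")));  // e40c292c
        System.out.println(exec(Arrays.asList("a", 1000)));  // 220
    }
}
```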

        Olga Natkovich added a comment -

        +1, please commit.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12414123/PIG-885-6.patch
        against trunk revision 797290.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 19 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/140/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/140/console

        This message is automatically generated.

        Daniel Dai added a comment -

        Add null checking to all applicable UDFs

        Daniel Dai added a comment -

        Add NullPointerException check

        Olga Natkovich added a comment -

        The latest patch looks good. A couple of comments:

        (1) RegexExtract - input.get(1).equals(mExpression) - need to check for a null return from get(1). The same applies to get(2).
        (2) RegexpMatch - the same

        Once they are addressed, please commit the patch.
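The bug behind comment (1) is calling equals() on a field that may be null. The sketch below shows the fix. It assumes RegexExtract takes (subject, regex, group index) and caches the last compiled Pattern alongside mExpression. The field layout and the null-on-no-match behavior are assumptions, and List<?> stands in for Pig's Tuple.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexExtractSketch {
    // Cache of the last compiled regex, mirroring the mExpression field named in the review.
    private String mExpression;
    private Pattern mPattern;

    // Hypothetical field layout: (subject, regex, groupIndex).
    String exec(List<?> input) {
        if (input == null || input.size() != 3) {
            throw new IllegalArgumentException(
                "RegexExtract expects 3 arguments (subject, regex, index)");
        }
        Object subject = input.get(0);
        Object regex = input.get(1);
        Object index = input.get(2);
        // Check get(1) and get(2) for null before calling equals() or unboxing them.
        if (subject == null || regex == null || index == null) {
            return null;
        }
        // Recompile only when the expression changes between calls.
        if (!regex.equals(mExpression)) {
            mPattern = Pattern.compile(regex.toString());
            mExpression = regex.toString();
        }
        Matcher m = mPattern.matcher(subject.toString());
        int group = ((Number) index).intValue();
        if (!m.find() || group < 0 || group > m.groupCount()) {
            return null;
        }
        return m.group(group);
    }

    public static void main(String[] args) {
        RegexExtractSketch f = new RegexExtractSketch();
        System.out.println(f.exec(Arrays.asList("192.168.1.5:8020", "(.+):(\\d+)", 2))); // 8020
        System.out.println(f.exec(Arrays.asList("no port here", null, 1)));               // null
    }
}
```

The same null guard applies to RegexMatch, which per comment (2) has the same get(1)/get(2) issue.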

        Daniel Dai added a comment -

        Patch committed.


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Daniel Dai
          • Votes:
            0
            Watchers:
            1
