Pig / PIG-958

Splitting output data on key field

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Reviewed

      Description

      Pig users often face the need to split the output records into a bunch of files and directories depending on the type of record. Pig's SPLIT operator is useful when record types are few and known in advance. In cases where type is not directly known but is derived dynamically from values of a key field in the output tuple, a custom store function is a better solution.
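The idea in the description can be illustrated with a small, self-contained sketch. This is not the store function attached to this issue: the class name `KeyFieldSplitSketch`, the `part-0` file name, and the use of the local filesystem through `java.nio` are all assumptions made for illustration. It only shows the core technique, which is to read the key field from each record and route the record to a directory named after that value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the store function from the attached patch. */
final class KeyFieldSplitSketch {

    /** Groups delimited records by the value at keyIndex, preserving first-seen key order. */
    static Map<String, List<String>> groupByKeyField(List<String> records,
                                                     String delimiter,
                                                     int keyIndex) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String record : records) {
            // Quote the delimiter so it is matched literally, and keep trailing empty fields.
            String[] fields = record.split(Pattern.quote(delimiter), -1);
            // Assumes every record has at least keyIndex + 1 fields.
            String key = fields[keyIndex];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    /** Writes each group to parent/KEY/part-0, one directory per distinct key value. */
    static void writeGroups(Path parent, Map<String, List<String>> groups) throws IOException {
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            // Assumes key values are safe to use as directory names.
            Path dir = parent.resolve(entry.getKey());
            Files.createDirectories(dir);
            Files.write(dir.resolve("part-0"), entry.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Arrays.asList("a\t1", "b\t2", "a\t3");
        Map<String, List<String>> groups = groupByKeyField(records, "\t", 0);
        Path parent = Files.createTempDirectory("split-sketch");
        writeGroups(parent, groups);
        System.out.println("Wrote " + groups.size() + " key directories under " + parent);
    }
}
```

Unlike SPLIT, nothing here needs to know the set of key values in advance; the directories are created as new values are encountered.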

      1. 958.v3.patch
        18 kB
        Ankur
      2. 958.v4.patch
        18 kB
        Ankur

        Activity

        Ankur added a comment -

        Attached is an implementation of a custom store function that splits the data dynamically based on the values of a user-specified key field in the output tuple.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12419527/958.v1.patch
        against trunk revision 815571.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to cause Findbugs to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/31/testReport/
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/31/console

        This message is automatically generated.

        Ankur added a comment -

        Hudson seems to be failing during compilation because my test case, defined in package org.apache.pig.piggybank.test.storage, reuses certain classes from org.apache.pig.test, namely Util and MiniCluster.

        Ankur added a comment -

        Fixed the wrong src path of the classes.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12420264/958.v2.patch
        against trunk revision 817319.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 280 release audit warnings (more than the trunk's current 278 warnings).

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/console

        This message is automatically generated.

        Pradeep Kamath added a comment -

        In general the patch looks good. Please address the Hudson QA issues. Some code review comments:

        1) I don't think pig.properties changes should be part of this patch - was this accidental? Likewise for Main.java.

        2)

        /**
         *  storeUID is needed to get the around the case of multi-query optimisation where
         *  multiple instances of MultiStorage could be running in a single mapper/reducer
         *  and run the risk of overwriting each other's output.
         */

        Wouldn't the multiple instances be writing to different locations, in which case there should be no race condition?

        3)

        private void initJobSpecificParams() throws IOException {
          if (partition == null || outputPath == null) {

        Wouldn't the outputPath never be null since it is initialized in the constructor?

        4) Consider removing the log.debug() calls, since they are all in the inner loop and could impact performance.

        5)

        String fieldValueBasedPathStr = fieldValueBasedPath.toUri().getPath();

        This variable is only really used in a log.debug() call. I think removing it and using fieldValueBasedPath in all the fs.create() calls will make the code shorter and cleaner.

        6) Some comments explaining why the move of the output dirs is needed, and comments in the moveResults() and removePart() methods, would be helpful. Additional comments on the code flow would also help.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12420264/958.v2.patch
        against trunk revision 818929.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 281 release audit warnings (more than the trunk's current 279 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/46/console

        This message is automatically generated.

        Pradeep Kamath added a comment -

        The release audit warning, I think, is related to a missing Apache header comment. Can you add the Apache header comment by pasting it from some other source file in svn? Every file needs to have the Apache header as a comment at the beginning of the file, so you will need to add it to the beginning of both the source and the test file. Also, if you agree with any of the review comments, you can incorporate those changes when you submit the next version of the patch.
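For reference, a Java source file that carries the standard Apache Software Foundation source header looks roughly like the sketch below. The header text is the standard ASF one; the class `LicensedExample` and its method are hypothetical placeholders and are not part of the patch.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/** Hypothetical placeholder class; only the header comment above matters here. */
final class LicensedExample {

    /** Returns a fixed label so the file has something to compile and call. */
    static String describe() {
        return "file with ASF header";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The header goes before any package declaration or imports, so the release audit tool finds it at the very top of the file.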

        Ankur added a comment -

        Pradeep,
        Thanks for your review comments. I have incorporated the suggestions provided in the code review. The code is vastly simplified, cleaner and more readable.

        The unit tests now pass in local mode but fail in cluster mode after I updated to the latest Pig code base. The error I see is:
        hdfs://localhost.localdomain:40352/user/gankur/output/_temporary/_attempt_20091009030519686_0001_m_000000_0/output, expected: file:///

        Looks like a config issue with org.apache.pig.test.MiniCluster in the latest pig code. I didn't get time to debug this as I am going on a vacation. Regardless, I have attached the new patch for your review. Please suggest what needs to be done to pass the unit test in cluster mode.

        -Ankur

        Pradeep Kamath added a comment -

        +1, changes look good!
        For the test, I observed that you were using the mapreduce-mode PigServer object even in local mode. I made some changes but was unable to run the tests because of a config issue in setting up the test run, which I did not explore further. Nevertheless, here is what I changed:

        
        

        private void testMultiStorage(PigServer pigServer, Mode mode,
                String... queries) throws IOException {
            PigServer ps = (mode == Mode.cluster) ? pigServer : pigServerLocal;
            ps.setBatchOn();
            for (String query : queries) {
                ps.registerQuery(query);
            }
            ps.executeBatch();
            verifyResults(mode);
        }

        Check if making the above changes solves the issue you are seeing.

        Ankur added a comment -

        1. When run in cluster mode, the static variable PigMapReduce.sJobConf is null when checked in the UDF constructor but NOT null when the UDF is actually invoked. This caused the FileSystem object 'fs' to be incorrectly initialized to the local filesystem, which made the test fail. Moved the 'fs' initialization to the initJobSpecificParams() method.

        2. Deleting the temporary directory manually in finish() causes the job to fail. Removed the manual deletion. As a side effect, the user-specified PARENT output directory in the UDF will have empty part-* files. These should be deleted manually by the user.

        Verified that the UDF works correctly and that the unit tests pass.
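The first point describes a general pattern: when a configuration object is only populated after construction, anything derived from it should be resolved on first use and not in the constructor. The sketch below illustrates that pattern in isolation. It does not use Pig or Hadoop classes; `LazyInitStoreSketch`, the nested `JobContext` holder, and the `fs.scheme` key are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of deferring job-specific setup until first use. */
final class LazyInitStoreSketch {

    /** Stand-in for a static job configuration that a framework sets after construction. */
    static final class JobContext {
        static volatile Map<String, String> conf; // null until the task starts
    }

    private final String outputPath;
    private String scheme; // resolved on first use, not in the constructor

    LazyInitStoreSketch(String outputPath) {
        // Deliberately does not read JobContext.conf: it may still be null here.
        this.outputPath = outputPath;
    }

    /** Resolves configuration-dependent state once, at the time of the first call. */
    private void initJobSpecificParams() {
        if (scheme != null) {
            return;
        }
        Map<String, String> conf = JobContext.conf;
        if (conf == null) {
            throw new IllegalStateException("job configuration is not available yet");
        }
        scheme = conf.getOrDefault("fs.scheme", "file");
    }

    /** Builds the output location for one key value. */
    String pathFor(String key) {
        initJobSpecificParams();
        return scheme + "://" + outputPath + "/" + key;
    }

    public static void main(String[] args) {
        // The object is constructed before the configuration exists.
        LazyInitStoreSketch store = new LazyInitStoreSketch("/user/example/output");

        Map<String, String> conf = new HashMap<>();
        conf.put("fs.scheme", "hdfs");
        JobContext.conf = conf; // the "framework" sets the configuration later

        System.out.println(store.pathFor("key1"));
    }
}
```

Had the scheme been read in the constructor, it would have silently fallen back to a default, which mirrors the local-filesystem symptom described above.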

        Ankur added a comment -

        Just back from vacation. I have updated the code with the required changes. It should be good to go now. Pradeep, can you or any other committer review it?

        Ankur added a comment -

        Can we have an update on this, please?

        Pradeep Kamath added a comment -

        > 2. Deleting the temporary directory manually in finish(), causes the job to fail. Removed the manual deletion. As a side effect, user specified PARENT output directory in the UDF will have empty part-* files. These should be deleted manually by the user.

        Can you explain this a little more? It has been a long time since I last looked at the code. There seems to be some mv and this deletion happening; if you can explain that part too, it would be helpful.

        Otherwise looks good.

        Pradeep Kamath added a comment -

        I saw compile errors while trying to run unit test:

        [..contrib/piggybank/java]ant test
        ..
        
            [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:44: cannot find symbol
            [javac] symbol  : variable MiniCluster
            [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage
            [javac]   private MiniCluster cluster = MiniCluster.buildCluster();
            [javac]                                 ^
            [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:73: cannot find symbol
            [javac] symbol  : variable Util
            [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage
            [javac]     Util.deleteFile(cluster, INPUT_FILE);
            [javac]     ^
            [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:74: cannot find symbol
            [javac] symbol  : variable Util
            [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage
            [javac]     Util.copyFromLocalToCluster(cluster, INPUT_FILE, INPUT_FILE);
            [javac]     ^
            [javac] /homes/pradeepk/dev/pig-commit/PIG-958.v4/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java:96: cannot find symbol
            [javac] symbol  : variable Util
            [javac] location: class org.apache.pig.piggybank.test.storage.TestMultiStorage
            [javac]     Util.deleteFile(cluster, INPUT_FILE);
            [javac]     ^
        ..
        
        Ankur added a comment -

        > Can you explain this a little bit more - ......
        In the earlier patch (958.v3.patch), after moving the results from the task's current working directory, I was manually deleting the directory. This was to ensure that empty part files don't get moved to the final output directory. But doing so causes Hadoop to complain that it can no longer write to the task's output dir, and the task fails.
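Since the empty part-* files are left for the user to remove, a cleanup step along the following lines may help. This is only a sketch under stated assumptions: it works on a local directory through `java.nio`, whereas output on HDFS would need the Hadoop filesystem tools. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for removing empty part-* files from a local output directory. */
final class EmptyPartFileCleanupSketch {

    /** Deletes zero-length regular files named part-* directly under parentDir. */
    static List<Path> deleteEmptyPartFiles(Path parentDir) throws IOException {
        // Collect candidates first so nothing is deleted while the listing is open.
        List<Path> empty = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parentDir, "part-*")) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate) && Files.size(candidate) == 0L) {
                    empty.add(candidate);
                }
            }
        }
        for (Path path : empty) {
            Files.delete(path);
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("multistorage-sketch");
        Files.createFile(dir.resolve("part-00000"));              // empty, will be removed
        Files.write(dir.resolve("part-00001"), new byte[] {42});  // non-empty, kept
        List<Path> removed = deleteEmptyPartFiles(dir);
        System.out.println("Removed " + removed.size() + " empty part file(s)");
    }
}
```

Non-empty part files and the per-key subdirectories are left untouched, because only zero-length regular files directly under the parent directory are considered.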

        > I saw compile errors while trying to run unit test: ...
        Did you compile pig.jar and run the core tests first? This creates the classes and jar files on the local machine that the contrib tests require.

        On my local machine:
        gankur@grainflydivide-dr:pig_trunk$ ant
        ...
        buildJar:
        [echo] svnString 830456
        [jar] Building jar: /home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev-core.jar
        [jar] Building jar: /home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev.jar
        [copy] Copying 1 file to /home/gankur/eclipse/workspace/pig_trunk

        gankur@grainflydivide-dr:pig_trunk$ ant test
        ...
        test-core:
        [delete] Deleting directory /home/gankur/eclipse/workspace/pig_trunk/build/test/logs
        [mkdir] Created dir: /home/gankur/eclipse/workspace/pig_trunk/build/test/logs
        [junit] Running org.apache.pig.test.TestAdd
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.024 sec
        [junit] Running org.apache.pig.test.TestAlgebraicEval
        ...
        gankur@grainflydivide-dr:pig_trunk$ cd contrib/piggybank/java/
        gankur@grainflydivide-dr:java$ ant test
        ...
        test:
        [echo] *** Running UDF tests ***
        [delete] Deleting directory /home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
        [mkdir] Created dir: /home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
        [junit] Running org.apache.pig.piggybank.test.evaluation.TestEvalString
        [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.15 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.TestMathUDF
        [junit] Tests run: 35, Failures: 0, Errors: 0, Time elapsed: 0.123 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.TestStat
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.114 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.datetime.TestDiffDate
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.105 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.decode.TestDecode
        [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestHashFNV
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.094 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestLookupInFiles
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.163 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestRegex
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.092 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.util.TestSearchQuery
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.093 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.util.TestTop
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.099 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestDateExtractor
        [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.087 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestHostExtractor
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.083 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchEngineExtractor
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.091 sec
        [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchTermExtractor
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.1 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestCombinedLogLoader
        [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.535 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestCommonLogLoader
        [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.54 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestHelper
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.014 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestMultiStorage
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.964 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestMyRegExLoader
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestRegExLoader
        [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.302 sec
        [junit] Running org.apache.pig.piggybank.test.storage.TestSequenceFileLoader
        [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.883 sec

        BUILD SUCCESSFUL
        Total time: 58 seconds

        Pradeep Kamath added a comment -

        Patch committed, thanks for the contribution Ankur!


          People

          • Assignee:
            Ankur
            Reporter:
            Ankur
          • Votes:
            0
            Watchers:
            3
