PIG-2347: Fix Pig Unit tests for hadoop 23

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1, 0.10.0, 0.11
    • Fix Version/s: 0.9.2, 0.10.0, 0.11
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This is continuation work for PIG-2125. There are still 20+ unit test suites failing for hadoop 23, and we need to fix them.

      Attachments

      1. PIG-2347-4.patch
        1 kB
        Daniel Dai
      2. PIG-2347-3_0.9.patch
        5 kB
        Daniel Dai
      3. PIG-2347-AvroDatumWriter.java
        2 kB
        Thomas Weise
      4. PIG-2347-3.patch
        53 kB
        Daniel Dai
      5. PIG-2347.patch
        9 kB
        Arun C Murthy
      6. PIG-2347-2.patch
        9 kB
        Arun C Murthy
      7. PIG-2347-1.patch
        9 kB
        Daniel Dai
      8. syslog
        67 kB
        Tom White
      9. PIG-2347.patch
        11 kB
        Tom White
      10. PIG-2347.patch
        7 kB
        Tom White


          Activity

          Tom White added a comment -

          Tests were failing due to incomplete classpaths causing class-not-found exceptions. With this patch I get past that, but now hit MAPREDUCE-3389.

          Tom White added a comment -

          Here's a new patch which generates a classpath file to take advantage of MAPREDUCE-3389.

          However, lots of tests are still failing. For example, TestAccumulator times out, with the following exception in the log (see the attached syslog file):

          Exception: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Application doesn't exist in cache appattempt_1323213758124_0001_000001

          I'm not sure what is happening here, any pointers appreciated.

          Daniel Dai added a comment -

          Thanks, Tom! I will take a look. Lots of tests still fail for me as well.

          Ahmed Radwan added a comment -

          Thanks Tom! I have tried the new patch on a small (4-node) hadoop-0.23 cluster.
          I checked out trunk and applied your patch, but I also needed to manually edit build.xml to change the "hadoopversion" to "23"; otherwise it fails with: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String

          Changing the version to 23 and rebuilding solves this issue. I have run some basic Pig scripts without problems.

          Can we add a command line argument to ant to specify the hadoopVersion so we don't need to manually edit the build.xml?

          Thomas Weise added a comment -

          ant -Dhadoopversion=23 clean test
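
          For reference, a minimal sketch of why the -D flag works, assuming Pig's build.xml follows the usual Ant pattern (the default value 20 is from later in this thread; the exact declaration in build.xml may differ): Ant properties are immutable once set, so a value passed on the command line wins over the in-file default.

          <!-- Sketch only: -Dhadoopversion=23 on the command line takes precedence,
               because Ant ignores later <property> assignments to an already-set name. -->
          <property name="hadoopversion" value="20"/>
          <echo message="Building against hadoop ${hadoopversion}"/>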

          Daniel Dai added a comment -

          Arun Murthy provided PIG-2347-1.patch, which fixes a large number of failures related to MiniMRYarnCluster and counters. Now I see 22 failing test suites:
          TestEvalPipeline
          TestEvalPipeline2
          TestEvalPipelineLocal
          TestFRJoin
          TestFRJoin2
          TestGrunt
          TestHBaseStorage
          TestJobSubmission
          TestJoinSmoke
          TestKeyTypeDiscoveryVisitor
          TestMergeJoin
          TestMultiQuery
          TestMultiQueryBasic
          TestMultiQueryCompiler
          TestNestedForeach
          TestParser
          TestPigRunner
          TestPruneColumn
          TestScalarAliases
          TestScriptLanguage
          TestScriptUDF
          TestSkewedJoin

          Tom White added a comment -

          Arun/Daniel - thanks for the update. I ran a few of the tests that were previously failing for me (e.g. TestAccumulator) and they now pass.

          Apart from the change to PigStatsUtil, I think this could be committed, and the remaining failures fixed in another JIRA. I opened MAPREDUCE-3542 to fix the counter name incompatibility.

          Arun C Murthy added a comment -

          More fixes to ensure Pig uses multiple NMs for unit tests.

          With this and MAPREDUCE-3537 we are down to 15 test failures.

          Arun C Murthy added a comment -

          Related: TestJobSubmission fails because Pig depends on hbase-0.90.0, which doesn't play nice with hadoop-0.23.

          I guess we should bump the hbase version up to 0.90.x or 0.92?

          Tom White added a comment -

          Yes to 0.92, although it has not been released yet.

          Arun C Murthy added a comment -

          So, hbase-0.92 will work with hadoop-0.23 by default? Or is it a different mvn profile?

          Tom White added a comment -

          A different profile (pass -Dhadoop.profile=23).

          Arun C Murthy added a comment -

          Ok, this means hbase-0.92 will have to ship jars for both profiles (20.2xx & 23)? Is that already the case?

          Roman Shaposhnik added a comment -

          HBase 0.92 has decided to make its dependency on hadoop optional. Downstream consumers of HBase are now required to manage that dependency explicitly.

          Arun C Murthy added a comment -

          So, if pig depends on hadoop-0.23 and hbase-0.92, it should work fine?

          Roman Shaposhnik added a comment -

          @Arun: correct. The HBase .jar is agnostic to the version of Hadoop that it runs against. The hbase-0.92-test.jar, however, is not. So if you're depending on any code from there, your original comment applies.

          Arun C Murthy added a comment -

          Roman Shaposhnik: Pig needs both hbase-0.92.jar and hbase-0.92-test.jar. Will hbase-0.92 ship with test jars for both hadoop-0.20.2xx and hadoop-0.23?

          Arun C Murthy added a comment -

          Updated patch; it should fix TestFRJoin also.

          Roman Shaposhnik added a comment -

          @Arun, not currently. There's a plan to make the test artifact Hadoop-version agnostic, though: HBASE-4850

          Arun C Murthy added a comment -

          MAPREDUCE-3563 fixes a couple of Pig unit tests too.

          Thomas Weise added a comment -

          Patch for the 0.9 branch that includes the relevant portion of the trunk changes, plus a conditional exclude of JobHistoryLoader for 0.23.

          Daniel Dai added a comment -

          PIG-2347-3.patch should theoretically fix all unit tests for 23. I haven't finished a complete successful run on 23 yet, but I want to share it early so folks can help test.

          There are still a couple of holes:
          1. PIG-2446: Arun is still investigating alternatives for map input bytes
          2. PIG-2433 & PIG-2449, which are test failures revealed by the 23 test runs but are not 23 related
          3. TestHBaseStorage & TestJobSubmission.testReducerNumEstimation are blocked by HBASE-4850

          I skipped the above-mentioned tests for 23 in the patch.
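
          For illustration, the kind of Ant wiring that skips tests via an excludes file (a sketch: the property name test.exclude.file and the fileset layout are assumptions; only the file name excluded-tests-23 is from this issue):

          <!-- Sketch: skip the tests listed in excluded-tests-23 when targeting hadoop 23.
               Assumes a property such as test.exclude.file is set under -Dhadoopversion=23. -->
          <batchtest todir="${test.log.dir}" fork="yes">
            <fileset dir="test">
              <include name="**/Test*.java"/>
              <excludesfile name="${test.exclude.file}" if="test.exclude.file"/>
            </fileset>
          </batchtest>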

          Daniel Dai added a comment -

          Tests run to completion successfully for both 20 and 23 with PIG-2347-3.patch.

          Daniel Dai added a comment -

          test-patch result:
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 43 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] -1 release audit. The applied patch generated 508 release audit warnings (more than the trunk's current 500 warnings).

          The only new file is excluded-tests-23, to which there is no way to add an Apache license header, so the release audit warning can be ignored.

          Patch is ready for review.

          Thejas M Nair added a comment -

          +1

          Daniel Dai added a comment -

          Patch committed to trunk/0.10. I need to do some resyncing for the 0.9 branch.

          Daniel Dai added a comment -

          Attached patch for the 0.9 branch: PIG-2347-3_0.9.patch.

          Thomas Weise added a comment -

          For 0.9, I had to make a small change to ivy.xml to get past a CNF exception (missing slf4j-api dependency):

          <dependency org="org.slf4j" name="slf4j-log4j12" rev="${slf4j-log4j12.version}"
              conf="compile->master;test->default"/>

          In general it looks like we could rely more on the maven dependencies (refer to the default configuration rather than master and, if necessary, blacklist problematic dependencies).

          Daniel Dai added a comment -

          All unit tests pass on 0.9 branch as well.

          Daniel Dai added a comment -

          Patch also committed to 0.9 branch.

          Thomas Weise added a comment -

          Thanks Daniel. Please change the default hadoopversion back to 20. As a result of the avro update from 1.4.1 to 1.5.3 there is a compile error in piggybank. I will update PIG-2410 to account for this.

          Daniel Dai added a comment -

          Yes, the default version should be 20. I changed it back. Thanks Thomas.

          Thomas Weise added a comment -

          PigAvroDatumWriter for the 0.9 piggybank, attached as a separate patch.

          Daniel Dai added a comment -

          +1 for PigAvroDatumWriter change.

          Daniel Dai added a comment -

          I committed PIG-2347-AvroDatumWriter.java to the 0.9 branch to fix the piggybank compilation failure. Thanks Thomas!

          Tom White added a comment -

          > Tests run to completion successfully for both 20 and 23 with PIG-2347-3.patch.

          Daniel, is this running test-commit, test-core, or test-unit?

          Daniel Dai added a comment -

          It's all unit tests, including test-commit, test-core, and test-unit. Are you able to run them?

          Roman Shaposhnik added a comment -

          Daniel, we're seeing ~100 test failures when the tests are executed against the tip of branch-0.9: http://bigtop01.cloudera.org:8080/view/0.23%20Unit%20Tests/job/Bigtop-hadoop23-pig-unit-test/lastCompletedBuild/testReport/

          Do you have any suggestions for us as to what could be wrong? E.g.: http://bigtop01.cloudera.org:8080/view/0.23%20Unit%20Tests/job/Bigtop-hadoop23-pig-unit-test/lastCompletedBuild/testReport/org.apache.pig.test/TestAccumulator/testAccumWithBuildinAvg/

          Daniel Dai added a comment -

          I see lots of "Too many open files" errors. Seems like you are still hitting MAPREDUCE-3586. Can you check if you have lots of MRAppMaster processes running? Can you try removing .ivy2 and pulling all the jars again?

          Roman Shaposhnik added a comment -

          Thanks for the suggestion. I've added removal of the .m2 and .ivy2 caches, and I'm also running the process-count monitoring script. So far it seems that the issue is not really related to the number of processes, but I'll let the test run finish:
          http://bigtop01.cloudera.org:8080/job/Bigtop-hadoop23-pig-unit-test/13/console

          Daniel Dai added a comment -

          If TestAccumulator fails, I bet a lot of test cases will fail.

          Have you run it locally? Thomas and I ran the tests on several different machines, and the results seem good.

          Daniel Dai added a comment -

          I still see "Too many open files"; is that hitting the system limit?

          Roman Shaposhnik added a comment -

          The previous limit was a couple of thousand open files. I bumped it to 32768 (as confirmed by ulimit -a) and re-started the tests.

          Daniel Dai added a comment -

          Hi Roman,
          I see run #15 finished; it seems to take much more time than on my machine (mine takes 8h). TestSkewedJoin & TestEvalPipeline2 fail due to timeouts, which might be blamed on the slowness; you can try increasing the timeout in build.xml, as sketched below. Many of the other failures seem related to ant 1.8 (PIG-2172); using ant 1.7 should solve the issue.
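
          A sketch of the timeout knob on the Ant junit task (the property name and value here are illustrative, not Pig's actual settings; the timeout only applies to forked tests):

          <!-- Sketch only: the <junit> timeout attribute is in milliseconds. -->
          <property name="test.timeout" value="7200000"/>
          <junit timeout="${test.timeout}" fork="yes" forkmode="perTest" printsummary="yes">
            <batchtest todir="${test.log.dir}">
              <fileset dir="test" includes="**/TestSkewedJoin.java"/>
            </batchtest>
          </junit>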

          Daniel Dai added a comment -

          Some new development in hadoop23 requires commons-httpclient.jar. PIG-2347-4.patch adds it.
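
          For illustration, the kind of ivy entry such a patch adds (a sketch: the rev property and conf mapping are assumptions, not the actual patch contents):

          <!-- Sketch: pull commons-httpclient explicitly for the hadoop 23 build. -->
          <dependency org="commons-httpclient" name="commons-httpclient"
              rev="${commons-httpclient.version}" conf="hadoop23->master"/>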

          Dmitriy V. Ryaboy added a comment -

          Daniel, it seems odd that commons-httpclient is required by hadoop23. This library has been EOLed and replaced by httpcore and httpclient (see http://hc.apache.org/httpclient-3.x/). We pull those in by requiring httpcomponents.version=4.1. What was the failure you got without commons-httpclient?

          Daniel Dai added a comment -

          Here is the stack:
          java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
          at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:458)
          Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
          ... 1 more

          This happens when running local-mode tests (PIG-2480, Harsh's comment).

          Dmitriy V. Ryaboy added a comment -

          I see, thanks.

          This is a Hadoop 0.23 bug, not a Pig issue. Theoretically Hadoop should be listing commons-httpclient as a dependency in its pom; it should come in from ivy via transitive dependencies and automatically get on the classpath.

          In fact, that's happening: if I remove this dependency from ivy/libraries.properties, it still shows up in build/ivy/lib/Pig when I build for hadoop 20, but not if I build for hadoop 23.

          Should be fixed upstream...

          Arun C Murthy added a comment -

          Dmitriy - hadoop does declare a dep on commons-httpclient in our pom. The problem is that Pig doesn't do transitive closure of Hadoop's deps for various reasons (differences in versions, etc.), and hence we have to manually include them in Pig. I agree this isn't ideal, but I don't see a way around it without a massive change in Pig.

          Dmitriy V. Ryaboy added a comment -

          Arun – it's not a big deal to include it; I am just curious why it works for 20 and doesn't work for 23, given the same Pig. Commons-httpclient does get pulled in via transitive dependencies when we depend on 0.20 progeny.

          Thomas Weise added a comment -

          @Dmitriy: 0.20 is referenced through its default configuration (which will allow the dependencies to be pulled), while 0.23 uses master, which will only pull the artifact. We should really try to improve this to let ivy/maven do the work for 0.23; the current setup is quite fragile and verbose.

          conf="hadoop20->default"/>

          vs.

          conf="hadoop23->master"/>

          Dmitriy V. Ryaboy added a comment -

          Thanks for the explanation.
          I tried switching the conf to "default" and got a few maven errors. It seems like default pulls in the world and master pulls in nothing; perhaps a "runtime" target or some minimal set of dependencies could be published. But that's a separate issue, probably better surfaced on the Hadoop project(s).

          FWIW, here are the errors I got when pulling the default conf:

          
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		::          UNRESOLVED DEPENDENCIES         ::
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		:: commons-daemon#commons-daemon;1.0.3: java.text.ParseException: inconsistent module descriptor file found in 'http://repo2.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom': bad organisation: expected='commons-daemon' found='org.apache.commons'; 
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		::              FAILED DOWNLOADS            ::
          [ivy:resolve] 		:: ^ see resolution messages for details  ^ ::
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		:: com.sun.jdmk#jmxtools;1.2.1!jmxtools.jar
          [ivy:resolve] 		:: com.sun.jmx#jmxri;1.2.1!jmxri.jar
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          
          Thomas Weise added a comment -

          Those are issues with dependency POMs. You can see excludes for those elsewhere in the file:

          <exclude org="commons-daemon" module="commons-daemon"/><!-bad POM->
          <exclude org="org.apache.commons" module="commons-daemon"/><!-bad POM->

          I think we should follow the hbase setup, in which we specifically exclude what we don't want rather than excluding everything.
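
          For illustration, that setup might look like the following (a sketch: the module name and version property are placeholders, and the excludes are taken from the failures quoted above):

          <!-- Sketch: pull the default (transitive) conf, blacklisting only known-bad deps. -->
          <dependency org="org.apache.hadoop" name="hadoop-common"
              rev="${hadoop23.version}" conf="hadoop23->default">
            <exclude org="commons-daemon" module="commons-daemon"/><!-- bad POM -->
            <exclude org="com.sun.jdmk" module="jmxtools"/><!-- failed download -->
            <exclude org="com.sun.jmx" module="jmxri"/><!-- failed download -->
          </dependency>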

          Gianmarco De Francisci Morales added a comment -

          I am not an expert, but I think the errors are just due to maven1 vs. maven2 naming conventions and should be easily solvable.
          I will try to have a look at it soonish.

          I am not sure I would want a blacklist rather than a whitelist.
          Pulling in dependencies without control is itself a fragile setup.


            People

            • Assignee: Daniel Dai
            • Reporter: Daniel Dai
            • Votes: 0
            • Watchers: 8
