Unit tests fail against Hadoop 2.0.0

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      I am running Pig unit tests against Hadoop 2.0.0-SNAPSHOT as follows:

      --- ivy/libraries.properties
      +++ ivy/libraries.properties
      @@ -37,9 +37,9 @@ guava.version=11.0
       jersey-core.version=1.8
       hadoop-core.version=1.0.0
       hadoop-test.version=1.0.0
      -hadoop-common.version=0.23.1
      -hadoop-hdfs.version=0.23.1
      -hadoop-mapreduce.version=0.23.1
      +hadoop-common.version=2.0.0-SNAPSHOT
      +hadoop-hdfs.version=2.0.0-SNAPSHOT
      +hadoop-mapreduce.version=2.0.0-SNAPSHOT
      

      And I see the following issues:

      1) copyFromLocalToCluster fails:

      fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
      java.io.IOException: fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
          at org.apache.pig.tools.grunt.GruntParser.processFsCommand(GruntParser.java:1012)
      

      I am working around this problem by explicitly creating any intermediate directories that do not exist. (Please see the attached patch.)
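The idea behind this workaround can be illustrated on the local filesystem. The following is a minimal sketch by analogy only: it uses java.nio.file rather than the Hadoop FileSystem API that the test utility works with, and the class and method names (PutWithParents, copyCreatingParents) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local-filesystem analogue of "create missing parents, then put".
final class PutWithParents {

    /** Copies source to target, creating the target's missing parent directories first. */
    static void copyCreatingParents(Path source, Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            // Creates every missing directory on the path; does nothing if they already exist.
            Files.createDirectories(parent);
        }
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("put-demo");
        Path source = Files.write(dir.resolve("AccumulatorInput.txt"), "1\t2\n".getBytes());
        // The directories a/b do not exist yet, so a plain copy would fail here.
        Path target = dir.resolve("a/b/AccumulatorInput.txt");
        copyCreatingParents(source, target);
        System.out.println(Files.exists(target));
    }
}
```

On a cluster the same two steps would go through the cluster's own filesystem handle instead of java.nio.file.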

      2) Many tests, including TestAccumulator, hang and eventually time out. The JVM thread dump shows the following call stack:

      [junit]    java.lang.Thread.State: TIMED_WAITING (sleeping)
      [junit]     at java.lang.Thread.sleep(Native Method)
      [junit]     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:245)
      [junit]     at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
      [junit]     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
      [junit]     at org.apache.pig.PigServer.storeEx(PigServer.java:996)
      [junit]     at org.apache.pig.PigServer.store(PigServer.java:963)
      [junit]     at org.apache.pig.PigServer.openIterator(PigServer.java:876)
      [junit]     at org.apache.pig.test.TestAccumulator.testAccumBasic(TestAccumulator.java:150)
      

      This happens because the test jobs never finish in the mini cluster: they fail with a ClassNotFound exception while being executed.

      In fact, this is a regression from HADOOP-6963, in which Hadoop introduced a dependency on the Apache Commons IO library:

      FileUtil.java
      isSymLink = org.apache.commons.io.FileUtils.isSymlink(allFiles[i]);
      

      But the Apache Commons IO library is missing from Pig, so test jobs keep failing in the mini cluster until the tests time out.

      I am fixing this issue by adding commons-io-2.3.jar as a dependency in ivy.xml and libraries.properties. (Please see the attached patch.)
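One quick way to confirm a missing-dependency diagnosis like this is to probe the classpath for the class in question. The sketch below is illustrative only: the helper name isClassAvailable is hypothetical, and it checks the classpath of the JVM it runs in, which is an assumption that may not match the classpath of the task JVMs in the mini cluster.

```java
// Hypothetical diagnostic helper; not part of Pig or Hadoop.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints false when the Commons IO jar is not on the classpath.
        System.out.println(isClassAvailable("org.apache.commons.io.FileUtils"));
    }
}
```

Running it before and after adding the dependency shows whether the jar actually reaches the classpath being tested.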

      1. PIG-2700.patch
        2 kB
        Cheolsoo Park
      2. PIG-2700.2.patch
        3 kB
        Cheolsoo Park

        Activity

        Cheolsoo Park created issue -
        Cheolsoo Park made changes -
        Field: Description
        Original Value:
        I am running Pig unit tests against Hadoop 2.0.0-SNAPSHOT as follows:

        {code}
        --- ivy/libraries.properties
        +++ ivy/libraries.properties
        @@ -37,9 +37,9 @@ guava.version=11.0
         jersey-core.version=1.8
         hadoop-core.version=1.0.0
         hadoop-test.version=1.0.0
        -hadoop-common.version=0.23.1
        -hadoop-hdfs.version=0.23.1
        -hadoop-mapreduce.version=0.23.1
        +hadoop-common.version=2.0.0-SNAPSHOT
        +hadoop-hdfs.version=2.0.0-SNAPSHOT
        +hadoop-mapreduce.version=2.0.0-SNAPSHOT
        {code}

        And see the following issues:

        1) copyFromLocalToCluster fails:
        {code}
        fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
        java.io.IOException: fs command '-put AccumulatorInput.txt AccumulatorInput.txt' failed. Please check output logs for details
            at org.apache.pig.tools.grunt.GruntParser.processFsCommand(GruntParser.java:1012)
            at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117)
            at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
            at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
            at org.apache.pig.test.Util.copyFromLocalToCluster(Util.java:538)
            at org.apache.pig.test.TestAccumulator.createFiles(TestAccumulator.java:83)
            at org.apache.pig.test.TestAccumulator.setUp(TestAccumulator.java:63)
        {code}

        2) TestAccumulator times out with the following error message in the log:

        {code}
        Testcase: testAccumBasic took 794.69 sec
            Caused an ERROR
        Unable to open iterator for alias C
        org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C
            at org.apache.pig.PigServer.openIterator(PigServer.java:901)
            at org.apache.pig.test.TestAccumulator.testAccumBasic(TestAccumulator.java:150)
        Caused by: java.io.IOException: Job terminated with anomalous status FAILED
            at org.apache.pig.PigServer.openIterator(PigServer.java:893)
        {code}

        The JVM thread dump shows the following call stack:

        {code}
            [junit] java.lang.Thread.State: TIMED_WAITING (sleeping)
            [junit] at java.lang.Thread.sleep(Native Method)
            [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:245)
            [junit] at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
            [junit] at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
            [junit] at org.apache.pig.PigServer.storeEx(PigServer.java:996)
            [junit] at org.apache.pig.PigServer.store(PigServer.java:963)
            [junit] at org.apache.pig.PigServer.openIterator(PigServer.java:876)
            [junit] at org.apache.pig.test.TestAccumulator.testAccumBasic(TestAccumulator.java:150)
        {code}

        As for the 1st issue, I am working around it with the attached patch. But I am not sure what is happening with the 2nd issue.
        New Value: identical to the original value above through the JVM thread dump, followed by:

        As for the 1st issue, I am getting around it with the following change:

        {code}
        diff --git test/org/apache/pig/test/Util.java test/org/apache/pig/test/Util.java
        index ca168ca..e88eb4a 100644
        --- test/org/apache/pig/test/Util.java
        +++ test/org/apache/pig/test/Util.java
        @@ -531,7 +531,14 @@ public class Util {
                 PigServer ps = new PigServer(ExecType.MAPREDUCE, cluster.getProperties());
                 String script = "fs -put " + localFileName + " " + fileNameOnCluster;
         
        -        GruntParser parser = new GruntParser(new StringReader(script));
        +        FileSystem fs = cluster.getFileSystem();
        +        Path clusterFile = new Path(fileNameOnCluster);
        +        Path clusterFileParent = clusterFile.getParent();
        +        if (!fs.exists(clusterFileParent)) {
        +            fs.mkdirs(clusterFileParent);
        +        }
        +
        +        GruntParser parser = new GruntParser(new StringReader(script));
                 parser.setInteractive(false);
                 parser.setParams(ps);
                 try {
        {code}

        But I am not sure what's happening with the 2nd issue.
        Cheolsoo Park made changes -
        Attachment PIG-2460.patch [ 12527565 ]
        Cheolsoo Park made changes -
        Patch Info Patch Available [ 10042 ]
        Description: the original value is the preceding revision shown above; the new value is the current text under Description.
        Cheolsoo Park made changes -
        Attachment PIG-2700.patch [ 12527567 ]
        Cheolsoo Park added a comment -

        Please ignore PIG-2460. It was uploaded accidentally.

        Cheolsoo Park made changes -
        Attachment PIG-2460.patch [ 12527565 ]
        Cheolsoo Park added a comment -

        I also found that 'testReducerNumEstimation' and 'classLoaderTest' have to be skipped for MR2. But since the Hadoop version is 2.0.* instead of 0.23.*, isHadoop23() returns false, and these tests are executed and fail.

        To address this problem, I added a new method isHadoop2_0() to Util.java.

        I updated the patch accordingly.
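The version check described in this comment can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class name HadoopVersionCheck, the usesMR2 helper, and passing the version string as a parameter (instead of reading it from the running Hadoop build) are all assumptions made to keep the example self-contained.

```java
// Hypothetical stand-in for the version helpers in the test utility class.
final class HadoopVersionCheck {

    /** True for 0.23.x version strings such as "0.23.1". */
    static boolean isHadoop23(String version) {
        return version.matches("0\\.23\\..+");
    }

    /** True for 2.0.x version strings such as "2.0.0-SNAPSHOT". */
    static boolean isHadoop2_0(String version) {
        return version.matches("2\\.0\\..+");
    }

    /** Tests that must be skipped on MR2 would consult a combined check like this one. */
    static boolean usesMR2(String version) {
        return isHadoop23(version) || isHadoop2_0(version);
    }

    public static void main(String[] args) {
        System.out.println(usesMR2("2.0.0-SNAPSHOT"));
    }
}
```

With only the 0.23 check in place, "2.0.0-SNAPSHOT" is not recognized as MR2, which matches the behavior described above.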

        Cheolsoo Park made changes -
        Attachment PIG-2700.2.patch [ 12527976 ]
        Cheolsoo Park added a comment -

        PIG-2791 incorporates this issue, so I am closing it.

        Cheolsoo Park made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Duplicate [ 3 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Cheolsoo Park
          • Votes:
            0
            Watchers:
            1
