Pig / PIG-4083

TestAccumuloPigCluster always failed with timeout error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      TestAccumuloPigCluster always fails with a timeout error.
      Tried with Sun JDK 6 and Sun JDK 7.

      Attachments

      1. PIG-4083.patch
        0.4 kB
        fang fang chen
      2. pig_junit_tmp827919480.tar.gz
        76 kB
        fang fang chen
      3. PIG-4083-debug.patch
        1 kB
        Josh Elser

          Activity

          Josh Elser added a comment -

          I'll try to look into this, fang fang chen. Any logs or other information you have would be helpful.

          Josh Elser added a comment -
          Josh Elser added a comment -

          This is passing for me using Oracle 1.7.0_55. It's possible that the MiniAccumuloCluster started by the test is failing to start for any of a variety of reasons (lack of memory is probably the most common). I can provide a quick patch that adds some extra logging if you want to help me debug this.

          fang fang chen added a comment -
          fang fang chen added a comment -

          Hi Josh,
          Regarding your first comment:
          Here is all the output from the log file. I did not find any useful information for debugging.
          Testcase: test took 0.001 sec
          Caused an ERROR
          Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
          junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.

          Regarding your second comment:
          Yes, please provide the quick patch for debugging. Thanks

          fang fang chen added a comment -
          fang fang chen added a comment -

          BTW, I was using Sun JDK 1.7.0_60/1.6.0_45 and IBM JDK 1.6.0/1.7.0. All failed. If this is caused by the environment, I would like to know what causes it and how to resolve it. It would be helpful if Pig could provide this information. Thanks.

          Josh Elser added a comment -
          Josh Elser added a comment -

          Sounds good, I'll get a patch with some extra debugging here for you. Out of curiosity, does it fail quickly?

          Josh Elser added a comment -
          Josh Elser added a comment -

          Ok, fang fang chen. You can apply this using patch -p1 < PIG-4083-debug.patch.

          Then, run just this test case: ant test -Dtestcase=TestAccumuloPigCluster.

          After, please attach build/test/logs/TEST-org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster.txt.

          In that same log file, you will also see a line that matches INFO org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster - Starting MiniAccumuloCluster in ..., where ... is some directory on your local filesystem. That directory is where the MiniAccumuloCluster was started from. Please also attach the contents of the logs directory beneath that temporary directory.

          Those two logs should help me better understand why this test was failing for you. Thanks.
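Finding that directory can be scripted. The sketch below assumes only what the comment states: the test log contains a line with `Starting MiniAccumuloCluster in <dir>`, and the MiniAccumuloCluster logs sit in a `logs` directory beneath it. The function names are hypothetical, and the regular expression assumes the directory path contains no whitespace.

```python
import re
from pathlib import Path

# Matches the line logged by the debug patch; captures the first
# whitespace-free token after "Starting MiniAccumuloCluster in"
# (assumed to be the cluster directory).
START_LINE = re.compile(r"Starting MiniAccumuloCluster in (\S+)")


def find_mac_dirs(log_text: str) -> list[Path]:
    """Return each MiniAccumuloCluster directory named in the test log text."""
    return [Path(m.group(1)) for m in START_LINE.finditer(log_text)]


def mac_log_dirs(log_text: str) -> list[Path]:
    """Return the 'logs' directory beneath each MiniAccumuloCluster directory."""
    return [mac_dir / "logs" for mac_dir in find_mac_dirs(log_text)]
```

For example, `mac_log_dirs(Path("build/test/logs/TEST-org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster.txt").read_text())` would list the directories whose contents should be attached.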

          fang fang chen added a comment -
          fang fang chen added a comment -

          Hi Josh, the test has not finished yet. Do you know how long it will run before it times out?

          I saw the following log output during the unit test run:

          test-core:
          [mkdir] Created dir: /root/ff/git/pig/build/test/logs
          [mkdir] Created dir: /tmp/pig_junit_tmp827919480

          I assume "/tmp/pig_junit_tmp827919480" is the directory you want. Attached here.

          Josh Elser added a comment -
          Josh Elser added a comment -

          The timeout for the Pig JUnit Ant task is set globally to 2 hours. It is defined in build.xml, and you can change it. This failure case should surface very quickly, within a minute or so.

          The logs you provided show that when the Accumulo processes attempted to start, they had the incorrect Thrift jars on the classpath, which caused:

          Caused by: java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
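The method in that error is written as a JVM method descriptor: `(BIZ)B` means the parameters are `byte`, `int`, `boolean` and the return type is `byte`, i.e. `byte setBit(byte, int, boolean)`. A `NoSuchMethodError` (rather than a missing-class error) indicates that an `org.apache.thrift.EncodingUtils` class was loaded, but the copy that won on the classpath lacks this method. The decoder below is only an illustrative sketch of the descriptor grammar from the JVM specification; it is not part of Pig, Accumulo, or Thrift, and `decode_method_descriptor` is a hypothetical helper name.

```python
# JVM primitive type codes used in field and method descriptors.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_field_type(desc: str, i: int) -> tuple[str, int]:
    """Parse one type starting at desc[i]; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    c = desc[i]
    if c == "L":  # object type, written as L<binary/class/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    elif c in PRIMITIVES:
        name = PRIMITIVES[c]
        i += 1
    else:
        raise ValueError(f"unexpected descriptor character {c!r} at {i}")
    return name + "[]" * dims, i


def decode_method_descriptor(desc: str) -> tuple[str, list[str]]:
    """Decode e.g. '(BIZ)B' into ('byte', ['byte', 'int', 'boolean'])."""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    params, i = [], 1
    while desc[i] != ")":
        name, i = _parse_field_type(desc, i)
        params.append(name)
    return_type, end = _parse_field_type(desc, i + 1)
    if end != len(desc):
        raise ValueError("trailing characters after return type")
    return return_type, params
```

`decode_method_descriptor("(BIZ)B")` returns `("byte", ["byte", "int", "boolean"])`.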
          

          The classpath was also printed in the same log files you attached in that tarball. It appears that only libthrift-0.9.0 is included, which is correct. The problem now is figuring out which artifact on the classpath is also bundling Thrift classes. I diffed the classpath your Accumulo test was running with against mine and found the offender: /root/ff/git/pig/lib/hive-exec-0.8.0.jar. That old version of Hive shades in Thrift classes (from 0.6, IIRC), which is where the errors are coming from.
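A classpath comparison like the one described can be sketched as follows. Comparing entries by file name rather than full path is an assumption made here because two checkouts normally live in different directories (here `/root/ff/git/pig` versus some other path); `split_classpath` and `extra_jars` are hypothetical names.

```python
import os


def split_classpath(classpath: str, sep: str = os.pathsep) -> list[str]:
    """Split a classpath string into its non-empty entries, keeping order."""
    return [entry for entry in classpath.split(sep) if entry]


def extra_jars(theirs: str, mine: str, sep: str = os.pathsep) -> list[str]:
    """Return entries of `theirs` whose file name does not appear in `mine`.

    Entries are compared by file name only, so the same jar located under
    two different checkout directories is treated as a match.
    """
    mine_names = {os.path.basename(e) for e in split_classpath(mine, sep)}
    return [e for e in split_classpath(theirs, sep)
            if os.path.basename(e) not in mine_names]
```

Fed the two printed classpaths from this thread, the result would be expected to include entries such as `/root/ff/git/pig/lib/hive-exec-0.8.0.jar`, which appear only on the failing machine.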

          Your classpath actually has many more entries than mine, notably a dozen or so, all from /root/ff/git/pig/lib/. I'll poke around; I forget where/how that lib directory is used. I'll also see what I can do about the underlying MiniAccumuloCluster issue. There may be a better fix in a newer version of Accumulo we could upgrade to; otherwise, we might be able to detect when the processes die immediately and avoid the infinite loop in which the test code keeps trying to connect to an Accumulo instance that isn't alive.
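The fail-fast idea at the end of that comment, checking whether the cluster processes have already exited instead of retrying a connection forever, is sketched below in Python purely to show the control flow. The real test is Java; `try_connect`, `process_alive`, and `wait_until_reachable` are hypothetical stand-ins, not Pig or Accumulo APIs.

```python
import time
from typing import Callable


class ClusterDiedError(RuntimeError):
    """Raised when the awaited process exits before it becomes reachable."""


def wait_until_reachable(
    try_connect: Callable[[], bool],
    process_alive: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll try_connect() until it succeeds, the process dies, or time runs out."""
    deadline = clock() + timeout_s
    while True:
        if try_connect():
            return
        # Fail fast instead of retrying forever against a dead process.
        if not process_alive():
            raise ClusterDiedError("process exited before accepting connections")
        if clock() >= deadline:
            raise TimeoutError(f"not reachable after {timeout_s} s")
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. With a liveness check and a bounded timeout, a cluster that dies on startup would surface almost immediately rather than at the 2-hour JUnit limit.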

          fang fang chen added a comment -
          fang fang chen added a comment -

          Hi Josh,

          Thanks for pointing out the root cause. Do you know how to fix this, or is there any workaround?
          If this is caused by the Thrift version brought in by hive-0.8.0, I think the issue should also happen in your environment, but it does not. Is there any special step I should take to avoid it?

          Thanks

          Josh Elser added a comment -
          Josh Elser added a comment -

          I have no idea what is putting that jar in the lib directory; it doesn't happen for me. The only real recommendation I can give is to start from a clean repository and build again. If you can figure out what placed that jar there, we can see whether there's something we can do to address the test failure.

          fang fang chen added a comment -
          fang fang chen added a comment -

          After upgrading Hive from 0.8.0 to 0.13.1, the test case passed. Attached is the patch, which is for pig-0.13.

          Daniel Dai added a comment -
          Daniel Dai added a comment -

          I see. This will not happen on trunk, since we switched to using hive-exec-core.jar, which should not bundle Thrift. I can check the patch into the Pig 0.13 branch if it solves your issue.
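Whether a given jar bundles Thrift classes (as hive-exec-0.8.0.jar did here, and as hive-exec-core.jar should not) can be checked by listing its entries, since a jar is a zip archive. A minimal sketch with hypothetical function names; libthrift-0.9.0.jar itself will naturally be reported, so any other jar in the result is the candidate for a conflict.

```python
import zipfile

THRIFT_PREFIX = "org/apache/thrift/"


def bundled_thrift_classes(jar_path: str) -> list[str]:
    """Return the Thrift .class entries packaged inside one jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist()
                if name.startswith(THRIFT_PREFIX) and name.endswith(".class")]


def jars_bundling_thrift(jar_paths: list[str]) -> dict[str, int]:
    """Map each jar that bundles Thrift classes to how many it contains."""
    result = {}
    for path in jar_paths:
        classes = bundled_thrift_classes(path)
        if classes:
            result[str(path)] = len(classes)
    return result
```

Running `jars_bundling_thrift` over every jar in /root/ff/git/pig/lib/ would be one way to confirm which artifact shades Thrift.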

          Daniel Dai added a comment -
          Daniel Dai added a comment -

          Patch committed to the 0.13 branch. Thanks Fang Fang!


            People

            • Assignee: fang fang chen
            • Reporter: fang fang chen
            • Votes: 0
            • Watchers: 3
