Accumulo / ACCUMULO-4467

Random Walk broken because of unmet dependency on commons-math

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.6, 1.7.2
    • Fix Version/s: 1.7.3, 1.8.1, 2.0.0
    • Component/s: test
    • Labels: None

      Description

      When trying to run the Random Walk with the LongEach.xml module, I hit a failure once we reach the Shard.xml step:

      16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
      java.lang.Exception: Error running node Shard.xml
      	at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
      	at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
      	at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.accumulo.start.Main$2.run(Main.java:157)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.Exception: Error running node shard.BulkInsert
      	at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
      	at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
      	at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
      	... 1 more
      Caused by: java.lang.Exception: Failed to run map/red verify
      	at org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
      	at org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
      	... 9 more
      

      Digging into YARN to see why the MR job became unhappy, I see the following:

      Error: java.lang.ClassNotFoundException: org.apache.commons.math.stat.descriptive.SummaryStatistics
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	at org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310)
      	at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
      	at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
      	at org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
      	at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
      	at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
      	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
      	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
      	at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
      	at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
      	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
      	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
      	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      

      It looks like this commit introduced a dependency on the commons-math JAR at runtime (in the RFile.Writer class), but the tests weren't updated to ensure that the same dependency is put onto the classpath of MR jobs submitted by Random Walk.
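
One way to catch this kind of gap before a job is submitted is to probe the classpath for the classes the job will need. The following is only a minimal sketch: `ClasspathProbe`, `isLoadable`, and `missing` are hypothetical names that do not exist in Accumulo, and the check only reflects the classpath of the JVM it runs in, not that of a YARN container.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Accumulo: reports classes the current classpath cannot load. */
final class ClasspathProbe {

    /** Returns true if the named class can be found, without running its static initializers. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Same family of failure the reduce task hit when RFile$Writer needed commons-math.
            return false;
        }
    }

    /** Returns the subset of the given class names that cannot be loaded, in input order. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Class name taken from the YARN error above; the result depends on the local classpath.
        System.out.println(missing("org.apache.commons.math.stat.descriptive.SummaryStatistics"));
    }
}
```

Running `main` with the same classpath that is handed to the MR job would list the class if commons-math is absent, and print an empty list otherwise.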

      Props to Sean Busbey for helping to figure out the root cause here. On a separate note, we may want to start running this test before releases, as it appears this regression also snuck into 1.8.0 and at least one 1.6 release (though, since I don't have any easy way to test this against a non-1.7.2 cluster, I'm limiting the affects versions to what I've confirmed myself). Ping Keith Turner, who might know the simplest way to fix this.

      1. ACCUMULO-4467-1.8.v1.patch
        2 kB
        Sean Busbey
      2. ACCUMULO-4467-1.7.v1.patch
        1 kB
        Sean Busbey
      3. ACCUMULO-4467-1.6.v1.patch
        1 kB
        Sean Busbey

        Activity

        busbey Sean Busbey added a comment -

        Dima later pointed out to me that ACCUMULO-4354 probably masks this on 1.8+, since YARN leaks commons-math3 into the classpath of MR jobs.

        elserj Josh Elser added a comment -

        I think https://github.com/apache/accumulo/blob/5cb5b9372103761c829403c03007b9f53241400f/test/src/main/java/org/apache/accumulo/test/randomwalk/Node.java#L94 is the culprit that needs to change.

        elserj Josh Elser added a comment -

        > since YARN leaks commons-math3 into the classpath of MR jobs

        Oh good.

        ctubbsii Christopher Tubbs added a comment -

        Oh wow, that's terribly obscure. RW needs some work. That's just messy.

        kturner Keith Turner added a comment -

        Dima Spivak glad to see you are running RW. I had wanted to run it before 1.8.0 but did not get to it.

        elserj Josh Elser added a comment -

        Ack, I forgot to write this up yesterday. I was musing about how to fix this once and for all. I think I stole the idea from hbase mapredcp. We can encapsulate what our runtime dependencies are for mapreduce in one place, and replace all other occurrences with a call to accumulo mapredcp. I would guess that you probably had the same thought though, Sean Busbey
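
To illustrate the idea of keeping the MapReduce runtime dependencies in one place, here is a rough sketch of how a comma-separated JAR list (the shape `-libjars` expects) could be derived from a single list of representative classes. Everything here is an assumption for illustration: `MapReduceClasspath`, `locationOf`, and `libJars` are hypothetical names, and no `accumulo mapredcp` command is implied to exist in this form.

```java
import java.security.CodeSource;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: derives a libjars-style list from one central set of classes. */
final class MapReduceClasspath {

    /** Returns the JAR or directory a class was loaded from, or empty for bootstrap classes. */
    static Optional<String> locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return Optional.empty(); // e.g. java.lang.String has no code source
        }
        // getPath() keeps URL encoding (a space stays "%20"); decode if paths may contain such characters.
        return Optional.of(source.getLocation().getPath());
    }

    /** Joins the distinct locations of the given classes with commas, preserving first-seen order. */
    static String libJars(Class<?>... classes) {
        Set<String> jars = new LinkedHashSet<>();
        for (Class<?> c : classes) {
            locationOf(c).ifPresent(jars::add);
        }
        return String.join(",", jars);
    }
}
```

With this shape, a caller would pass one class from each required dependency (for example, a commons-math class alongside an Accumulo core class), so adding a new runtime dependency means touching only the central list.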

        busbey Sean Busbey added a comment -

        My plan for this was to just fix the immediate problem of RandomWalk not including commons-math, since AFAICT we hadn't centralized the needed dependencies anywhere. I'm not sure of the full needed scope of that command (e.g. it probably needs to handle classpath for configured iterators for offline table scans) so I'd much rather that kind of improvement go in a follow on.

        hadoopqa Hadoop QA added a comment -

        -1 overall

        Vote | Subsystem | Runtime | Comment
        -1 | patch | 0m 4s | ACCUMULO-4467 does not apply to master. Rebase required? Wrong Branch? See http://accumulo.apache.org/git.html#contributors for help.

        Subsystem | Report/Notes
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12829666/ACCUMULO-4467-1.6.v1.patch
        JIRA Issue | ACCUMULO-4467
        Console output | https://builds.apache.org/job/PreCommit-ACCUMULO-Build/42/console
        Powered by | Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        busbey Sean Busbey added a comment -

        attaching a patch for 1.6 and 1.7, depending on which branch folks want to test.

        elserj Josh Elser added a comment -

        > My plan for this was to just fix the immediate problem of RandomWalk not including commons-math, since AFAICT we hadn't centralized the needed dependencies anywhere

        That's fine. My point was that we should consolidate it somewhere (this isn't the first time we've had problems like this). NBD, if these are separate tasks.

        -0 for 1.6 (let it die)
        +1 for 1.7
        -1 for >=1.8 on the verbatim patch because we start shipping commons-math3 (the patch won't work, the spirit of the change is still +1).

        hadoopqa Hadoop QA added a comment -

        -1 overall

        Vote | Subsystem | Runtime | Comment
        -1 | patch | 0m 6s | ACCUMULO-4467 does not apply to 1.7. Rebase required? Wrong Branch? See http://accumulo.apache.org/git.html#contributors for help.

        Subsystem | Report/Notes
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12829670/ACCUMULO-4467-1.7.v1.patch
        JIRA Issue | ACCUMULO-4467
        Console output | https://builds.apache.org/job/PreCommit-ACCUMULO-Build/43/console
        Powered by | Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        busbey Sean Busbey added a comment -

        > -1 for >=1.8 on the verbatim patch because we start shipping commons-math3 (the patch won't work, the spirit of the change is still +1).

        on 1.8+ I was going to merge with -s ours and then rely on MR including commons-math3 in the classpath. would you prefer I expressly include it in our libjars?

        elserj Josh Elser added a comment -

        > would you prefer I expressly include it in our libjars?

        I think this would be less error prone. e.g. what happens if some user happens to be using some weirdo version of YARN that doesn't provide this jar. Happy to be told why this isn't accurate though. I have not dug into this to the depth that I assume you and Dima have.

        busbey Sean Busbey added a comment -

        that's a good point. hopefully conflicts on commons-math3 version compatibility are rare.

        busbey Sean Busbey added a comment - edited

        patch for branch 1.8, with a merge -s ours from 1.7 on the first commit. (which I think means just look at the second commit)

        hadoopqa Hadoop QA added a comment -

        -1 overall

        Vote | Subsystem | Runtime | Comment
        -1 | patch | 0m 3s | ACCUMULO-4467 does not apply to 1.8. Rebase required? Wrong Branch? See http://accumulo.apache.org/git.html#contributors for help.

        Subsystem | Report/Notes
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12829679/ACCUMULO-4467-1.8.v1.patch
        JIRA Issue | ACCUMULO-4467
        Console output | https://builds.apache.org/job/PreCommit-ACCUMULO-Build/44/console
        Powered by | Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        elserj Josh Elser added a comment -

        +1

        busbey Sean Busbey added a comment -

        > -0 for 1.6 (let it die)

        Christopher Tubbs has now cleaned up the 1.6 branch, so I'll skip 1.6.

        ctubbsii Christopher Tubbs added a comment -

        Yeah, the last commit on that branch was included in the 1.6.6 tag, and merged forward. If we had to, we could branch again from the tag, to apply fixes... but I don't think we should do that unless there's a really serious issue to address. So happy to have immediately dropped Java 6 and LaTeX from my build/test environments...


          People

          • Assignee: Sean Busbey (busbey)
          • Reporter: Dima Spivak (dimaspivak)
          • Votes: 0
          • Watchers: 5

              Time Tracking

              • Estimated: Not Specified
              • Remaining: 0h
              • Logged: 1h 10m
