ACCUMULO-4467

Random Walk broken because of unmet dependency on commons-math

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.6, 1.7.2
    • Fix Version/s: 1.7.3, 1.8.1, 2.0.0
    • Component/s: test
    • Labels: None

    Description

      When trying to run Random Walk with the LongEach.xml module, I hit a failure once we reach the Shard.xml step:

      16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
      java.lang.Exception: Error running node Shard.xml
      	at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
      	at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
      	at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.accumulo.start.Main$2.run(Main.java:157)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.Exception: Error running node shard.BulkInsert
      	at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
      	at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
      	at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
      	... 1 more
      Caused by: java.lang.Exception: Failed to run map/red verify
      	at org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
      	at org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
      	... 9 more
      

      Digging into YARN to see why the MR job became unhappy, I see the following:

      Error: java.lang.ClassNotFoundException: org.apache.commons.math.stat.descriptive.SummaryStatistics
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	at org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310)
      	at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
      	at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
      	at org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
      	at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
      	at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
      	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
      	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
      	at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
      	at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
      	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
      	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
      	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      

      It looks like this commit introduced a runtime dependency on the commons-math JAR (in the RFile.Writer class), but the tests weren't updated to ensure that the same dependency is put onto the classpath of MR jobs submitted by Random Walk.
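
A quick way to confirm this kind of failure is to check whether the class named in the ClassNotFoundException can be resolved on a given classpath. The following is only a minimal sketch: the class name ClasspathCheck and its isLoadable helper are hypothetical and not part of Accumulo. It has to be run with the same classpath the MR task would see for the result to be meaningful.

```java
// Hypothetical diagnostic helper; not part of Accumulo or Hadoop.
public class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    // The class is looked up without being initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class reported missing in the YARN task log.
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.math.stat.descriptive.SummaryStatistics";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If the check reports false on the task classpath but true on the client, the JAR is probably not being shipped with the submitted job.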

      Props to Sean Busbey for helping to figure out the root cause here. On a separate note, we may want to start running this test before releases, as it appears this regression also snuck into 1.8.0 and at least one 1.6 release (though, since I don't have an easy way to test this against a non-1.7.2 cluster, I'm limiting the affects versions to what I've confirmed myself). Ping Keith Turner, who might know the simplest way to fix this.

      Attachments

        1. ACCUMULO-4467-1.8.v1.patch
          2 kB
          Sean Busbey
        2. ACCUMULO-4467-1.7.v1.patch
          1 kB
          Sean Busbey
        3. ACCUMULO-4467-1.6.v1.patch
          1 kB
          Sean Busbey

          People

            Assignee: Sean Busbey (busbey)
            Reporter: Dima Spivak (dimaspivak)
            Votes: 0
            Watchers: 5

              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 1h 10m