Mahout / MAHOUT-1824

Optimize FlinkOpAtA to use upper triangular matrices

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.11.2
    • Fix Version/s: 0.12.0
    • Component/s: Flink
    • Labels:

      Description

      Optimize FlinkOpAtA to use upper triangular matrices (similar to what is done in the Spark backend).
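
      Because A'A is symmetric, only its upper triangle needs to be accumulated and shipped between partitions, which cuts the per-partition buffer from n*n to n*(n+1)/2 entries. The following is a minimal sketch of that idea in plain Python, not the Mahout or Flink code; the function names (packed_index, ata_upper, merge_partials, unpack_symmetric) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: plain Python, not Mahout/Flink code.
# All function names here are hypothetical.

def packed_index(i, j, n):
    """Offset of entry (i, j), with i <= j, in row-major packed upper-triangular storage."""
    return i * n - i * (i - 1) // 2 + (j - i)


def ata_upper(rows, n):
    """Accumulate only the upper triangle of A'A from an iterable of rows of A."""
    packed = [0.0] * (n * (n + 1) // 2)
    for row in rows:
        for i in range(n):
            if row[i] == 0.0:
                continue  # zero entries contribute nothing to this row's outer product
            for j in range(i, n):
                packed[packed_index(i, j, n)] += row[i] * row[j]
    return packed


def merge_partials(p, q):
    """Combine two per-partition partial results element-wise."""
    return [x + y for x, y in zip(p, q)]


def unpack_symmetric(packed, n):
    """Expand packed upper-triangular storage into a full symmetric matrix."""
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = packed[packed_index(i, j, n)]
            full[i][j] = value
            full[j][i] = value
    return full


# Usage: treat two slices of A as two partitions, then merge their partial sums.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
partial_1 = ata_upper(a[:2], 2)
partial_2 = ata_upper(a[2:], 2)
ata = unpack_symmetric(merge_partials(partial_1, partial_2), 2)
```

      Each partition holds a buffer of n*(n+1)/2 values instead of n*n, and the full symmetric matrix is only materialized once, after the partial results have been summed.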

      Presently the dals test fails during the FlinkOpAtA computation with an OutOfMemoryError (OOM):

      57766 [flink-akka.actor.default-dispatcher-5] ERROR akka.actor.ActorSystemImpl  - exception on LARS’ timer thread
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      57770 [flink-akka.actor.default-dispatcher-5] ERROR akka.actor.ActorSystemImpl  - Uncaught fatal error from thread [flink-scheduler-1] shutting down ActorSystem [flink]
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      - dals *** FAILED ***
        org.apache.flink.runtime.client.JobTimeoutException: Timeout while waiting for JobManager answer. Job time exceeded 21474835 seconds
        at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:136)
        at org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:423)
        at org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:409)
        at org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:401)
        at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:190)
        at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:90)
        at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:855)
        at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:638)
        at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:546)
        at org.apache.mahout.flinkbindings.blas.FlinkOpAtA$.slim(FlinkOpAtA.scala:53)
        ...
        Cause: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/$a#372851579]] after [21474835000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476)
        at akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282)
        at akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280)
      
      


            People

            • Assignee: Andrew Palumbo (Andrew_Palumbo)
            • Reporter: Suneel Marthi (smarthi)
