Flink / FLINK-2412

Race leading to IndexOutOfBoundsException when querying for buffer while releasing SpillablePartition


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.9, 0.10.0
    • Fix Version/s: 0.9.1, 0.10.0
    • Component/s: Runtime / Coordination
    • Labels: None

      Description

      When running code as simple as:

          ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

          DataSet<Edge<String, NullValue>> edges = getEdgesDataSet(env);
          Graph<String, NullValue, NullValue> graph = Graph.fromDataSet(edges, env);

          DataSet<Tuple2<String, Long>> degrees = graph.getDegrees();
          degrees.writeAsCsv(outputPath, "\n", " ");
          env.execute();
      
      on the Friendster data set (https://snap.stanford.edu/data/com-Friendster.html), on 30 Wally nodes,
       
      I get the following exception:
      java.lang.Exception: The data preparation for task 'CoGroup (CoGroup at inDegrees(Graph.java:701))' , caused an error: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: Fatal error at remote task manager 'wally028.cit.tu-berlin.de/130.149.249.38:53730'.
      	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:471)
      	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
      	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
      	at java.lang.Thread.run(Thread.java:722)
      Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: Fatal error at remote task manager 'wally028.cit.tu-berlin.de/130.149.249.38:53730'.
      	at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:607)
      	at org.apache.flink.runtime.operators.RegularPactTask.getInput(RegularPactTask.java:1145)
      	at org.apache.flink.runtime.operators.CoGroupDriver.prepare(CoGroupDriver.java:98)
      	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:466)
      	... 3 more
      Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: Fatal error at remote task manager 'wally028.cit.tu-berlin.de/130.149.249.38:53730'.
      	at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:784)
      Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Fatal error at remote task manager 'wally028.cit.tu-berlin.de/130.149.249.38:53730'.
      	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:227)
      	at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:162)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
      	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
      	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
      	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
      	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
      	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at java.lang.Thread.run(Thread.java:722)
      Caused by: java.io.IOException: Index: 133, Size: 0
      
      

      The code works fine on the Twitter data set, for instance, which is larger in size but contains fewer vertices.
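The innermost cause in the trace, `Index: 133, Size: 0`, reads like an indexed lookup into a list that was emptied in the meantime, which matches the race named in the issue title. The following is a minimal sketch of that kind of interleaving and one possible guard. It is not Flink's code: `PartitionSketch`, `getBufferUnsafe`, `getBufferGuarded`, and the buffer count of 200 are hypothetical names and values chosen for illustration, and the interleaving is replayed on a single thread so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a partition whose buffers a reader fetches by index. */
class PartitionSketch {
    private final List<byte[]> buffers = new ArrayList<>();
    private final Object lock = new Object();
    private boolean released;

    PartitionSketch(int numBuffers) {
        for (int i = 0; i < numBuffers; i++) {
            buffers.add(new byte[8]);
        }
    }

    /** Frees all buffers, as a release triggered by another thread would. */
    void release() {
        synchronized (lock) {
            released = true;
            buffers.clear();
        }
    }

    /** Unguarded read: throws if release() ran after the caller chose the index. */
    byte[] getBufferUnsafe(int index) {
        return buffers.get(index);
    }

    /** Guarded read: checks the released flag under the lock that release() takes. */
    byte[] getBufferGuarded(int index) {
        synchronized (lock) {
            if (released || index >= buffers.size()) {
                return null; // report "nothing to read" instead of throwing
            }
            return buffers.get(index);
        }
    }
}

public class ReleaseRaceSketch {
    /** Replays the racy order: index chosen, partition released, then the read runs. */
    static boolean unsafeReadThrows() {
        PartitionSketch partition = new PartitionSketch(200);
        int index = 133;
        partition.release();
        try {
            partition.getBufferUnsafe(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Same order of events, but the read goes through the guarded path. */
    static boolean guardedReadReturnsNull() {
        PartitionSketch partition = new PartitionSketch(200);
        int index = 133;
        partition.release();
        return partition.getBufferGuarded(index) == null;
    }

    public static void main(String[] args) {
        System.out.println("unsafe read throws: " + unsafeReadThrows());
        System.out.println("guarded read returns null: " + guardedReadReturnsNull());
    }
}
```

In this sketch the guard works because the released check and the lookup happen under the same lock as the release, so the reader either sees a live buffer or a clean "released" signal. Whether the actual fix takes this shape is not stated in the issue.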

            People

            • Assignee: Ufuk Celebi (uce)
            • Reporter: Andra Lungu (andralungu)
