  1. Apache AsterixDB
  2. ASTERIXDB-1812

OutOfMemoryError when grouping by a non-existent field with 300k records (tweets)

Details

    Description

      The dataset is a sample tweet dataset provided by Cloudberry, which contains 324,000 tweets (about 300 MB). When I issue the following query, I always get an OutOfMemoryError.

      Query:

      select * from twitter.ds_tweet t
      group by t.test;
      

      Stacktrace:

      org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of:
      HYR0003: java.lang.OutOfMemoryError: Java heap space
      
      	at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:211)
      	at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0003: java.lang.OutOfMemoryError: Java heap space
      	at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
      	at org.apache.hyracks.control.nc.Task.run(Task.java:330)
      	... 3 more
      Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.OutOfMemoryError: Java heap space
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:228)
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:84)
      	at org.apache.hyracks.control.nc.Task.run(Task.java:273)
      	... 3 more
      Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
      	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:222)
      	... 5 more
      Caused by: java.lang.OutOfMemoryError: Java heap space
      	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
      	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
      	at org.apache.hyracks.control.nc.resources.memory.FrameManager.allocateFrame(FrameManager.java:57)
      	at org.apache.hyracks.control.nc.resources.memory.FrameManager.reallocateFrame(FrameManager.java:73)
      	at org.apache.hyracks.control.nc.Joblet.reallocateFrame(Joblet.java:242)
      	at org.apache.hyracks.control.nc.Task.reallocateFrame(Task.java:136)
      	at org.apache.hyracks.api.comm.VSizeFrame.ensureFrameSize(VSizeFrame.java:53)
      	at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.canHoldNewTuple(AbstractFrameAppender.java:104)
      	at org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppender.append(FrameTupleAppender.java:49)
      	at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:159)
      	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:82)
      	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:78)
      	at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:150)
      	at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
      	at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
      	at org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppenderWrapper.write(FrameTupleAppenderWrapper.java:50)
      	at org.apache.hyracks.dataflow.std.group.preclustered.PreclusteredGroupWriter.close(PreclusteredGroupWriter.java:189)
      	at org.apache.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorNodePushable.close(PreclusteredGroupOperatorNodePushable.java:77)
      	at org.apache.hyracks.dataflow.std.sort.AbstractExternalSortRunMerger.process(AbstractExternalSortRunMerger.java:165)
      	at org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$MergeActivity$1.initialize(AbstractSorterOperatorDescriptor.java:181)
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:86)
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$17/1550206216.runAction(Unknown Source)
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$2(SuperActivityOperatorNodePushable.java:216)
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$18/914923531.call(Unknown Source)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	... 3 more
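
A possible reading of the innermost frames above (PreclusteredGroupWriter.close, then VSizeFrame.ensureFrameSize, then ByteBuffer.allocate), offered as a guess rather than a confirmed diagnosis: since no record has a `test` field, every record may fall into a single group, and emitting that group as one output record would force the frame to grow until it can hold all of the tweets at once. The sketch below is a toy Python model of that arithmetic, not Hyracks code. The 1,000-byte average record size (roughly 300 MB over 324,000 tweets), the 32 KB base frame size, and the helper names `group_by` and `frames_needed` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

MISSING = None  # stand-in for a missing field value (simplification)

def group_by(records, field):
    """Group records by records[field]; records lacking the field share one key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(field, MISSING)].append(rec)
    return dict(groups)

def frames_needed(tuple_bytes, frame_bytes):
    """Smallest whole number of base-size frames that can hold one tuple."""
    return max(1, math.ceil(tuple_bytes / frame_bytes))

# Toy data: 324,000 records, each assumed to be about 1,000 bytes.
records = [{"id": i, "size": 1000} for i in range(324_000)]

# "test" exists in no record, so all records land in the same group.
groups = group_by(records, "test")

# Size of the largest group if it were emitted as a single output tuple.
largest = max(sum(r["size"] for r in g) for g in groups.values())

print("groups:", len(groups))
print("largest group (bytes):", largest)
print("32 KB frames needed for one tuple:", frames_needed(largest, 32 * 1024))
```

Under these assumptions the single group is about as large as the whole dataset, so one contiguous heap buffer of that size would be requested, which is consistent with the allocation failing in ByteBuffer.allocate.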
      
      Feb 25, 2017 9:11:13 AM org.apache.asterix.api.http.servlet.APIServlet doPost
      SEVERE: Job failed on account of:
      HYR0003: java.lang.OutOfMemoryError: Java heap space
      
      

      Steps to reproduce:
      1. Install a local AsterixDB cluster by following https://asterixdb.apache.org/docs/0.9.0/install.html#Section1SingleMachineAsterixDBInstallation
      2. Load the sample data from Cloudberry.
      2.1 Download the Cloudberry project from https://github.com/ISG-ICS/cloudberry
      2.2 Go to the Cloudberry directory and ingest the sample tweets using "bin/ingestTwitterToLocalCluster.sh". You might need to change the AsterixDB cluster IP address at line 23 and the cluster instance name at line 86.
      3. Issue the following SQL++ query:

      select * from twitter.ds_tweet t
      group by t.test;
      

          People

            Assignee: Unassigned
            Reporter: Chen Luo (luochen01)