  Spark / SPARK-12472

OOM when sorting a table and saving it as Parquet

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL

    Description

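      # PySpark reproduction; store_sales is an existing table
      t = sqlContext.table('store_sales')
      # Double the rows, squeeze them into 2 partitions, sort each partition by the first column, then write partitioned Parquet
      t.unionAll(t).coalesce(2).sortWithinPartitions(t[0]).write.partitionBy('ss_sold_date_sk').parquet("/tmp/ttt")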
      
      15/12/21 14:35:52 WARN TaskSetManager: Lost task 1.0 in stage 25.0 (TID 96, 192.168.0.143): java.lang.OutOfMemoryError: Java heap space
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:86)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:32)
      	at org.apache.spark.util.collection.TimSort$SortState.ensureCapacity(TimSort.java:951)
      	at org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:699)
      	at org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525)
      	at org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453)
      	at org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
      	at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
      	at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:226)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187)
      	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170)
      	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
      	at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:327)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:342)
      	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
      	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
      	at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
      	at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:88)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
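
      Reading the trace bottom-up: the allocation fails in TimSort's merge step (mergeLo -> ensureCapacity -> UnsafeSortDataFormat.allocate), which is reached from UnsafeExternalSorter.spill while the task was acquiring a new page. So the sort that runs before a spill appears to need scratch space on the heap at the moment memory is already short. The following is a minimal Python sketch of why merging two adjacent sorted runs needs such a buffer. It is not Spark's implementation: the function name merge_lo and the sample data are illustrative only, and the galloping and run-selection logic of the real TimSort is omitted.

```python
def merge_lo(a, base1, len1, base2, len2):
    """Merge adjacent sorted runs a[base1:base1+len1] and a[base2:base2+len2] in place.

    Returns the number of elements copied into the scratch buffer.
    """
    assert base1 + len1 == base2, "runs must be adjacent"

    # The left run is copied out first. This copy is the extra allocation:
    # it grows with the length of the run being merged.
    tmp = a[base1:base1 + len1]

    i, j, dest = 0, base2, base1
    end2 = base2 + len2
    while i < len1 and j < end2:
        # Strict comparison keeps the merge stable: ties take the left run first.
        if a[j] < tmp[i]:
            a[dest] = a[j]
            j += 1
        else:
            a[dest] = tmp[i]
            i += 1
        dest += 1

    # Leftovers of the copied run go to the tail; leftovers of the right
    # run are already in their final position.
    while i < len1:
        a[dest] = tmp[i]
        i += 1
        dest += 1
    return len(tmp)


data = [1, 4, 7, 9, 2, 3, 8]   # two sorted runs: [1, 4, 7, 9] and [2, 3, 8]
scratch = merge_lo(data, 0, 4, 4, 3)
print(data, scratch)
```

      With only two large partitions (coalesce(2)), each task sorts long runs, so the scratch buffer requested during a merge can be correspondingly large.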
      

          People

            Assignee: Unassigned
            Reporter: Davies Liu (davies)