  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-4234

Order By error after Group By in Spark

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:

      Description

      Running an Order By after a Group By in Spark mode fails with the following error:

      2014-10-14 16:04:55,189 [Executor task launch worker-0] ERROR org.apache.spark.executor.Executor - Exception in task 3.0 in stage 0.0 (TID 4)
      java.io.NotSerializableException: org.apache.pig.data.SelfSpillBag$MemoryLimits
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
      at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
      at java.util.ArrayList.writeObject(ArrayList.java:742)
      at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
      at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
      at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
      at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
      at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
      at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
      at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
      at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
      at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      This error also blocks operations such as Rank By, which needs to sort right after grouping the data.
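
The trace above is the standard Java serialization failure: a serializable object graph reaches a class that does not implement java.io.Serializable. The sketch below reproduces that failure mode in isolation and shows two generic ways around it. It is not Pig code: Limits, SerializableLimits, BrokenBag, FixedBag, TransientBag and trySerialize are hypothetical names chosen for illustration, and it makes no claim about how the issue was actually resolved on spark-branch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical nested helper that does NOT implement Serializable.
    static class Limits {
        long maxBytes = 1024;
    }

    // The same helper, marked Serializable.
    static class SerializableLimits implements Serializable {
        private static final long serialVersionUID = 1L;
        long maxBytes = 1024;
    }

    // Serializable container holding a non-serializable field: writing it fails.
    static class BrokenBag implements Serializable {
        private static final long serialVersionUID = 1L;
        Limits limits = new Limits();
    }

    // Option 1 (assumption): make the nested helper serializable.
    static class FixedBag implements Serializable {
        private static final long serialVersionUID = 1L;
        SerializableLimits limits = new SerializableLimits();
    }

    // Option 2 (assumption): skip the field during serialization.
    // After deserialization the field is null and must be re-created.
    static class TransientBag implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Limits limits = new Limits();
    }

    // Returns null on success, or the name of the class that could not be written.
    static String trySerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            // The exception message is the name of the offending class.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("BrokenBag:    " + trySerialize(new BrokenBag()));
        System.out.println("FixedBag:     " + trySerialize(new FixedBag()));
        System.out.println("TransientBag: " + trySerialize(new TransientBag()));
    }
}
```

Only the BrokenBag call reports a class name (the nested Limits class); the other two return null. Which option suits a given class depends on whether the field's state has to survive the trip between executors: a transient field arrives as null and has to be rebuilt on the receiving side.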

              People

              • Assignee:
                Unassigned
              • Reporter:
                Carlos Balduz
              • Votes:
                0
              • Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: