Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-4785

When called with arguments referring column fields, PMOD throws NPE

Rank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 1.2.0
    • SQL
    • None

    Description

      Reproducible when compiled with -Phive-0.13.1, -Phive0.12.0 is OK.

      Reproduction steps with hive/console:

      scala> loadTestTable("src")
      scala> sql("SELECT PMOD(key, 10) FROM src LIMIT 1").collect()
      ...
      14/12/08 15:11:31 INFO DAGScheduler: Job 0 failed: runJob at basicOperators.scala:141, took 0.235788 s
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
              at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseNumeric.initialize(GenericUDFBaseNumeric.java:109)
              at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:116)
              at org.apache.spark.sql.hive.HiveGenericUdf.returnInspector$lzycompute(hiveUdfs.scala:156)
              at org.apache.spark.sql.hive.HiveGenericUdf.returnInspector(hiveUdfs.scala:155)
              at org.apache.spark.sql.hive.HiveGenericUdf.eval(hiveUdfs.scala:174)
              at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:92)
              at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
              at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
              at scala.collection.Iterator$class.foreach(Iterator.scala:727)
              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
              at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
              at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
              at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
              at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
              at scala.collection.AbstractIterator.to(Iterator.scala:1157)
              at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
              at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
              at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
              at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
              at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141)
              at org.apache.spark.sql.execution.Limit$$anonfun$4.apply(basicOperators.scala:141)
              at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
              at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
              at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
              at org.apache.spark.scheduler.Task.run(Task.scala:56)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      
      Driver stacktrace:
              at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
              at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
              at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
              at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
              at scala.Option.foreach(Option.scala:236)
              at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
              at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
              at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
              at akka.actor.ActorCell.invoke(ActorCell.scala:487)
              at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
              at akka.dispatch.Mailbox.run(Mailbox.scala:220)
              at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
              at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
              at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
              at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
              at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      

      This issue is introduced in PR #3109, where GenericUDF.initialize was replaced by GenericUDF.initializeAndFoldConstants, which then calls GenericUDFBaseNumeric.initialize in case of PMOD. However, GenericUDFBaseNumeric.initialize needs to access the current SessionState [1], which only exists on the driver side. Thus, when executed on executor side, an NPE is thrown.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            lian cheng Cheng Lian
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment