SPARK-48717

Python foreachBatch streaming query cannot be stopped gracefully after pin thread mode is enabled and is running spark queries


Description

Follow-up of https://issues.apache.org/jira/browse/SPARK-39218.

That change only handled the InterruptedException thrown when time.sleep(10) is interrupted. But when the foreachBatch function is executing a Spark query, for example:

      def func(batch_df, batch_id):
          batch_df.sparkSession.range(10000000).write.saveAsTable("oops")
          print(batch_df.count()) 
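
For context, a minimal sketch of how such a function might be driven by a streaming query under pin thread mode (the rate source, checkpoint location, and explicit PYSPARK_PIN_THREAD setting below are illustrative assumptions, not part of this report):

      import os
      os.environ["PYSPARK_PIN_THREAD"] = "true"  # pin thread mode (the default since Spark 3.2)

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("foreachBatch-stop-repro").getOrCreate()

      query = (
          spark.readStream.format("rate").option("rowsPerSecond", 10).load()
          .writeStream
          .foreachBatch(func)  # func as defined above
          .option("checkpointLocation", "/tmp/foreachBatch-stop-repro")
          .start()
      )

      # Stopping while saveAsTable is still running inside func interrupts the
      # in-flight Py4J call rather than a time.sleep-style wait.
      query.stop()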

When the query is stopped while such a batch is still running, the error raised inside the foreachBatch function is a Py4JJavaError wrapping the InterruptedException, rather than the InterruptedException itself:

      py4j.protocol.Py4JJavaError: An error occurred while calling o2141502.saveAsTable.  
      : org.apache.spark.SparkException: Job aborted.  
      <REDACTED>
      <REDACTED>
      <REDACTED>
      ...
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$2(StreamExecution.scala:262)  
      at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)  
      at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:262)  
      Caused by: java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1000)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1308)   

We should also handle this scenario so that stopping the query remains graceful.
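
One possible direction, shown only as an illustrative sketch (the wrapper and helper names below are assumptions, not the actual PySpark code): when the foreachBatch wrapper catches a Py4JJavaError, walk the Java cause chain and treat an underlying java.lang.InterruptedException the same way a directly raised InterruptedException is treated today, i.e. as a graceful stop.

      from py4j.protocol import Py4JJavaError

      def _caused_by_interrupt(err: Py4JJavaError) -> bool:
          # Walk the Java cause chain of the wrapped Throwable.
          cause = err.java_exception
          while cause is not None:
              if cause.getClass().getName() == "java.lang.InterruptedException":
                  return True
              cause = cause.getCause()
          return False

      def call_foreach_batch(func, batch_df, batch_id):
          try:
              func(batch_df, batch_id)
          except Py4JJavaError as e:
              if _caused_by_interrupt(e):
                  # The query is being stopped; swallow the interrupt so the
                  # stream can shut down gracefully instead of reporting failure.
                  return
              raise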

People

Assignee: Wei Liu (WweiL)
Reporter: Wei Liu (WweiL)
