SPARK-48330

Fix the python streaming data source timeout issue for large trigger interval


Details

    Description

      Currently we run a long-running python worker process for the python streaming source and sink to perform planning, commit and abort on the driver side. Testing indicates that the current implementation causes a connection timeout error when the streaming query has a large trigger interval.
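      For illustration, the failure mode shows up in queries like the sketch below, where a Python streaming data source runs with a multi-minute trigger interval and the driver-side worker sits idle between microbatches for the whole interval (the format name my_python_source and the ten-minute interval are made-up values for the example):

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.getOrCreate()

          query = (
              spark.readStream
              .format("my_python_source")            # hypothetical registered Python streaming data source
              .load()
              .writeStream
              .format("console")
              .trigger(processingTime="10 minutes")  # large trigger interval that exposes the timeout
              .start()
          )
          query.awaitTermination()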

      For the python streaming source, keep the long-running worker architecture but set the socket timeout to infinity to avoid the timeout error.
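      At the socket level, "infinity" just means disabling the read timeout on the driver-to-worker connection. A minimal sketch of that idea (not the actual patch):

          import socket

          # Sketch only: a timeout of None on the Python side means "block forever",
          # so a blocking recv() can span an arbitrarily long trigger interval.
          sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          sock.settimeout(None)
          # The JVM end of the same connection would use java.net.Socket#setSoTimeout(0),
          # where 0 likewise means an infinite timeout.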

      For the python streaming sink, since the StreamingWrite is also created per microbatch on the scala side, a long-running worker cannot be attached to a StreamingWrite instance. Therefore we abandon the long-running worker architecture: simply call commit() or abort(), then exit the worker and allow spark to reuse workers for us.
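      A minimal sketch of the one-shot sink worker described above; the function and parameter names (handle_streaming_write, action, writer_messages, batch_id) are illustrative stand-ins, not the actual protocol:

          # Illustrative one-shot handler: instead of looping across microbatches,
          # the sink worker serves exactly one commit/abort request and returns,
          # so Spark's ordinary Python worker reuse manages the process lifetime.
          def handle_streaming_write(writer, action, writer_messages, batch_id):
              # 'writer', 'action', 'writer_messages' and 'batch_id' stand in for
              # whatever the driver actually sends over the socket for this microbatch.
              if action == "commit":
                  writer.commit(writer_messages, batch_id)
              else:
                  writer.abort(writer_messages, batch_id)
              # No while-loop here: returning lets the worker exit and be reused by Spark.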


            People

              Chaoqin Chaoqin Li
              Chaoqin Chaoqin Li
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue
