SPARK-22959

Configuration to select the modules for daemon and worker in PySpark

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      PySpark tests are currently forced to use pyspark/daemon.py and pyspark/worker.py.

      This leaves no room for custom modifications, and it is sometimes hard to debug what happens in the Python worker processes.

      This is also related to SPARK-7721: Coverage is somehow unable to track code that runs in processes created by os.fork. With a custom fix that forces coverage in those processes, it works fine.

      This is also related to SPARK-20368, which describes Sentry support and (roughly) needs some changes on the worker side. With this simple configuration, advanced users will be able to plug in many workarounds of their own.
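The issue does not say why coverage is lost after os.fork. One plausible contributor, stated here as an assumption, is that a forked child leaving through os._exit never runs exit-time hooks, so anything that saves its data at interpreter exit is skipped. The sketch below illustrates only that mechanism with a stand-in save step; run_in_forked_child and the marker file are hypothetical names, and the code needs a platform that provides os.fork.

```python
import atexit
import os
import tempfile


def run_in_forked_child(work, marker_path, save_explicitly):
    """Fork, run work() in the child, and leave the child via os._exit.

    Returns True if the child managed to write marker_path.
    """
    pid = os.fork()
    if pid == 0:  # child process
        def save():
            # Stand-in for a "save collected data" step.
            with open(marker_path, "w") as f:
                f.write("saved")

        # An exit-time hook, as an exit-based saver would register.
        atexit.register(save)
        try:
            work()
        finally:
            if save_explicitly:
                save()  # forced save, independent of exit hooks
            os._exit(0)  # os._exit does not run atexit handlers
    os.waitpid(pid, 0)  # parent: wait for the child to finish
    return os.path.exists(marker_path)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    print(run_in_forked_child(lambda: None, os.path.join(tmp, "hook"), False))
    print(run_in_forked_child(lambda: None, os.path.join(tmp, "forced"), True))
```

Saving inside a try/finally, as in the second call, is the same idea as the "custom fix to force the coverage" mentioned above.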
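As a rough illustration of the kind of worker-side hook an error tracker would need, the sketch below wraps a worker entry point so that failures are handed to a reporter before being re-raised. It is not Sentry's API: report_failures and the report callable are hypothetical names, and a real integration would pass its own reporting function.

```python
import functools


def report_failures(worker_main, report):
    """Wrap worker_main so exceptions are passed to report() and re-raised."""
    @functools.wraps(worker_main)
    def wrapped(*args, **kwargs):
        try:
            return worker_main(*args, **kwargs)
        except Exception as exc:
            report(exc)  # hypothetical hook, e.g. forward to an error tracker
            raise  # keep the original failure behaviour
    return wrapped
```

A custom worker or daemon module could apply such a wrapper to the stock entry point without touching the main PySpark code.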

      As an example, suppose I configure the daemon module as coverage_daemon and have coverage_daemon.py on the Python path:

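      import os

      from pyspark import daemon


      # Only wrap the worker when coverage has been requested for subprocesses.
      if "COVERAGE_PROCESS_START" in os.environ:
          from pyspark.worker import main

          def _cov_wrapped(*args, **kwargs):
              import coverage
              # Start coverage explicitly inside the forked worker process.
              cov = coverage.Coverage(
                  config_file=os.environ["COVERAGE_PROCESS_START"])
              cov.start()
              try:
                  main(*args, **kwargs)
              finally:
                  # Stop and persist the data before the worker exits.
                  cov.stop()
                  cov.save()

          # Have the daemon call the wrapped entry point instead of worker.main.
          daemon.worker_main = _cov_wrapped


      if __name__ == '__main__':
          daemon.manager()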
      

      This way I can leave the main code intact while still applying workarounds.
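To show how a module such as coverage_daemon might be selected, here is a minimal sketch that builds the configuration entries and renders them as spark-submit arguments. The key names spark.python.daemon.module and spark.python.worker.module are my understanding of what this change introduced and are not stated in the description, so treat them as assumptions and check them against your Spark version. The helper names are hypothetical.

```python
def python_module_conf(daemon_module=None, worker_module=None):
    """Return Spark conf entries selecting custom daemon/worker modules.

    The key names are assumptions; verify them for your Spark version.
    """
    conf = {}
    if daemon_module:
        conf["spark.python.daemon.module"] = daemon_module
    if worker_module:
        conf["spark.python.worker.module"] = worker_module
    return conf


def as_submit_args(conf):
    """Render conf entries as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(as_submit_args(python_module_conf(daemon_module="coverage_daemon")))
```

The module named here must be importable by the Python interpreter that Spark launches, which is why coverage_daemon.py has to be on the Python path.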

            People

              Assignee: Hyukjin Kwon (gurwls223)
              Reporter: Hyukjin Kwon (gurwls223)