Description
We are currently forced to use pyspark/daemon.py and pyspark/worker.py in PySpark tests. This does not allow custom modifications to them, and it is sometimes hard to debug what happens inside Python worker processes.
This is also related to SPARK-7721, as coverage is somehow unable to track code executed in the worker processes created via os.fork. With some custom fixes to force coverage collection, it works fine.
This is also related to SPARK-20368. That JIRA describes Sentry support, which (roughly) needs some changes on the worker side. With this simple mechanism, advanced users would be able to implement many pluggable workarounds.
As an example, suppose I configure the module coverage_daemon and have a coverage_daemon.py on the Python path:
import os

from pyspark import daemon

if "COVERAGE_PROCESS_START" in os.environ:
    from pyspark.worker import main

    def _cov_wrapped(*args, **kwargs):
        # Wrap the regular worker entry point so coverage is started before
        # the worker runs and saved once it finishes.
        import coverage
        cov = coverage.coverage(
            config_file=os.environ["COVERAGE_PROCESS_START"])
        cov.start()
        try:
            main(*args, **kwargs)
        finally:
            cov.stop()
            cov.save()

    daemon.worker_main = _cov_wrapped

if __name__ == '__main__':
    daemon.manager()
This way I can leave the main code intact but still plug in such workarounds.
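For context, a minimal sketch of how an application might select such a module, assuming a configuration key along the lines of spark.python.daemon.module is introduced by this proposal (the key name, app name, and .coveragerc path below are placeholders; coverage_daemon.py must also be importable on the executors' PYTHONPATH):

# Sketch only: "spark.python.daemon.module" is the configuration key proposed
# here, not an existing option; names and paths are placeholders.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("coverage-daemon-example")
        # Point the Python daemon at the custom module shown above.
        .set("spark.python.daemon.module", "coverage_daemon")
        # Matches the COVERAGE_PROCESS_START check in coverage_daemon.py.
        .setExecutorEnv("COVERAGE_PROCESS_START", "/path/to/.coveragerc"))

sc = SparkContext(conf=conf)
# Any job that runs Python workers would now go through the wrapped entry point.
print(sc.parallelize(range(10)).map(lambda x: x + 1).sum())
sc.stop()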
Issue Links
- is duplicated by SPARK-20368 Support Sentry on PySpark workers (Resolved)