Details
Type: Sub-task
Status: Open
Priority: P3
Resolution: Unresolved
Environment: Python 3.5, Linux, apache-beam 2.12.0 & 2.13.0, dill 0.2.9
Description
If you set save_main_session = True and have a logging.Logger instance in your __main__ module, calling a logger method after Pipeline.run has been called causes the process to hang and never exit.
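The underlying problem is that dill 0.2.9 serializes the logging handler's threading.RLock along with the main session, producing an unusable lock on deserialization. For comparison, the standard library's pickle refuses to serialize an RLock at all; a minimal sketch:

```python
import pickle
import threading

# The stdlib pickle module rejects lock objects outright, whereas
# dill 0.2.9 serializes them, leaving a broken lock on the other side.
lock = threading.RLock()
try:
    pickle.dumps(lock)
except TypeError as exc:
    print("pickle refused:", exc)
```

This is why the bug only surfaces when the main session (including the module-level logger and its handlers) is shipped through dill.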
A Python 3 pipeline that reproduces the error (code also available at https://gist.github.com/joar/f021db55eca4fa9e9fd7dfd67cc011b9):
```python
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

_log = logging.getLogger(__name__)


def main(argv=None):
    logging.basicConfig(level=logging.INFO)
    pipeline_options = PipelineOptions(argv)
    setup_options = pipeline_options.view_as(SetupOptions)  # type: SetupOptions
    setup_options.save_main_session = True

    _log.info("Running pipeline")
    with beam.Pipeline(runner="DirectRunner", options=pipeline_options) as p:
        p | beam.Create(["hello", "world"]) | beam.Map(lambda x: print(x))

    print("""
    Call to _log.info will now deadlock, since the logging handler's
    threading.RLock() has been passed through dill.

    When you press Ctrl-C, the traceback should confirm that the process is
    stuck at:

        File "/usr/lib/python3.5/logging/__init__.py", line 810, in acquire
            self.lock.acquire()
    """)
    _log.info("Pipeline done")
    print("Launching nukes")


if __name__ == '__main__':
    main()
```
I have opened an issue with dill as well: https://github.com/uqfoundation/dill/issues/321
Sadly, this issue does not happen on Python 2.
Just to be clear: a workaround is to avoid setting save_main_session = True.
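If save_main_session is required, one possible mitigation (an untested assumption, not part of this report) is to recreate each handler's internal lock after the pipeline finishes, using the standard logging.Handler.createLock() method, which installs a fresh threading.RLock:

```python
import logging


def reset_handler_locks(logger=None):
    """Hypothetical mitigation: replace each handler's internal RLock.

    If Pipeline.run has left the handlers' locks in a broken state after
    the main session round-trips through dill, createLock() installs a
    new RLock so subsequent logging calls no longer block on it.
    """
    logger = logger or logging.getLogger()
    for handler in logger.handlers:
        handler.createLock()


# Sanity check: createLock() installs a brand-new lock object.
handler = logging.StreamHandler()
old_lock = handler.lock
handler.createLock()
print(handler.lock is not old_lock)  # True
```

Whether this actually recovers a hung process depends on which lock the deadlocked call is blocked on, so the safe fix remains not saving the main session.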