Description
Summary
When using PySpark through Livy from Zeppelin, Jupyter Notebook, or a Linux curl command, the first attempt to print a log message to stdout or stderr works. From the second attempt onwards, every call fails with the error stack: ValueError: I/O operation on closed file
If we use the PySpark CLI directly on the master node, logging works fine; see the attachment: Works_on_PySpark_CLI.png
Steps to Reproduce
In Zeppelin, using Livy as the interpreter:
%pyspark
import sys
import logging

OUTPUT:
Spark Application Id: application_1591899500515_0002
The first time we print a log message to stdout or stderr, it works as expected.
%pyspark
logger = logging.getLogger("log_example")
logger.setLevel(logging.ERROR)
ch = logging.StreamHandler(sys.stderr)
ch.setLevel(logging.ERROR)
logger.addHandler(ch)
logger.error("test error!")

OUTPUT (as expected):
test error!
From the second attempt onwards, printing to stdout or stderr produces the following error stack.
%pyspark logger.error("test error again!") // OUTPUT showing error stack --- Logging error --- Traceback (most recent call last): File "/usr/lib64/python3.7/logging/__init__.py", line 1028, in emit stream.write(msg + self.terminator) File "/tmp/1262710270598062870", line 534, in write super(UnicodeDecodingStringIO, self).write(s) ValueError: I/O operation on closed file Call stack: File "/tmp/1262710270598062870", line 714, in <module> sys.exit(main()) File "/tmp/1262710270598062870", line 686, in main response = handler(content) File "/tmp/1262710270598062870", line 318, in execute_request result = node.execute() File "/tmp/1262710270598062870", line 229, in execute exec(code, global_dict) File "<stdin>", line 1, in <module> Message: 'test error again!'
Jupyter Notebook and the Linux curl command hit the same error; see the attachments:
1. Zeppelin_use_Livy_bug.png
2. JupyterNotebook_use_Livy_bug.png
3. LinuxCurl_use_Livy_error.png