LIVY-774: Logging does not print to stdout or stderr correctly on PySpark through Livy


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.9.0
    • Component/s: API
    • Labels: None

    Description

      Summary

      When using PySpark through Livy from Zeppelin, Jupyter Notebook, or a Linux curl command, the first attempt to print a log message to stdout or stderr works. From the second attempt onward, every call fails with the error stack: ValueError: I/O operation on closed file

      If we use the PySpark CLI directly on the master node, logging works fine; see the attachment Works_on_PySpark_CLI.png.
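
      The error appears to come from the logging handler keeping a reference to the per-statement output stream that Livy installs and later closes (the UnicodeDecodingStringIO visible in the traceback below). This is our reading of the behaviour, not taken from Livy internals. A minimal standalone sketch of that failure mode, outside Livy, using a plain StringIO as a stand-in for Livy's per-statement stream:

      import io
      import logging

      # Stand-in for the per-statement output stream that the Livy session
      # wrapper hands out (the UnicodeDecodingStringIO in the real traceback).
      statement_stream = io.StringIO()

      logger = logging.getLogger("log_example")
      logger.setLevel(logging.ERROR)
      handler = logging.StreamHandler(statement_stream)  # handler captures THIS stream object
      handler.setLevel(logging.ERROR)
      logger.addHandler(handler)

      logger.error("first message")    # works: the captured stream is still open

      # The wrapper appears to close the old stream once the statement finishes
      # and install a fresh one for the next statement.
      statement_stream.close()

      logger.error("second message")   # --- Logging error --- ... ValueError: I/O operation on closed file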

      Steps to Reproduce

      In Zeppelin, using Livy as the interpreter:

      %pyspark
      
      import sys
      import logging
      
      # OUTPUT
      Spark Application Id: application_1591899500515_0002
      
      

      The first time we try to print a log message to stdout or stderr, it works as expected.

      %pyspark
      
      logger = logging.getLogger("log_example")
      logger.setLevel(logging.ERROR)
      ch = logging.StreamHandler(sys.stderr)
      ch.setLevel(logging.ERROR)
      logger.addHandler(ch)
      logger.error("test error!")
      
      # OUTPUT (as expected)
      test error!

      From the second time onward, trying to print a log message to stdout or stderr produces the error stack below.

      %pyspark
      
      logger.error("test error again!")
      
      # OUTPUT (error stack)
      --- Logging error ---
      Traceback (most recent call last):
        File "/usr/lib64/python3.7/logging/__init__.py", line 1028, in emit
          stream.write(msg + self.terminator)
        File "/tmp/1262710270598062870", line 534, in write
          super(UnicodeDecodingStringIO, self).write(s)
      ValueError: I/O operation on closed file
      Call stack:
        File "/tmp/1262710270598062870", line 714, in <module>
          sys.exit(main())
        File "/tmp/1262710270598062870", line 686, in main
          response = handler(content)
        File "/tmp/1262710270598062870", line 318, in execute_request
          result = node.execute()
        File "/tmp/1262710270598062870", line 229, in execute
          exec(code, global_dict)
        File "<stdin>", line 1, in <module>
      Message: 'test error again!'

      Jupyter Notebook and the Linux curl command hit the same error; see the attachments:

      1. Zeppelin_use_Livy_bug.png

      2. JupyterNotebook_use_Livy_bug.png

      3. LinuxCurl_use_Livy_error.png
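
      A possible notebook-side workaround while this is open (a sketch only, not an official fix) is a handler that looks up sys.stderr at emit time instead of capturing the stream object once, so each log call writes to whichever stream Livy currently has installed:

      %pyspark

      import logging
      import sys

      class CurrentStderrHandler(logging.StreamHandler):
          # Resolve sys.stderr on every emit instead of caching the object
          # that existed when the handler was created.
          @property
          def stream(self):
              return sys.stderr

          @stream.setter
          def stream(self, value):
              pass  # ignore the stream cached by StreamHandler.__init__

      logger = logging.getLogger("log_example")
      logger.setLevel(logging.ERROR)
      logger.addHandler(CurrentStderrHandler())

      logger.error("test error!")        # keeps working across statements,
      logger.error("test error again!")  # since each emit re-reads sys.stderr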

      Attachments

        1. JupyterNotebook_use_Livy_bug.png (193 kB, Chao Gao)
        2. LinuxCurl_use_Livy_error.png (762 kB, Chao Gao)
        3. Works_on_PySpark_CLI.png (206 kB, Chao Gao)
        4. Zeppelin_use_Livy_bug.png (473 kB, Chao Gao)

          People

            Assignee: Unassigned
            Reporter: Chao Gao (chaoga)
            Votes: 1
            Watchers: 4
