Spark / SPARK-29160

Event log file is written without a specific charset, which should ideally be UTF-8


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      This issue comes from an observation by vanzin: https://github.com/apache/spark/pull/25670#discussion_r325383512

      Quoting his comment here:

      This is a long-standing bug in the original code, but this should be explicitly setting the charset to UTF-8 (using new PrintWriter(new OutputStreamWriter(...))).

      The reader side should too, although doing that now could potentially break old logs... we should open a bug for this.

      While EventLoggingListener properly encodes to UTF-8 when it converts to byte[] before writing, it does not specify a charset in logEvent().
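
      A minimal sketch of the writer-side change suggested in the quoted comment, written in Java for illustration. The class and method names (Utf8EventWriterSketch, utf8Writer, writeEvent) are hypothetical and are not taken from Spark; only the PrintWriter/OutputStreamWriter wrapping comes from the comment. It assumes the event is already serialized to a JSON string.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; this is not Spark's EventLoggingListener code.
class Utf8EventWriterSketch {

    // Wrap the stream so text is always encoded as UTF-8,
    // independent of the JVM's default charset.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Write one event line to an in-memory sink and return the encoded bytes.
    static byte[] writeEvent(String eventJson) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter writer = utf8Writer(sink)) {
            writer.print(eventJson);
            writer.print('\n'); // fixed separator instead of the platform line ending
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "\u00e9" is a non-ASCII character, so the charset choice matters here.
        byte[] bytes = writeEvent("{\"Event\":\"caf\u00e9\"}");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

      Without the explicit charset argument, OutputStreamWriter falls back to the JVM default, so the same event could be encoded differently on differently configured hosts.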

      It should be fixed, but as Marcelo said, we also need to be aware that the fix could break reading of old logs.
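
      The compatibility concern can be illustrated with a small Java sketch. It assumes, purely for illustration, that an old log was written on a host whose default charset was ISO-8859-1; the class and method names (EventLogReaderSketch, readFirstLine) are hypothetical and not part of Spark.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; this is not Spark's event log reader.
class EventLogReaderSketch {

    // Decode the first line of a log using an explicitly chosen charset.
    static String readFirstLine(byte[] logBytes, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(logBytes), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String event = "{\"User\":\"Jos\u00e9\"}";

        // Assumption: the "old" log was encoded with a non-UTF-8 default charset.
        byte[] oldLog = (event + "\n").getBytes(StandardCharsets.ISO_8859_1);
        byte[] newLog = (event + "\n").getBytes(StandardCharsets.UTF_8);

        // A UTF-8 reader round-trips the new log but not the old one.
        System.out.println("new log intact: "
                + event.equals(readFirstLine(newLog, StandardCharsets.UTF_8)));
        System.out.println("old log intact: "
                + event.equals(readFirstLine(oldLog, StandardCharsets.UTF_8)));
    }
}
```

      Logs that contain only ASCII characters decode identically under both charsets, so in this sketch only non-ASCII content is affected by switching the reader to UTF-8.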


            People

              Assignee: Unassigned
              Reporter: Jungtaek Lim (kabhwan)
              Votes: 0
              Watchers: 1
