SPARK-29160

Event log file is written without an explicit charset, which should ideally be UTF-8

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      This issue comes from an observation by Marcelo Masiero Vanzin: https://github.com/apache/spark/pull/25670#discussion_r325383512

      Quoting his comment here:

      This is a long standing bug in the original code, but this should be explicitly setting the charset to UTF-8 (using new PrintWriter(new OutputStreamWriter(...)).

      The reader side should too, although doing that now could potentially break old logs... we should open a bug for this.
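
      The writer-side change suggested in the quote can be sketched as follows. This is a minimal illustration, not Spark's actual code: the class Utf8EventWriter and its methods are hypothetical names, and an in-memory stream stands in for the real log file.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Spark.
final class Utf8EventWriter {
    // Wraps the raw stream so characters are always encoded as UTF-8,
    // independent of the JVM's default charset.
    static PrintWriter open(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Writes one event line and returns the bytes that reached the stream.
    static byte[] writeLine(String eventJson) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter writer = open(sink)) {
            writer.print(eventJson);
            writer.print('\n'); // fixed separator instead of the platform default
        }
        return sink.toByteArray();
    }
}
```

      By contrast, new PrintWriter(out) without a charset encodes with the JVM's default charset, so the same event can produce different bytes on differently configured hosts.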

      While EventLoggingListener correctly encodes to UTF-8 when it converts to byte[] before writing, it does not specify a charset in logEvent().

      It should be fixed, but as Marcelo said, we also need to be aware that the change could break reading of old logs.
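
      The compatibility concern can be illustrated with a small sketch. It assumes, purely for illustration, that an old log was written on a host whose default charset was ISO-8859-1; the class and method names are hypothetical and do not come from Spark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo for illustration; not part of Spark.
final class OldLogDecodingDemo {
    // Simulates an old log line written under a non-UTF-8 default charset
    // (ISO-8859-1 is an assumption made for this example).
    static byte[] writeWithLegacyCharset(String line) {
        return line.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Simulates a reader that always decodes as UTF-8.
    // Malformed byte sequences are replaced with U+FFFD.
    static String readAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

      Under this assumption, pure-ASCII lines decode unchanged, while a line containing a non-ASCII character such as "é" no longer round-trips. Old logs would therefore be affected only where they contain non-ASCII text written under a non-UTF-8 default.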

              People

              • Assignee:
                Unassigned
                Reporter:
                kabhwan Jungtaek Lim