[SPARK-29160] Event log file is written without specific charset which should be ideally UTF-8 - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: Spark Core
Labels: None

Description

This issue comes from an observation by vanzin: https://github.com/apache/spark/pull/25670#discussion_r325383512

Quoting his comment here:

This is a long standing bug in the original code, but this should be explicitly setting the charset to UTF-8 (using new PrintWriter(new OutputStreamWriter(...)).

The reader side should too, although doing that now could potentially break old logs... we should open a bug for this.
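The writer-side change described in the quote can be illustrated with a minimal Java sketch. It assumes events are already serialized to JSON strings and written one per line. The class and method names (`Utf8EventLogWriter`, `writeEvents`) and the sample event are hypothetical illustrations, not Spark's `EventLoggingListener` API. The key point is that `new OutputStreamWriter(out, StandardCharsets.UTF_8)` fixes the encoding, whereas wrapping a stream without a charset falls back to the JVM default charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not Spark's EventLoggingListener.
final class Utf8EventLogWriter {

    // Writes each pre-serialized JSON event on its own line, encoding the
    // characters explicitly as UTF-8 rather than with the JVM default charset.
    static void writeEvents(OutputStream out, List<String> jsonEvents) {
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String event : jsonEvents) {
            writer.println(event);
        }
        // Flush instead of close so the caller keeps ownership of `out`.
        writer.flush();
    }

    public static void main(String[] args) {
        // Simplified sample event with a non-ASCII character (U+00E9), written
        // as a Unicode escape so this source file is itself encoding-independent.
        String event = "{\"Event\":\"SparkListenerApplicationStart\",\"App Name\":\"caf\u00e9\"}";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeEvents(buffer, Arrays.asList(event));

        // The writer path should yield the same bytes as String.getBytes(UTF_8).
        byte[] expected = (event + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(buffer.toByteArray(), expected)); // true
    }
}
```

With the charset pinned, the text path produces exactly the same bytes as the `byte[]` path that already encodes with UTF-8, regardless of the platform's default charset.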

While EventLoggingListener correctly encodes as UTF-8 when it converts events to byte[] before writing, it does not specify a charset in logEvent(), so that output depends on the JVM's default charset.

It should be fixed, but, as Marcelo said, we also need to be aware that the change could break reading of old logs.
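The read-side risk can be shown with a similar hypothetical sketch (`Utf8EventLogReader` and `readEvents` are illustrative names, not Spark APIs). It assumes an old log was written on a JVM whose default charset was ISO-8859-1. Once the reader decodes explicitly as UTF-8, that log's non-ASCII characters turn into U+FFFD replacement characters, while pure-ASCII content is unaffected because ASCII bytes are identical in both encodings.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Spark's event log reader.
final class Utf8EventLogReader {

    // Reads event log lines, decoding explicitly as UTF-8. Malformed input is
    // replaced with U+FFFD by InputStreamReader rather than raising an error.
    static List<String> readEvents(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String event = "{\"App Name\":\"caf\u00e9\"}";
        // Assumed old log: written with an ISO-8859-1 default charset.
        byte[] oldLog = (event + "\n").getBytes(StandardCharsets.ISO_8859_1);
        // New log: written explicitly as UTF-8.
        byte[] newLog = (event + "\n").getBytes(StandardCharsets.UTF_8);

        System.out.println(readEvents(new ByteArrayInputStream(newLog)).get(0).equals(event)); // true
        System.out.println(readEvents(new ByteArrayInputStream(oldLog)).get(0).equals(event)); // false
    }
}
```

Because the decoder substitutes replacement characters instead of failing, such corruption would be silent, which is one reason to treat the reader-side change with care.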

Issue Links

is related to

SPARK-38411 Use UTF-8 when doMergeApplicationListingInternal reads event logs (Resolved)

links to

GitHub Pull Request #25845

People

Assignee: Unassigned

Reporter: Jungtaek Lim

Votes: 0

Watchers: 1

Dates

Created: 18/Sep/19 22:51

Updated: 12/Dec/22 18:10

Resolved: 21/Sep/19 15:00