This issue comes from an observation by Marcelo Masiero Vanzin: https://github.com/apache/spark/pull/25670#discussion_r325383512
Quoting his comment here:
This is a long standing bug in the original code, but this should be explicitly setting the charset to UTF-8 (using new PrintWriter(new OutputStreamWriter(...)).
The reader side should too, although doing that now could potentially break old logs... we should open a bug for this.
While EventLoggingListener correctly encodes to UTF-8 where it converts to bytes before writing, it does not specify a charset in logEvent(), so the writer there falls back to the JVM's default charset.
This should be fixed, but as Marcelo said, we also need to be aware that the change could break reading of old logs.
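
For illustration, here is a minimal sketch of what pinning the charset on both the writer and the reader side could look like. It is not the actual EventLoggingListener code: the class name Utf8EventLogSketch and the helpers writeEvent and readFirstEvent are hypothetical, and the event string is made up for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Spark code: shows an explicit UTF-8 charset on both sides.
public class Utf8EventLogSketch {

    // Writer side: wrap the stream in an OutputStreamWriter with an explicit charset
    // instead of letting PrintWriter pick up the JVM default.
    static void writeEvent(OutputStream out, String eventJson) {
        PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.println(eventJson);
        writer.flush();
    }

    // Reader side: decode with the same explicit charset.
    static String readFirstEvent(InputStream in) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Made-up event containing a non-ASCII character.
        String event = "{\"Event\":\"demo\",\"App Name\":\"caf\u00e9\"}";

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeEvent(buffer, event);

        String roundTripped =
            readFirstEvent(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(event.equals(roundTripped));
    }
}
```

Note that logs previously written on a JVM whose default charset was not UTF-8 would not necessarily decode correctly with a reader like this, which is the compatibility concern raised above.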