[SPARK-39361] Stop using Log4J2's extended throwable logging pattern in default logging configurations - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.3.0
Component/s: Spark Core
Labels:
None

Description

This PR addresses a performance problem in Log4J 2 related to exception logging: in certain scenarios I observed that Log4J2's exception stacktrace logging can be ~10x slower than Log4J 1.

The problem stems from a new log pattern format in Log4J2 called "extended exception", which enriches the regular stacktrace string with information on the name of the JAR files that contained the classes in each stack frame.

Log4J queries the classloader to determine the source JAR for each class. This isn't cheap, but this information is cached and reused in future exception logging calls. In certain scenarios involving runtime-generated classes, this lookup will fail and the failed lookup result will not be cached. As a result, expensive classloading operations will be performed every time such an exception is logged. In addition to being very slow, these operations take out a lock on the classloader and thus can cause severe lock contention if multiple threads are logging errors. This issue is described in more detail in a comment on a Log4J2 JIRA and in a linked blogpost: https://issues.apache.org/jira/browse/LOG4J2-2391?focusedCommentId=16667140&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16667140 . Spark frequently uses generated classes and lambdas and thus Spark executor logs will almost always trigger this edge-case and suffer from poor performance.

By default, if you do not specify an explicit exception format in your logging pattern then Log4J2 will add this "extended exception" pattern (see PatternLayout's alwaysWriteExceptions flag in Log4J's documentation, plus the code implementing that flag in Log4J2).

In this PR, I have updated Spark's default Log4J2 configurations so that each pattern layout includes an explicit %ex so that it uses the normal (non-extended) exception logging format.

Although it's true that any program logging exceptions at a high rate should probably just fix the source of the exceptions, I think it's still a good idea for us to try to fix this out-of-the-box performance difference so that users' existing workloads do not regress when upgrading to 3.3.0.

Attachments

Issue Links

links to

[Github] Pull Request #36747 (JoshRosen)

Activity

People

Assignee:: Josh Rosen

Reporter:: Josh Rosen

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 02/Jun/22 02:54

Updated:: 02/Jun/22 16:29

Resolved:: 02/Jun/22 16:29