Spark can generate a lot of logs when running in YARN mode. The problem is noticeable even with the default configuration, and is worse with dynamic allocation enabled.
The first problem is that for every executor Spark launches, it prints the whole launch command and all the environment variables it sets, even though those are exactly the same for every executor. That's not too bad with a handful of executors, but it gets annoying pretty quickly after that. Dynamic allocation makes the problem worse, since executors are constantly being started and torn down.
Also, the dynamic allocation backend code in the YARN module generates a lot of logging. We should audit those messages, make sure they all make sense, and decide whether and how to reduce the amount of logging.
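As a stopgap until the log statements themselves are cleaned up, users can quiet the noisiest loggers via log4j configuration. A minimal sketch, assuming the default log4j.properties setup and that the relevant messages come from classes under the org.apache.spark.deploy.yarn package (the exact logger names should be confirmed against the actual log output):

```properties
# Hypothetical workaround: raise the level for the YARN deploy/allocation
# classes so per-executor launch commands and env dumps are suppressed.
# Logger names are assumptions; verify them against your logs.
log4j.logger.org.apache.spark.deploy.yarn=WARN
```

This only hides the messages; the real fix is to log the launch context once (or at DEBUG) instead of per executor, which is what this issue proposes.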