Log messages emitted by any DoFn are not logged by the Spark executors when the pipeline is run on Spark in cluster deploy mode (on YARN). Tested on Cloudera 6 with Spark 2.4.
I made a test project to reproduce the issue: https://github.com/ventuc/beam-log-test. Run it with:
spark-submit --class beam.tests.log.LogTesting --name LogTesting --deploy-mode cluster --master yarn --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar
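The contents of log4j.properties are not included in this report. For reference, a minimal Log4j 1.x configuration of the kind assumed here (Spark 2.4 ships with Log4j 1.2) might look like the sketch below. The appender name and layout pattern are illustrative assumptions; only the beam.tests.log logger name comes from the test project. On YARN, a file passed with --files is localized into each container's working directory, which is why the relative path file:log4j.properties is expected to resolve on both the driver and the executors.

```properties
# Hypothetical example configuration, not the file used in the test project.
# Send everything at INFO and above to stderr, which YARN captures per container.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Logger for the test project's package (the one whose messages go missing on executors)
log4j.logger.beam.tests.log=DEBUG
```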
To retrieve the logs from YARN, run:
yarn logs -applicationId <app_id>
As you can see, log messages from the beam.tests.log package appear only in the driver's log, not in the executors' logs.
There is no documentation on how to handle logging in Beam with the Spark runner. Please document it, as also requested in BEAM-792.