Beam / BEAM-11735

Logging from DoFn doesn't work with Spark Runner in cluster mode

Details

    Description

      Log messages emitted by any DoFn are not logged by the Spark executors when the pipeline is run with Spark in cluster deploy mode (on YARN). Tested on Cloudera 6 with Spark 2.4.

      I made a test project to reproduce the issue: https://github.com/ventuc/beam-log-test. Run it with:

      spark-submit --class beam.tests.log.LogTesting --name LogTesting --deploy-mode cluster --master yarn --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar

      To retrieve the logs from YARN, run:

      yarn logs -applicationId <app_id>

      As you can see, log messages from the beam.tests.log package appear only in the driver's log, and not in the executors' logs.
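The driver/executor split can be checked mechanically on the aggregated output of `yarn logs`. In YARN cluster mode the Spark driver runs inside the ApplicationMaster container (typically the one whose ID ends in `_000001`), so a working setup should also show `beam.tests.log` lines under the executor containers. Below is a minimal sketch of such a check; it is not part of Beam, Spark or Hadoop. The class `YarnLogScan` is hypothetical, and it assumes that each container section in the `yarn logs` output begins with a line starting with `Container: ` and that the layout configured in `log4j.properties` prints the logger name (for example via `%c`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Beam, Spark or Hadoop): counts, per YARN
 * container, the lines of `yarn logs -applicationId <app_id>` output that
 * contain a given logger name.
 *
 * Assumptions: every container section starts with a line beginning with
 * "Container: ", and the log4j layout prints the logger name (e.g. %c).
 */
final class YarnLogScan {

    private static final String HEADER = "Container: ";

    /** Maps container ID to the number of matching lines, in order of first appearance. */
    static Map<String, Integer> countByContainer(List<String> lines, String loggerName) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String current = null;
        for (String line : lines) {
            if (line.startsWith(HEADER)) {
                // "Container: container_..._000002 on host_port" -> "container_..._000002".
                // A container may have several sections (one per log file); counts accumulate.
                current = line.substring(HEADER.length()).trim().split("\\s+")[0];
                counts.putIfAbsent(current, 0);
            } else if (current != null && line.contains(loggerName)) {
                counts.merge(current, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java YarnLogScan <yarn-logs-output-file> <logger-name>");
            System.exit(2);
        }
        // ISO-8859-1 maps every byte to a char, so stray non-UTF-8 bytes cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        countByContainer(lines, args[1])
                .forEach((container, n) -> System.out.println(container + "\t" + n));
    }
}
```

For example, run `yarn logs -applicationId <app_id> > app.log`, then `java YarnLogScan app.log beam.tests.log`. The driver container may match even without any DoFn output, because the main class `beam.tests.log.LogTesting` lives in the same package and may appear in launch messages; the counts for the executor containers are the ones that matter here.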

       

      There is no documentation on how to handle logging in Beam with the Spark runner. Please document it, as also requested in BEAM-792.

    People

      Assignee: Unassigned
      Reporter: Claudio Venturini (claventu)
