SPARK-14641

Specify worker log dir separately from scratch space dir


Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Environment: Spark standalone on Univa Grid Engine

    Description

      According to

      http://spark.apache.org/docs/latest/spark-standalone.html#monitoring-and-logging
      SPARK_WORKER_DIR: Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).

      Spark scratch space and log files thus share the same directory. In our Univa Grid Engine cluster configuration, we set SPARK_WORKER_DIR=/scratch/spark/work (a local drive on each slave) and clean up SPARK_WORKER_DIR on tear-down of the job to make sure there is enough space on the drive for subsequent Spark jobs, i.e. regardless of success or failure, all files are removed.
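
      A minimal sketch of that setup, assuming a Grid Engine job script; the paths and the tear-down command are illustrative only:

        # conf/spark-env.sh on each slave (current setup)
        export SPARK_WORKER_DIR=/scratch/spark/work   # both logs and scratch space end up here

        # Job tear-down step (illustrative): free the local drive for subsequent Spark jobs.
        # Because the logs live in the same directory, this deletes them as well.
        rm -rf /scratch/spark/work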

      For debugging, I would like to access the slave log files after tear-down. Writing the log files to a location different from the scratch space, e.g. an NFS-mounted $HOME, would allow me to keep the log files after tear-down while the scratch space could still be cleared.

      Is it possible to specify the log dir separately from the scratch space dir? If such an option doesn't exist yet, I could imagine something like:

      SPARK_WORKER_LOG_DIR - directory for slave logs (default: SPARK_WORKER_DIR)

      A (temporary) workaround would be to set SPARK_WORKER_DIR=$HOME, which in this case would be on a network file system instead of local to the slaves. Do you think performance would suffer from having non-local scratch space?
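
      For illustration, a sketch of how this could look in conf/spark-env.sh; SPARK_WORKER_LOG_DIR is the proposed, hypothetical variable and is not recognized by current Spark releases:

        # conf/spark-env.sh -- sketch only
        export SPARK_WORKER_DIR=/scratch/spark/work            # local scratch space, wiped on tear-down
        export SPARK_WORKER_LOG_DIR="$HOME/spark-worker-logs"  # proposed (hypothetical): logs on NFS survive tear-down

        # Temporary workaround: put everything on NFS and accept possibly slower scratch I/O.
        # export SPARK_WORKER_DIR="$HOME/spark-work"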

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Philipp Hanslovsky (philipp.hanslovsky@gmail.com)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: