Spark / SPARK-26082

Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
    • Fix Version/s: 2.3.4, 2.4.1, 3.0.0
    • Component/s: Mesos
    • Labels: None

    Description

      Currently in docs:

      spark.mesos.fetcherCache.enable / false / If set to `true`, all URIs (example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the Mesos Fetcher Cache

      Currently in MesosClusterScheduler.scala (which passes the parameter to drivers):
      private val useFetchCache = conf.getBoolean("spark.mesos.fetchCache.enable", false)

      Currently in MesosCoarseGrainedSchedulerBackend.scala (which passes the Mesos caching parameter to executors):
      private val useFetcherCache = conf.getBoolean("spark.mesos.fetcherCache.enable", false)

      This naming discrepancy dates back to version 2.0.0 (jira).

      This means that when spark.mesos.fetcherCache.enable=true is specified, the Mesos cache will be used only for executors, and not for drivers.

      IMPACT:
      Not caching these driver files (typically including at least the Spark binaries, a custom jar, and additional dependencies) adds considerable network traffic and startup-time overhead when frequently running Spark applications on a Mesos cluster. Additionally, with the cache off, extracted archives like spark-x.x.x-bin-*.tgz are copied into the sandbox and left there (rather than being extracted directly without the extra copy), which can considerably increase disk usage. Users CAN currently work around this by specifying the spark.mesos.fetchCache.enable option, but this should at least be stated in the documentation.
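
      For illustration only (not part of the original report): a minimal Scala sketch of that workaround, setting both spellings of the key so that each component (MesosClusterScheduler for drivers, MesosCoarseGrainedSchedulerBackend for executors) finds the name it reads. Exactly where this configuration must be supplied (spark-submit --conf, spark-defaults.conf, or the dispatcher's configuration) depends on the deployment and is not assumed here.

      import org.apache.spark.SparkConf

      // Workaround sketch: enable both spellings until the naming is unified.
      val conf = new SparkConf()
        .set("spark.mesos.fetchCache.enable", "true")   // key read by MesosClusterScheduler (driver side)
        .set("spark.mesos.fetcherCache.enable", "true") // key read by MesosCoarseGrainedSchedulerBackend (executor side)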

      SUGGESTED FIX:
      Add spark.mesos.fetchCache.enable to the documentation for versions 2.x through 2.4, and update MesosClusterScheduler.scala to use spark.mesos.fetcherCache.enable going forward (literally a one-line change).
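
      For illustration, a minimal sketch of what that one-line change in MesosClusterScheduler.scala could look like; the fallback to the legacy key is an extra assumption (for backward compatibility with existing workarounds), not part of the suggestion above:

      private val useFetchCache =
        conf.getBoolean("spark.mesos.fetcherCache.enable",
          conf.getBoolean("spark.mesos.fetchCache.enable", false))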

          People

            Assignee: Martin Loncaric (mwlon)
            Reporter: Martin Loncaric (mwlon)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
