Spark / SPARK-45869

Revisit and Improve Spark Standalone Cluster

Details

    • Type: Epic
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: Spark Core

    Description

      Spark Standalone Cluster has been supported for a long time as one of the resource managers.

      As part of Apache Spark 4.0.0, we revisit all layers of `Spark Standalone Cluster` as a long-running subsystem inside a K8s environment.

      1. Spark Master, Worker, History Server Web UI Layer
      2. Spark Master HA and Recovery Layer
      3. Spark Master REST API Layer (including Cluster Utilization monitoring)
      4. Spark Job Scheduling Layer
      5. Spark Worker Management by exposing Cluster Utilization monitoring for Elastic Cluster Management
      6. Spark Master/Worker dependency and classpath audit
      7. Security
      8. Documentation

        Issue Links

          1.
          Add PersistenceEngineBenchmark Sub-task Resolved Dongjoon Hyun
          2.
          Include `Driver/App` data in `PersistenceEngineBenchmark` Sub-task Resolved Dongjoon Hyun
          3.
          Add RocksDBPersistenceEngine Sub-task Resolved Dongjoon Hyun
          4.
          Make RocksDBPersistenceEngine support a symbolic link Sub-task Resolved Dongjoon Hyun
          5.
          Improve `PersistenceEngine` performance with `KryoSerializer` Sub-task Resolved Dongjoon Hyun
          6.
          Improve `FileSystemPersistenceEngine.persist` error message in case of the existing file Sub-task Resolved Dongjoon Hyun
          7.
          Improve `FileSystemPersistenceEngine` to allow non-existent parents Sub-task Resolved Dongjoon Hyun
          8.
          Improve `FileSystemPersistenceEngine` to support compressions Sub-task Resolved Dongjoon Hyun
          9.
          Improve `Master` to recover quickly in case of zero workers and apps Sub-task Resolved Dongjoon Hyun
          10.
          Enable `spark.worker.cleanup.enabled` by default Sub-task Resolved Dongjoon Hyun
          11.
          Support `spark.deploy.recoveryTimeout` Sub-task Resolved Dongjoon Hyun
          12.
          Make `spark.deploy.recovery*` documentation up-to-date Sub-task Resolved Dongjoon Hyun
          13.
          Support `JWSFilter` Sub-task Resolved Dongjoon Hyun
          14.
          Support `killall` in REST Submission API Sub-task Resolved Dongjoon Hyun
          15.
          Support `spark.master.rest.host` Sub-task Resolved Dongjoon Hyun
          16.
          Support `clear` in REST Submission API Sub-task Resolved Dongjoon Hyun
          17.
          Support `readyz` in REST Submission API Sub-task Resolved Dongjoon Hyun
          18.
          Support server-side `environmentVariables` replacement in REST Submission API Sub-task Resolved Dongjoon Hyun
          19.
          Support server-side `sparkProperties` replacement in REST Submission API Sub-task Resolved Dongjoon Hyun
          20.
          Make `appArgs` and `environmentVariables` optional in REST API Sub-task Resolved Dongjoon Hyun
          21.
          Make `spark.app.name` property optional in REST API Sub-task Resolved Dongjoon Hyun
          22.
          Support `spark.deploy.maxDrivers` Sub-task Resolved Dongjoon Hyun
          23.
          Support `spark.deploy.spreadOutDrivers` Sub-task Resolved Dongjoon Hyun
          24.
          Support `spark.deploy.workerSelectionPolicy` Sub-task Resolved Dongjoon Hyun
          25.
          Support `spark.worker.idPattern` Sub-task Resolved Dongjoon Hyun
          26.
          Support `spark.deploy.driverIdPattern` Sub-task Resolved Dongjoon Hyun
          27.
          Support `spark.deploy.appIdPattern` Sub-task Resolved Dongjoon Hyun
          28.
          Support `spark.deploy.appNumberModulo` to rotate app number Sub-task Resolved Dongjoon Hyun
          29.
          Support `spark.master.useAppNameAsAppId.enabled` Sub-task Resolved Dongjoon Hyun
          30.
          Support `spark.master.useDriverIdAsAppName.enabled` Sub-task Resolved Dongjoon Hyun
          31.
          Support `spark.test.appId` in `LocalSchedulerBackend` Sub-task Resolved Dongjoon Hyun
          32.
          Support `spark.master.ui.historyServerUrl` in `ApplicationPage` Sub-task Resolved Dongjoon Hyun
          33.
          Support `spark.worker.(initial|max)RegistrationRetries` Sub-task Resolved Dongjoon Hyun
          34.
          Support `spark.driver.timeout` and `DriverTimeoutPlugin` Sub-task Resolved Dongjoon Hyun
          35.
          Support `spark.master.rest.filters` Sub-task Resolved Dongjoon Hyun
          36.
          Support Spark Master Log UI Sub-task Resolved Dongjoon Hyun
          37.
          Support Spark Worker Log UI Sub-task Resolved Dongjoon Hyun
          38.
          Support Spark History Server Log UI Sub-task Resolved Dongjoon Hyun
          39.
          Support Spark Driver Live Log UI Sub-task Resolved Dongjoon Hyun
          40.
          Support top-level filtering in MasterPage JSON API Sub-task Resolved Dongjoon Hyun
          41.
          Add `Environment` page to Master UI Sub-task Resolved Dongjoon Hyun
          42.
          Add `Environment Variables` table to Master `EnvironmentPage` Sub-task Resolved Dongjoon Hyun
          43.
          Improve `MasterPage` to support custom title Sub-task Resolved Dongjoon Hyun
          44.
          Support custom History Server UI title Sub-task Resolved Dongjoon Hyun
          45.
          Show a summary of workers in MasterPage Sub-task Resolved Dongjoon Hyun
          46.
          Show the number of drivers waiting in SUBMITTED status Sub-task Resolved Dongjoon Hyun
          47.
          Show the number of abnormally completed drivers in MasterPage Sub-task Resolved Dongjoon Hyun
          48.
          Show `Duration` in `ApplicationPage` Sub-task Resolved Dongjoon Hyun
          49.
          Improve `MasterPage` to show `Resource` column only when it exists Sub-task Resolved Dongjoon Hyun
          50.
          Show driver log location in Spark History Server Sub-task Resolved Dongjoon Hyun
          51.
          Show the number of cached RDDs in StoragePage Sub-task Resolved Dongjoon Hyun
          52.
          Hide `Thread Dump` and `Heap Histogram` of `Dead` executors in `Executors` UI Sub-task Resolved Dongjoon Hyun
          53.
          Make StandaloneRestServer add JavaModuleOptions to drivers Sub-task Resolved Dongjoon Hyun
          54.
          Fix WorkerPage to use the same pattern for `logPage` urls Sub-task Resolved Dongjoon Hyun
          55.
          Fix getBaseURI error in Spark Worker LogPage UI buttons Sub-task Resolved Dongjoon Hyun
          56.
          Fix `MasterPage` to sort `Running Drivers` table by `Duration` column correctly Sub-task Resolved Dongjoon Hyun
          57.
          Fix Spark History Server to sort `Duration` column properly Sub-task Resolved Dongjoon Hyun
          58.
          Collect and update `spark-standalone.md` with new confs Sub-task Resolved Dongjoon Hyun
          59.
          Fix `Spark Standalone` documentation table layout Sub-task Resolved Dongjoon Hyun
          60.
          Make single-pod spark jobs respect spark.app.id Sub-task Resolved Dongjoon Hyun
          61.
          Document `spark.master.*` configurations Sub-task Resolved Dongjoon Hyun
          62.
          EventLogFileReader should not read rolling logs if appStatus is missing Sub-task Resolved Dongjoon Hyun
          63.
          Redact `awsAccessKeyId` by including `accesskey` pattern Sub-task Resolved Dongjoon Hyun
          64.
          Log the final state of drivers during `Master.removeDriver` Sub-task Resolved Dongjoon Hyun
          65.
          Log Spark HA recovery duration Sub-task Resolved Dongjoon Hyun
          66.
          Warn properly when a driver exits successfully but master is disconnected Sub-task Resolved Dongjoon Hyun
          67.
          Fix Master to update worker from UNKNOWN to ALIVE on RegisterWorker message Sub-task Resolved Dongjoon Hyun
          68.
          Remove `kill` link from RELAUNCHING drivers in MasterPage Sub-task Resolved Dongjoon Hyun
          69.
          Remove `*slave*.sh` scripts Sub-task Resolved Dongjoon Hyun
          70.
          Refactor to improve `RegisterWorker` unit test Sub-task Resolved Dongjoon Hyun
          71.
          Rename spark.deploy.spreadOut to spark.deploy.spreadOutApps Sub-task Resolved Dongjoon Hyun
          72.
          Improve `MasterSuite` to use nanoTime-based appIDs and workerIDs Sub-task Resolved Dongjoon Hyun
          73.
          Fix `spark-daemon.sh` usage by adding `decommission` command Sub-task Resolved Dongjoon Hyun
          74.
          Make `WorkerResourceInfo` extend `Serializable` explicitly Sub-task Resolved Dongjoon Hyun
          75.
          Add `logrotate` to Spark docker files Sub-task Resolved Dongjoon Hyun
          76.
          Recover `log-view.js` to be non-module Sub-task Resolved Dongjoon Hyun
          77.
          Support `/json/clusterutilization` API Sub-task Resolved Dongjoon Hyun
          78.
          Fix `Master` to reject worker kill request if decommission is disabled Sub-task Resolved Dongjoon Hyun
          79.
          Ensure trailing slashes in `HistoryServer` URL redirections Sub-task Resolved huangzhir
          80.
          Validate `spark.master.ui.decommission.allow.mode` setting Sub-task Resolved Dongjoon Hyun
          81.
          Remove POST APIs from `MasterWebUI` when spark.ui.killEnabled is false Sub-task Resolved Dongjoon Hyun
          82.
          Fix `Load New` button in `Master/HistoryServer` Log UI Sub-task Resolved Dongjoon Hyun
          83.
          Check logType in Utils.getLog Sub-task Resolved Dongjoon Hyun
          84.
          Use `getTotalMemorySize` in `WorkerArguments` Sub-task Resolved Dongjoon Hyun
          85.
          Document REST API for Spark Standalone Cluster Sub-task Resolved Dongjoon Hyun
          86.
          Document `SPARK_LOG_*` and `SPARK_PID_DIR` Sub-task Resolved Dongjoon Hyun
          87.
          Document spark.network.timeoutInterval Sub-task Resolved Dongjoon Hyun
          88.
          Document Spark Driver Live Log UI Sub-task Resolved Dongjoon Hyun
          89.
          Document MasterPage custom title conf and REST API server-side env variable replacements Sub-task Resolved Dongjoon Hyun
          90.
          Document `JWSFilter` usage in Spark UI and REST API and rename parameter to `secretKey` Sub-task Resolved Dongjoon Hyun
          91.
          Update `Configuring Ports for Network Security` section with JWS Sub-task Resolved Dongjoon Hyun
          92.
          Fix `RealBrowserUISeleniumSuite` Sub-task Resolved Dongjoon Hyun
          93.
          Add `WebBrowserTest` Sub-task Resolved Dongjoon Hyun
          94.
          Exclude `CodeHaus Jackson` dependencies from Master/Worker/HistoryServer classpaths Sub-task Resolved Unassigned
          95.
          Fix `MasterSuite` to validate the number of registered workers Sub-task Resolved Dongjoon Hyun
          96.
          Reduce the number of required threads in MasterSuite Sub-task Resolved Dongjoon Hyun
          97.
          Make `PluginEndpoint` warn when plugins reply for one-way message Sub-task Resolved Dongjoon Hyun
          98.
          Make `BlockManager` warn before `removeBlockInternal` Sub-task Resolved Dongjoon Hyun
          99.
          Add `jjwt` profile Sub-task Resolved Dongjoon Hyun
          100.
          Add `submit_pi.sh` REST API example Sub-task Resolved Dongjoon Hyun
          101.
          Add `submit-sql.sh` REST API example Sub-task Resolved Dongjoon Hyun
          102.
          Move `spark.history.ui.maxApplications` config definition to `History.scala` Sub-task Resolved Dongjoon Hyun
          103.
          Redact `Spark Command` output in `launcher` module Sub-task Resolved Dongjoon Hyun
          104.
          Make Spark Daemons support `spark.log.structuredLogging.enabled` Sub-task Resolved Dongjoon Hyun
          105.
          Spark daemons should respect spark.log.structuredLogging.enabled conf Sub-task Resolved Cheng Pan
          106.
          Add JavaSparkSQLCli example Sub-task Resolved Dongjoon Hyun
          107.
          Simplify the log when Spark HybridStore hits the memory limit Sub-task Resolved Dongjoon Hyun
          108.
          Fix `ApplicationPage` to hide App UI link when UI is disabled Sub-task Resolved Dongjoon Hyun
          109.
          Fix `MasterPage` to hide App UI links when UI is disabled Sub-task Resolved Dongjoon Hyun
          110.
          Unify `*.sh` file naming patterns by replacing `_` with `-` Sub-task Resolved Dongjoon Hyun
          111.
          Fix `StandaloneRestServer` to propagate `spark.app.name` to SparkSubmit properly Sub-task Resolved Dongjoon Hyun
          112.
          Fix `SparkSubmit` to show REST API `kill` response properly Sub-task Resolved Dongjoon Hyun
          113.
          Support `spark.submit.appName` Sub-task Resolved Dongjoon Hyun
          114.
          Define `BLOCK_MANAGER_REREGISTRATION_FAILED` as `ExecutorExitCode` Sub-task Resolved Dongjoon Hyun
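
Several of the sub-tasks above concern the REST Submission API (`killall`, `clear`, `readyz`, and optional `appArgs` and `environmentVariables`). The following is a minimal sketch of how a client might assemble the endpoint URLs and a `CreateSubmissionRequest` body. It sends nothing over the network. The helper names (`submission_url`, `create_request`), the master address, and the jar path and class name in the usage note are hypothetical. The `/v1/submissions/...` path layout and JSON field names follow the standalone REST API as commonly documented and should be checked against the Spark 4.0.0 documentation before use.

```python
import json

# Assumption: the Master's REST server is reachable at this address
# (6066 is the conventional REST port); adjust for a real cluster.
DEFAULT_MASTER = "http://localhost:6066"


def submission_url(action, master=DEFAULT_MASTER, submission_id=None):
    """Build a /v1/submissions/<action>[/<id>] URL (hypothetical helper)."""
    url = f"{master.rstrip('/')}/v1/submissions/{action}"
    if submission_id is not None:
        url += f"/{submission_id}"
    return url


def create_request(app_resource, main_class, spark_version,
                   spark_properties=None, app_args=None,
                   environment_variables=None):
    """Build a CreateSubmissionRequest body (hypothetical helper).

    appArgs and environmentVariables are left out entirely when not
    given, relying on the sub-task that makes them optional.
    """
    body = {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,
        "mainClass": main_class,
        "clientSparkVersion": spark_version,
        "sparkProperties": dict(spark_properties or {}),
    }
    if app_args is not None:
        body["appArgs"] = list(app_args)
    if environment_variables is not None:
        body["environmentVariables"] = dict(environment_variables)
    return body
```

For example, `submission_url("readyz")` gives the readiness-check URL, and `json.dumps(create_request("file:/tmp/app.jar", "org.example.Main", "4.0.0"))` gives a request body without `appArgs` or `environmentVariables`. The jar path and class name here are placeholders.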

            People

              Assignee: Dongjoon Hyun
              Reporter: Dongjoon Hyun