Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27048

A way to execute functions on Executor Startup and Executor Exit in Standalone

    Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.1, 2.3.3
    • Fix Version/s: None
    • Component/s: Deploy, Spark Submit
    • Labels:

      Description

      Background

      We have a Spark Standalone ETL workload that is heavily dependent on Apache Ignite KV store for lookup/reference data. There are hundreds (400+) of lookup data some are up to 300K records. We formerly used broadcast variables but later found out that it was not fast enough.

      So we decided implement a caching mechanism by retrieving reference data from JDBC source and put them in-memory through Apache ignite as replicated cache. Each Spark worker node is also running an Ignite node (JVM). Then we let the spark executors retrieve the data from Ignite through "shared memory port". This is very fast but is causing instability in the Ignite cluster. The reason is that when the Spark executor JVM terminates, the Ignite Data Grid is terminated abnormally. This makes the Ignite cluster wait for the client node (which is the spark executor) to reconnect making the Ignite cluster non-responsive for a while.

      Wish

      We have this need for an ability to close the ignite client node gracefully just before the Executor process ends. So a feature that makes it possible to pass an EventHandler for "executor.onStart" and "executor.exitExecutor()" would be really really useful.

      It could be a spark-submit argument or an entry in the spark-defaults.conf that looks something like:

      spark.executor.startUpClass=com.company.ExecutorInitializer
      spark.executor.shutdownClass=com.company.ExecutorCleaner

      The class will have to implement an interface provided by Spark. This class can then be loaded dynamically in the CoarseGrainedExecutorBackend and called on the onStart() and exitExecutor() methods respectively

      This is also useful for opening and closing JDBC connections per executor instead of per partition.

       

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              briggxs Ross Brigoli
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: