Proposal for a new feature to enable NiFi users to execute Spark jobs. A natural entry point is Apache Livy, since it is a "REST service for Apache Spark". This would allow NiFi to submit Spark jobs without bundling a Spark client itself (and, for example, having to maintain multiple Spark versions).
Some of the components that could be involved include:
LivySessionController Controller Service (CS) - provides connections to available sessions in Livy
- Users could request a connection of a given type, or retrieve the same connection again by session id if it is still available.
- Properties to configure the Livy session, such as number of executors and memory
- Property for connection pool size
- Will interact with Livy to ensure that only idle/available connections are added to the pool and checked back in
- Key for pool could be based on session id or type
- Ensure that any user credentials are provided
- Leverages SSLContext for security
- Obtains Spark JARs/files via properties and/or flow file attribute(s)
- Obtains connection information from LivySessionController
- Provides attributes to configure session, maintain session id, attach to session id
- Potential advanced UI available for testing code (probably a follow-on Jira)
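
The pooling behaviour proposed for the LivySessionController could look roughly like the sketch below. It is only an illustration: the class name LivySessionPool, its method names, and the choice to key the pool by session kind are assumptions and not part of the proposal. The one external fact it relies on is that Livy reports a session state of "idle" for sessions that are ready to accept work.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical pool of Livy session ids, keyed by session kind. */
class LivySessionPool {
    private final int maxPerKind;
    private final Map<String, Deque<Integer>> idleByKind = new HashMap<>();

    LivySessionPool(int maxPerKind) {
        this.maxPerKind = maxPerKind;
    }

    /**
     * Offers a session to the pool. Only sessions that Livy reports as
     * "idle" are accepted; returns false for any other state, for a
     * session that is already pooled, or when the pool for that kind is full.
     */
    public synchronized boolean checkIn(String kind, int sessionId, String livyState) {
        if (!"idle".equals(livyState)) {
            return false;
        }
        Deque<Integer> idle = idleByKind.computeIfAbsent(kind, k -> new ArrayDeque<>());
        if (idle.size() >= maxPerKind || idle.contains(sessionId)) {
            return false;
        }
        idle.addLast(sessionId);
        return true;
    }

    /** Borrows any pooled session of the requested kind, oldest first. */
    public synchronized Optional<Integer> checkOut(String kind) {
        Deque<Integer> idle = idleByKind.get(kind);
        if (idle == null || idle.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(idle.pollFirst());
    }

    /** Borrows a specific session by id; returns false if it is not pooled. */
    public synchronized boolean checkOutById(String kind, int sessionId) {
        Deque<Integer> idle = idleByKind.get(kind);
        return idle != null && idle.remove(Integer.valueOf(sessionId));
    }
}
```

In a real controller service the state passed to checkIn would come from polling Livy rather than from the caller, and the pool size would come from the connection pool size property.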
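
For the session properties (number of executors, memory, JARs), the request that such a component might send to Livy can be sketched as follows. This assumes Livy's documented POST /sessions endpoint and its kind, numExecutors, executorMemory and jars fields; the helper class LivySessionRequests and its methods are hypothetical, and the JSON is assembled by hand only to keep the example dependency-free (Java 11 or later is assumed for java.net.http).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a Livy "create session" request. */
final class LivySessionRequests {
    private LivySessionRequests() {}

    /** Renders the JSON body; values are assumed to need no JSON escaping. */
    static String sessionBody(String kind, int numExecutors,
                              String executorMemory, List<String> jars) {
        String jarArray = jars.stream()
                .map(j -> "\"" + j + "\"")
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"kind\":\"" + kind + "\""
                + ",\"numExecutors\":" + numExecutors
                + ",\"executorMemory\":\"" + executorMemory + "\""
                + ",\"jars\":" + jarArray + "}";
    }

    /** Builds (but does not send) a POST to {livyUrl}/sessions. */
    static HttpRequest createSession(String livyUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/sessions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The request could then be sent with a java.net.http.HttpClient; one built via HttpClient.newBuilder().sslContext(...) would be a natural place to plug in the SSLContext mentioned above. A production implementation would more likely use a JSON library than string concatenation.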