Github user StephanEwen commented on a diff in the pull request:
--- Diff: docs/internals/flink_security.md ---
@@ -24,64 +24,109 @@ specific language governing permissions and limitations
under the License.
-This document briefly describes how Flink security works in the context of various deployment mechanism (Standalone/Cluster vs YARN)
-and the connectors that participates in Flink Job execution stage. This documentation can be helpful for both administrators and developers
-who plans to run Flink on a secure environment.
+This document briefly describes how Flink security works in the context of various deployment mechanisms (Standalone, YARN, or Mesos),
+filesystems, connectors, and state backends.
+The primary goals of the Flink Kerberos security infrastructure are:
+1. to enable secure data access for jobs within a cluster via connectors (e.g. Kafka)
+2. to authenticate to ZooKeeper (if configured to use SASL)
+3. to authenticate to Hadoop components (e.g. HDFS, HBase)
-The primary goal of Flink security model is to enable secure data access for jobs within a cluster via connectors. In a production deployment scenario,
-streaming jobs are understood to run for longer period of time (days/weeks/months) and the system must be able to authenticate against secure
-data sources throughout the life of the job. The current implementation supports running Flink clusters (Job Manager/Task Manager/Jobs) under the
-context of a Kerberos identity based on Keytab credential supplied during deployment time. Any jobs submitted will continue to run in the identity of the cluster.
+In a production deployment scenario, streaming jobs are understood to run for long periods of time (days/weeks/months) and be able to authenticate to secure
+data sources throughout the life of the job. Kerberos keytabs do not expire in that timeframe, unlike a Hadoop delegation token
+or ticket cache entry.
+The current implementation supports running Flink clusters (Job Manager/Task Manager/jobs) with either a configured keytab credential
+or with Hadoop delegation tokens. Keep in mind that all jobs share the credential configured for a given cluster.
- How Flink Security works
-Flink deployment includes running Job Manager/ZooKeeper, Task Manager(s), Web UI and Job(s). Jobs (user code) can be submitted through web UI and/or CLI.
-A Job program may use one or more connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.,) and each connector may have a specific security
-requirements (Kerberos, database based, SSL/TLS, custom etc.,). While satisfying the security requirements for all the connectors evolves over a period
-of time, at this time of writing, the following connectors/services are tested for Kerberos/Keytab based security.
+In concept, a Flink program may use first- or third-party connectors (Kafka, HDFS, Cassandra, Flume, Kinesis etc.) necessitating arbitrary authentication methods (Kerberos, SSL/TLS, username/password, etc.). While satisfying the security requirements for all connectors is an ongoing effort,
+Flink provides first-class support for Kerberos authentication only. The following services and connectors are tested for Kerberos authentication:
-- Kafka (0.9)
-Hadoop uses the UserGroupInformation (UGI) class to manage security. UGI is a static implementation that takes care of handling Kerberos authentication. The Flink bootstrap implementation
-(JM/TM/CLI) takes care of instantiating UGI with the appropriate security credentials to establish the necessary security context.
+Note that it is possible to enable the use of Kerberos independently for each service or connector. For example, the user may enable
+Hadoop security without necessitating the use of Kerberos for ZooKeeper, or vice versa. The shared element is the configuration of
+Kerberos credentials, which is then explicitly used by each component.
+The internal architecture is based on security modules (implementing `org.apache.flink.runtime.security.modules.SecurityModule`) which
+are installed at startup. The next section describes each security module.
+### Hadoop Security Module
+This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a process-wide login user context. The login user is
+then used for all interactions with Hadoop, including HDFS, HBase, and YARN.
+If Hadoop security is enabled (in `core-site.xml`), the login user will have whatever Kerberos credential is configured. Otherwise,
+the login user conveys only the user identity of the OS account that launched the cluster.
+### JAAS Security Module
+This module provides a dynamic JAAS configuration to the cluster, making available the configured Kerberos credential to ZooKeeper,
+Kafka, and other such components that rely on JAAS.
+Note that the user may also provide a static JAAS configuration file using the mechanisms described in the [Java SE Documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html). Static entries override any
+dynamic entries provided by this module.
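+For illustration, a static JAAS file entry for a keytab-based login might look like the following sketch (the context name
+matches the ZooKeeper default; the keytab path and principal are placeholders):
+
+```
+Client {
+    com.sun.security.auth.module.Krb5LoginModule required
+    useKeyTab=true
+    storeKey=true
+    keyTab="/path/to/flink.keytab"
+    principal="flink-user@EXAMPLE.COM";
+};
+```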
+### ZooKeeper Security Module
+This module configures certain process-wide ZooKeeper security-related settings, namely the ZooKeeper service name (default: `zookeeper`)
+and the JAAS login context name (default: `Client`).
+## Security Configuration
+### Flink Configuration
+The user's Kerberos ticket cache (managed with `kinit`) is used automatically, based on the following configuration option:
+- `security.kerberos.login.use-ticket-cache`: Indicates whether to read from the user's Kerberos ticket cache (default: `true`).
+A Kerberos keytab can be supplied by adding below configuration elements to the Flink configuration file:
+- `security.kerberos.login.keytab`: Absolute path to a Kerberos keytab file that contains the user credentials.
+- `security.kerberos.login.principal`: Kerberos principal name associated with the keytab.
+These configuration options establish a cluster-wide credential to be used in a Hadoop and/or JAAS context. Whether the credential is used in a Hadoop context is based on the Hadoop configuration (see next section). To be used in a JAAS context, the configuration specifies which JAAS login contexts (or applications) are enabled with the following configuration option:
+- `security.kerberos.login.contexts`: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, `Client` to use the credentials for ZooKeeper authentication).
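+As a sketch, a keytab-based setup in the Flink configuration file might combine these options as follows (the path,
+principal, and context names are placeholders):
+
+```yaml
+security.kerberos.login.use-ticket-cache: false
+security.kerberos.login.keytab: /path/to/flink.keytab
+security.kerberos.login.principal: flink-user@EXAMPLE.COM
+security.kerberos.login.contexts: Client,KafkaClient
+```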
-Services like Kafka and ZooKeeper use SASL/JAAS based authentication mechanism to authenticate against a Kerberos server. It expects JAAS configuration with a platform-specific login
-module name to be provided. Managing per-connector configuration files will be an overhead and to overcome this requirement, a process-wide JAAS configuration object is
-instantiated which serves standard ApplicationConfigurationEntry for the connectors that authenticates using SASL/JAAS mechanism.
+ZooKeeper-related configuration overrides:
-It is important to understand that the Flink processes (JM/TM/UI/Jobs) itself uses UGI's doAS() implementation to run under a specific user context, i.e. if Hadoop security is enabled
-then the Flink processes will be running under a secure user account or else it will run as the OS login user account who starts the Flink cluster.
+- `zookeeper.sasl.service-name`: The Kerberos service name that the ZooKeeper cluster is configured to use (default: `zookeeper`). Facilitates mutual-authentication between the client (Flink) and server.
- Security Configurations
+- `zookeeper.sasl.login-context-name`: The JAAS login context name that the ZooKeeper client uses to request the login context (default: `Client`). Should match
+one of the values specified in `security.kerberos.login.contexts`.
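+For example, if the ZooKeeper cluster is configured with a non-default service name, the overrides might look like this
+(the service name shown is a placeholder):
+
+```yaml
+zookeeper.sasl.service-name: zk-quorum
+zookeeper.sasl.login-context-name: Client
+```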
-Secure credentials can be supplied by adding below configuration elements to Flink configuration file:
+### Hadoop Configuration
-- `security.keytab`: Absolute path to Kerberos keytab file that contains the user credentials/secret.
+The Hadoop configuration is located via the `HADOOP_CONF_DIR` environment variable and by other means (see `org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils`). The Kerberos credential (configured above) is used automatically if Hadoop security is enabled.
-- `security.principal`: User principal name that the Flink cluster should run as.
+Note that Kerberos credentials found in the ticket cache aren't transferable to other hosts. In this scenario, the Flink CLI acquires Hadoop
+delegation tokens (for HDFS and for HBase).
-The delegation token mechanism (kinit cache) is still supported for backward compatibility but enabling security using keytab configuration is the preferred and recommended approach.
+## Deployment Modes
+Here is some information specific to each deployment mode.
- Standalone Mode:
+### Standalone Mode
Steps to run a secure Flink cluster in standalone/cluster mode:
-- Add security configurations to Flink configuration file (on all cluster nodes)
-- Make sure the Keytab file exist in the path as indicated in security.keytab configuration on all cluster nodes
-- Deploy Flink cluster using cluster start/stop scripts or CLI
+1. Add security-related configuration options to the Flink configuration file (on all cluster nodes).
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on all cluster nodes.
+3. Deploy Flink cluster as normal.
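+For example, after distributing the configuration and keytab files, the cluster is started with the standard scripts:
+
+```
+./bin/start-cluster.sh
+```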
- Yarn Mode:
+### YARN/Mesos Mode
-Steps to run secure Flink cluster in Yarn mode:
-- Add security configurations to Flink configuration file (on the node from where cluster will be provisioned using Flink/Yarn CLI)
-- Make sure the Keytab file exist in the path as indicated in security.keytab configuration
-- Deploy Flink cluster using CLI
+Steps to run a secure Flink cluster in YARN/Mesos mode:
+1. Add security-related configuration options to the Flink configuration file on the client.
+2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on the client node.
+3. Deploy Flink cluster as normal.
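+For example, with the configuration in place on the client, a YARN session can be started as usual (the session
+parameters shown here are illustrative):
+
+```
+./bin/yarn-session.sh -n 2 -jm 1024 -tm 2048
+```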
-In Yarn mode, the user supplied keytab will be copied over to the Yarn containers (App Master/JM and TM) as the Yarn local resource file.
-Security implementation details are based on <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">Yarn security</a>
+In YARN/Mesos mode, the keytab is automatically copied from the client to the Flink containers.
- Token Renewal
+For more information, see <a href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md">YARN security</a> documentation.
-UGI and Kafka/ZK login module implementations takes care of auto-renewing the tickets upon reaching expiry and no further action is needed on the part of Flink.
\ No newline at end of file
+## Further Details
+### Ticket Renewal
+Each component that uses Kerberos is independently responsible for renewing the Kerberos TGT. Hadoop, ZooKeeper, and Kafka all do so,
--- End diff ---
I would add a sentence here that this requires specifying a keytab to work. Hadoop credentials will expire if only a ticket cache / delegation tokens are used.