Flume / FLUME-2442

Need an alternative to providing clear text passwords in flume config

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.5.0.1
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:

      Description

      For some sources and sinks, passwords to keystores and other resources are currently specified in clear text in the Flume config file. Since Flume config files are often accessible to a broad audience (for instance, in source control), this exposure is too risky for institutions where security is critical (such as banks).

      To help address this visibility issue, it would be desirable to do the following two things:

      1) Store the password in a separate file and provide the path of that password file in the Flume config. This will enable the Flume config to be shared with a wider audience at reduced risk. The password file will need to be very tightly guarded. Some components, like File Channel and JMS Source, already do this.

      2) As an additional measure, obfuscate the password in the external password file. A simple command line tool can be used to generate the obfuscated password file. The Flume source/sink configuration will read the password file and de-obfuscate the password before using it to access the keystore. This obfuscation step is nice to have, in my opinion, but it is unclear to me whether it is essential.
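
To illustrate point 1, here is a minimal sketch of how a component might load a password from a separate, tightly guarded file. The class `PasswordFileReader` and its methods are hypothetical and not part of Flume or the attached patch. The owner-only permission check is one possible reading of "very tightly guarded", and it is skipped on file systems without POSIX permissions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper (not a Flume class): reads a password from an external file. */
final class PasswordFileReader {

    // Any of these permissions exposes the file to someone other than its owner.
    private static final Set<PosixFilePermission> NON_OWNER = EnumSet.of(
        PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
        PosixFilePermission.GROUP_EXECUTE, PosixFilePermission.OTHERS_READ,
        PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE);

    /** Returns true when no group or "others" permission bit is set. */
    static boolean isOwnerOnly(Set<PosixFilePermission> perms) {
        for (PosixFilePermission p : perms) {
            if (NON_OWNER.contains(p)) {
                return false;
            }
        }
        return true;
    }

    /** Drops trailing CR/LF characters that editors and "echo" tend to append. */
    static String stripLineEnd(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == '\n' || s.charAt(end - 1) == '\r')) {
            end--;
        }
        return s.substring(0, end);
    }

    /** Reads the password, refusing files that group or others can access. */
    static String readPassword(Path file) {
        try {
            // Assumption: the permission check only applies where POSIX permissions exist.
            if (Files.getFileStore(file).supportsFileAttributeView("posix")
                    && !isOwnerOnly(Files.getPosixFilePermissions(file))) {
                throw new IllegalStateException(
                    "Password file is accessible to group/others: " + file);
            }
            byte[] raw = Files.readAllBytes(file);
            return stripLineEnd(new String(raw, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass the path taken from the Flume config, for example `PasswordFileReader.readPassword(Paths.get("/path/passwordFile"))`, so the config itself never contains the secret.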

      The following Sources and Sinks appear to use inline cleartext passwords:

      • Avro Source
      • Avro sink
      • HTTP(S) source
      • File Channel
      • JMS Source

      JDBC channel also uses inline passwords, but I am not aware of anybody who uses JDBC channel, so it may not be an issue.

      1. FLUME-2442.v1.patch
        62 kB
        Roshan Naik
      2. FLUME-2442.v2.patch
        62 kB
        Roshan Naik
      3. FLUME-2442.v3.patch
        62 kB
        Roshan Naik
      4. FLUME-2442.v4.patch
        63 kB
        Roshan Naik
      5. FLUME-2442.v5.patch
        63 kB
        Roshan Naik

        Issue Links

          Activity

          Transition: Open to Patch Available
          Time In Source Status: 17d 1h 38m
          Execution Times: 1
          Last Executer: Roshan Naik
          Last Execution Date: 23/Aug/14 01:55
          Roshan Naik made changes -
          Attachment FLUME-2442.v5.patch [ 12670572 ]
          Roshan Naik made changes -
          Attachment FLUME-2442.v4.patch [ 12666962 ]
          Roshan Naik made changes -
          Attachment FLUME-2442.v3.patch [ 12664696 ]
          Roshan Naik added a comment -

          Minor changes to patch.

          • hide the password while it is being typed
          • fix "flume-ng --outfile" command line argument processing
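
As an illustration of hiding a typed password, the sketch below uses the standard `java.io.Console.readPassword` call, which disables echo. The class `PasswordPrompt` and the fallback reader are hypothetical; this is not the code from the patch, only one common way to get this behaviour in Java.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper (not from the patch): prompts for a password without echo. */
final class PasswordPrompt {

    /**
     * Reads a password from the console with echo disabled. When no console is
     * attached (for example, input is piped), falls back to reading one line
     * from the given reader; that fallback cannot hide the input.
     */
    static char[] prompt(Console console, BufferedReader fallback) {
        if (console != null) {
            char[] typed = console.readPassword("Enter password: ");
            return typed == null ? new char[0] : typed;
        }
        try {
            String line = fallback.readLine();
            return line == null ? new char[0] : line.toCharArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A command line tool would typically call `PasswordPrompt.prompt(System.console(), new BufferedReader(new InputStreamReader(System.in)))`.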
          Roshan Naik made changes -
          Attachment FLUME-2442.v2.patch [ 12664300 ]
          Roshan Naik added a comment -

          new patch with minor bug fixes

          Roshan Naik made changes -
          Description: added File Channel and JMS Source to the list of components that appear to use inline cleartext passwords; the rest of the description is unchanged.
          Roshan Naik made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Roshan Naik made changes -
          Remote Link This issue links to "Review Board link (Web Link)" [ 17354 ]
          Roshan Naik made changes -
          Field Original Value New Value
          Attachment FLUME-2442.v1.patch [ 12663806 ]
          Roshan Naik added a comment -

          Uploading the patch. Summary of what is implemented:

          1) Extended the command line to create an obfuscated password file

          • "flume-ng password /path/passwordFile" is the command to create a new password file that contains the password in obfuscated form

          2) For components which do not already have an option for an external password file (Avro source/sink, HTTP source)

          • provided a passwordFile config setting that points to an external file
          • the user can use either the existing inline clear text password or the external passwordFile (ensuring backward compatibility)
          • added another optional config setting, passwordFileType. It defaults to "TEXT", which means the external password file is in clear text. It can be set to "AES", which means the password is stored in the password file in obfuscated form (using AES-CTR with a default key). Such a file can be created using the "flume-ng password" command.

          3) For components which already have the ability to store passwords externally (JMS source, File channel)

          • provided the additional passwordFileType option, same as above. This retains backward compatibility while allowing the external password file to store the password in obfuscated form
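
The behaviour described above can be sketched roughly as follows. This is not the code from the patch: the class `PasswordObfuscator`, the hard-coded key, the Base64 file layout (a random 16-byte IV followed by the ciphertext), and the rule that an inline password takes precedence over the file are all assumptions made for illustration. Only the "TEXT"/"AES" values and the use of AES-CTR with a default key come from the summary.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.Locale;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch (not the patch): TEXT vs. AES password file handling. */
final class PasswordObfuscator {

    enum FileType { TEXT, AES }

    // Assumed built-in key (16 bytes = AES-128). Because the key ships with the
    // software, this is obfuscation, not real protection.
    private static final byte[] DEFAULT_KEY =
        "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    private static final int IV_LEN = 16;

    /** Produces the text a password-file generator could write for type AES. */
    static String obfuscate(String password) {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        byte[] ct = crypt(Cipher.ENCRYPT_MODE, iv, password.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    /** Turns password-file content back into the clear text password. */
    static String decode(String fileContent, FileType type) {
        String trimmed = fileContent.trim(); // assumes passwords have no edge whitespace
        if (type == FileType.TEXT) {
            return trimmed;
        }
        byte[] raw = Base64.getDecoder().decode(trimmed);
        if (raw.length < IV_LEN) {
            throw new IllegalArgumentException("Obfuscated password file is too short");
        }
        byte[] iv = Arrays.copyOfRange(raw, 0, IV_LEN);
        byte[] ct = Arrays.copyOfRange(raw, IV_LEN, raw.length);
        return new String(crypt(Cipher.DECRYPT_MODE, iv, ct), StandardCharsets.UTF_8);
    }

    /**
     * Picks the password for a component. Assumption: an inline password wins
     * when both are configured; a missing file type defaults to TEXT.
     */
    static String resolve(String inlinePassword, String fileContent, String fileType) {
        if (inlinePassword != null) {
            return inlinePassword;
        }
        if (fileContent == null) {
            return null;
        }
        FileType type = fileType == null
            ? FileType.TEXT
            : FileType.valueOf(fileType.toUpperCase(Locale.ROOT));
        return decode(fileContent, type);
    }

    private static byte[] crypt(int mode, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(DEFAULT_KEY, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CTR is unavailable", e);
        }
    }
}
```

Keeping `resolve` tolerant of a missing `fileType` is what preserves backward compatibility in this sketch: existing configs that only set an inline password or a clear text file keep working unchanged.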
          Roshan Naik added a comment - - edited

          There is an initiative in Hadoop to create an API to address clear text passwords, which some projects seem to be adopting.

          https://issues.apache.org/jira/browse/HADOOP-10607
          https://issues.apache.org/jira/browse/HADOOP-10904

          To summarize my understanding of it:

          • It is an API for accessing passwords which are stored elsewhere
          • It will have several "providers" which determine where the password actually gets stored
          • A simple provider will basically put the password in a local/HDFS file. The API has an option to use a default "none" password to decrypt the local/HDFS file. This functionality is very similar to what we do currently when we put the keys in a local keystore.
          • A more sophisticated "provider" will use a Kerberos keytab to acquire the password from a remote "credential store service"
          • The API will support a few different providers, and each provider may require some unique configuration settings, but the API interface would remain the same.
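
The provider idea summarized above can be pictured with the following sketch. It is deliberately not the Hadoop API: `PasswordProvider`, `MapPasswordProvider`, and `ProviderChain` are hypothetical names, and the in-memory map merely stands in for a local file or a remote credential store. The point is only that callers depend on one interface while the storage behind it varies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical interface; NOT the Hadoop credential provider API. */
interface PasswordProvider {
    /** Looks up the password stored under the given alias, if any. */
    Optional<char[]> getPassword(String alias);
}

/** Stand-in for a simple provider (for example, one backed by a local file). */
final class MapPasswordProvider implements PasswordProvider {
    private final Map<String, char[]> entries = new HashMap<>();

    MapPasswordProvider put(String alias, String password) {
        entries.put(alias, password.toCharArray());
        return this;
    }

    @Override
    public Optional<char[]> getPassword(String alias) {
        char[] pw = entries.get(alias);
        // Hand out a copy so callers can wipe it without touching the store.
        return pw == null ? Optional.empty() : Optional.of(pw.clone());
    }
}

/** Tries each configured provider in order and returns the first match. */
final class ProviderChain implements PasswordProvider {
    private final List<PasswordProvider> providers;

    ProviderChain(PasswordProvider... providers) {
        this.providers = Arrays.asList(providers);
    }

    @Override
    public Optional<char[]> getPassword(String alias) {
        for (PasswordProvider provider : providers) {
            Optional<char[]> pw = provider.getPassword(alias);
            if (pw.isPresent()) {
                return pw;
            }
        }
        return Optional.empty();
    }
}
```

With this shape, a source or sink would ask for an alias such as "keystore" and never need to know which provider answered.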

          Benefits of going with this API for Flume:

          • Gives Flume users the option to use more sophisticated secure providers as they become available.
          • With a Kerberos-based provider we completely move away from clear text passwords.

          Concerns that I have:

          • Initially only the simple local store provider will be available, so it will not offer any more security than the current local keystore based approach we have in, say, File Channel and JMS Source. A minor benefit is that it has the option of using a default "none" password, which means no explicit cleartext password is required in the config file (although it is hard-coded in some Java source file).
          • Once the Kerberos-based provider becomes available, the level of security does not appear to be much stronger than with the local keystore approach, since in both cases the security is governed by the file permissions of the keytab or local keystore. Once an attacker gets access to the keytab or keystore file, the security has been compromised. Nevertheless, it will still be a solution that does not use cleartext passwords (which is a concern for some security-sensitive institutions like banks).
          • Remote credential store functionality does not seem very interesting for Flume deployments, especially those that span multiple data centers, since the credential store service would typically be running on the Hadoop cluster.
          • Hadoop dependency: the credential provider API is part of hadoop-common, and consequently it would require Hadoop to be installed alongside Flume in many cases (for sources/sinks that use that feature).
          Roshan Naik created issue -

            People

            • Assignee:
              Roshan Naik
              Reporter:
              Roshan Naik
            • Votes:
              1
              Watchers:
              2
