Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-589

Need a way to flag sensitive information in Config

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: container
    • Labels:
      None

      Description

      Currently, the full contents of a job's Config is exposed in at least a couple of places including the logs (logged by SamzaContainer), and the ApplicationMaster UI's config page. There is a security concern with doing that if sensitive information (e.g. credentials) is stored there. It would be nice to be able to mark sensitive config values so that they are not displayed in such ways. The only thing that springs to mind is a special naming convention, perhaps a "sensitive" prefix that would identify these values. Ideally such a capability would be baked into Config itself, but minimally Samza code that exposes Config could be made aware of the convention to avoid displaying the plaintext of sensitive values.

        Issue Links

          Activity

          Hide
          criccomini Chris Riccomini added a comment -

          Sentry uses this.

          Alternatively, the way Hadoop handles this is to use real security. Authentication is required to view secure AMs and logs. This prevents any leakage.

          TBH, the secure-Hadoop approach seems better, but I'm not sure how far YARN is from getting secure-YARN working for long-lived services (YARN-896).

          Show
          criccomini Chris Riccomini added a comment - Sentry uses this . Alternatively, the way Hadoop handles this is to use real security. Authentication is required to view secure AMs and logs. This prevents any leakage. TBH, the secure-Hadoop approach seems better, but I'm not sure how far YARN is from getting secure-YARN working for long-lived services ( YARN-896 ).
          Hide
          twbecker Tommy Becker added a comment -

          Yeah, I guess authentication would be the ultimate solution. Although authentication wouldn't prevent these values from being logged. Having an explicit prefix would allow us to do a fair bit better than the Sentry example where they are scrubbing things after the fact. Additionally, even in the case where authentication for say the AM UI was required, it would be nice to mask sensitive values by default.

          Show
          twbecker Tommy Becker added a comment - Yeah, I guess authentication would be the ultimate solution. Although authentication wouldn't prevent these values from being logged. Having an explicit prefix would allow us to do a fair bit better than the Sentry example where they are scrubbing things after the fact. Additionally, even in the case where authentication for say the AM UI was required, it would be nice to mask sensitive values by default.
          Hide
          criccomini Chris Riccomini added a comment -

          Although authentication wouldn't prevent these values from being logged.

          Ya. In secure-YARN, they lock down the logs directories to be 600, so no one can see them except the user that started the job.

          Additionally, even in the case where authentication for say the AM UI was required, it would be nice to mask sensitive values by default.

          I agree, masking would be handy, even if we end up with secure Hadoop.

          Show
          criccomini Chris Riccomini added a comment - Although authentication wouldn't prevent these values from being logged. Ya. In secure-YARN, they lock down the logs directories to be 600, so no one can see them except the user that started the job. Additionally, even in the case where authentication for say the AM UI was required, it would be nice to mask sensitive values by default. I agree, masking would be handy, even if we end up with secure Hadoop.
          Hide
          twbecker Tommy Becker added a comment -

          Ya. In secure-YARN, they lock down the logs directories to be 600, so no one can see them except the user that started the job.

          Ahh I see. Still, our concern is at least partly about having the values logged at all, since the log files can get distributed to others without authorization to production credentials.

          Show
          twbecker Tommy Becker added a comment - Ya. In secure-YARN, they lock down the logs directories to be 600, so no one can see them except the user that started the job. Ahh I see. Still, our concern is at least partly about having the values logged at all, since the log files can get distributed to others without authorization to production credentials.
          Hide
          criccomini Chris Riccomini added a comment -

          Still, our concern is at least partly about having the values logged at all

          I agree, it's a bit sketchy. Masking, at a minimum, would help.

          Show
          criccomini Chris Riccomini added a comment - Still, our concern is at least partly about having the values logged at all I agree, it's a bit sketchy. Masking, at a minimum, would help.
          Hide
          criccomini Chris Riccomini added a comment -

          Thought about this a bit. Masking config should be fairly easy if we just tweak the Config object's toString method to filter by a masking prefix. We'll have to do an audit, but I don't think we're dumping Config without toString in too many places--maybe just the YARN UI.

          Show
          criccomini Chris Riccomini added a comment - Thought about this a bit. Masking config should be fairly easy if we just tweak the Config object's toString method to filter by a masking prefix. We'll have to do an audit, but I don't think we're dumping Config without toString in too many places--maybe just the YARN UI.
          Hide
          twbecker Tommy Becker added a comment -

          Yeah I played with this a bit, adding a sanitize() method to Config that returns masked values for sensitive keys. The downside to this approach I guess is that you have to know to call sanitize(), where the toString() method "just works", unless of course you're iterating over the config for something like the UI.

          Show
          twbecker Tommy Becker added a comment - Yeah I played with this a bit, adding a sanitize() method to Config that returns masked values for sensitive keys. The downside to this approach I guess is that you have to know to call sanitize(), where the toString() method "just works", unless of course you're iterating over the config for something like the UI.
          Hide
          criccomini Chris Riccomini added a comment -

          Yea, I was just thinking of baking it directly into the Config.toString() method. It gets a little tricky if you want to customize the prefix, though.

          Show
          criccomini Chris Riccomini added a comment - Yea, I was just thinking of baking it directly into the Config.toString() method. It gets a little tricky if you want to customize the prefix, though.
          Hide
          twbecker Tommy Becker added a comment - - edited

          Redacted - patch attached

          Show
          twbecker Tommy Becker added a comment - - edited Redacted - patch attached
          Hide
          twbecker Tommy Becker added a comment -
          Show
          twbecker Tommy Becker added a comment - RB: https://reviews.apache.org/r/32102/
          Hide
          criccomini Chris Riccomini added a comment -

          This looks good. Few points:

          1. Could you update the {[./docs/learn/documentation/versioned/jobs/configuration.md}} file with docs on this?
          2. Could you attach the diff to this JIRA? The attachment is apparently what grants Apache license to your patch.
          3. We should be explicit in the docs that "sensitive" doesn't encrypt anything, it just prevents configs from being displayed in the UI.
          4. I wasn't able to run the tests yet because of SAMZA-602, so I haven't verified everything. If you run bin/check-all.sh, it should work.
          Show
          criccomini Chris Riccomini added a comment - This looks good. Few points: Could you update the {[./docs/learn/documentation/versioned/jobs/configuration.md}} file with docs on this? Could you attach the diff to this JIRA? The attachment is apparently what grants Apache license to your patch. We should be explicit in the docs that "sensitive" doesn't encrypt anything, it just prevents configs from being displayed in the UI. I wasn't able to run the tests yet because of SAMZA-602 , so I haven't verified everything. If you run bin/check-all.sh , it should work.
          Hide
          twbecker Tommy Becker added a comment - - edited

          Yeah, I was a bit surprised when "Submit Patch" just opened what looked like a comment form.... I'll update the docs and then attach a new patch. Regarding samza-602, I was able to fix it by adding the following to the samza-kafka section in build.gradle:

          sourceSets {
              test {
                  scala {
                      srcDirs += ['src/test/java']
                  }
              }
          }
          

          though I didn't include it since it wasn't relevant to this issue.

          Show
          twbecker Tommy Becker added a comment - - edited Yeah, I was a bit surprised when "Submit Patch" just opened what looked like a comment form.... I'll update the docs and then attach a new patch. Regarding samza-602, I was able to fix it by adding the following to the samza-kafka section in build.gradle: sourceSets { test { scala { srcDirs += ['src/test/java'] } } } though I didn't include it since it wasn't relevant to this issue.
          Hide
          twbecker Tommy Becker added a comment -

          Updated patch.

          Show
          twbecker Tommy Becker added a comment - Updated patch.
          Hide
          criccomini Chris Riccomini added a comment -

          +1 Merged and committed. This will get shipped as part of the 0.9.0 release, as a result.

          Show
          criccomini Chris Riccomini added a comment - +1 Merged and committed. This will get shipped as part of the 0.9.0 release, as a result.
          Hide
          twbecker Tommy Becker added a comment -

          Thanks!

          Show
          twbecker Tommy Becker added a comment - Thanks!

            People

            • Assignee:
              twbecker Tommy Becker
              Reporter:
              twbecker Tommy Becker
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development