  1. Flume
  2. FLUME-2954

make raw data appearing in log messages explicit

    Details

      Description

      Flume has built-in functionality to log the data flowing through it,
      mainly for debugging purposes. This functionality appears in several
      places in the codebase. I think such functionality raises security
      concerns in production environments where sensitive information might
      be ingested, so it is crucial that enabling it be as explicit as
      possible (avoiding implicit side-effect setup).
      E.g., setting the level of the root logger to debug/trace causes every
      other logger to start logging at debug/trace, including the ones
      logging raw data.
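
The inheritance effect described above can be reproduced in a few lines. This is only a sketch: it uses java.util.logging from the JDK instead of the log4j setup Flume ships with, so that it is self-contained, and the class name RootLevelDemo and the logger name passed in are made up for illustration. The point it shows is the same: a logger with no level of its own follows the root logger.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class RootLevelDemo {
    /**
     * Sets the root logger to rootLevel and reports whether a child logger
     * that has no level of its own would emit FINEST (trace-like) records.
     */
    static boolean childLogsTrace(String childName, Level rootLevel) {
        Logger root = Logger.getLogger("");
        Level previous = root.getLevel();
        try {
            root.setLevel(rootLevel);
            // The child never gets an explicit level, so it inherits from root.
            Logger child = Logger.getLogger(childName);
            return child.isLoggable(Level.FINEST);
        } finally {
            // Restore the root level so the demo has no lasting side effect.
            root.setLevel(previous);
        }
    }
}
```

Raising only the root level is enough to switch on every trace statement in every child logger, which is the implicit side effect this issue wants to avoid for statements that print raw data.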

      In this jira I would like to provide a patch capturing how I imagined solving this issue. It can be refined iteratively or used as a basis for a broader discussion.

          Activity

          mpercy Mike Percy added a comment -

          I agree that simply enabling debug or trace logging in Flume should never log actual data, unless that has been explicitly enabled. The current behavior makes debugging in secure environments practically impossible.

          Thanks for looking at this issue.

          sati Attila Simon added a comment -

          After checking each of the log statements, I collected the suspicious ones and started triaging them. Thanks to Denes Arvay for helping with the latter. To find the log statements I ran grep -i logger on the whole project; wherever the private variable was not named logger, I checked that variable individually within those classes (declaration type is Logger). My recommendation follows (sorry for the length). I would like to provide a patch soon, which might help the discussion further. I think these changes won't lose focus (which is keeping logging of sensitive data available while requiring clear and explicit changes to turn it on), but again I hope this will be an open discussion, so every comment is welcome.

          The issues can be grouped into these categories:

          1. deliberately print out whole configuration at startup as part of the validation
            • we can either drop this completely or log it in full, controlled by a command line option, environment variable, or JVM argument
            • or we can use heuristics to find and filter out private information such as passwords and keys.
          2. redundantly print property configuration
            • in LifecycleAware components it is not needed at all, since validation already prints it.
          3. log data on error (safe data)
            • since errors are not expected in a production workflow, we should leave these in place. (Such data is partially or wholly broken, so it should be considered garbage anyway.)
          4. log data in dedicated components LoggerSink
            • keep it there
          5. log data in non dedicated Source (fail data):
            • since Sources are responsible for converting an InputStream to Events, a print option is needed. For this I would introduce a new, consistently named property on Sources to log the raw ByteInputStream. Also trace-log the fact of Event creation (no data), and remove every other statement that logs sensitive data.
          6. log data in Interceptors, Processors, Handlers
            • remove these statements
          7. log data in non dedicated Channels (fail data):
            • channels don't change data, so handle these the same way as Sinks (see item 8)
          8. log data in non dedicated Sinks (fail data):
            • remove existing log statements; for debugging purposes, one can add an extra MemoryChannel ending in a LoggerSink
          9. log potentially private info as part of a URL or URI
            • provide a safe toString for URL and URI
          10. AsyncHBaseSink#641
            • further investigation is needed
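
For item 9, a "safe toString" could look roughly like the following. It is a minimal sketch, not the approach taken by the patch: the class SafeUri and the method toSafeString are hypothetical names, and it assumes that secrets live in the user-info or query part of the URI, so both are dropped. Opaque URIs (which is how java.net.URI parses typical jdbc: URLs) are hidden entirely, because their scheme-specific part cannot be inspected reliably here.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SafeUri {
    /** Returns a loggable form of the URI with user-info, query and fragment removed. */
    static String toSafeString(URI uri) {
        if (uri.isOpaque()) {
            // e.g. "jdbc:mysql://...": everything after the scheme is one opaque string.
            return uri.getScheme() + ":<redacted>";
        }
        try {
            // Rebuild the URI from the parts that are assumed to be harmless.
            return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                           uri.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            return "<unprintable URI>";
        }
    }
}
```

Dropping the whole query string also hides harmless parameters; a real implementation might prefer to mask only known-sensitive keys.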

          Essentially, LoggerSink would remain the place to log customer data (so specifying it would be explicit). Besides this, there would be a configuration option (defaulting to false) on Sources (only those which currently log raw data) to log the raw byte stream to a separately named logger at trace level. Other components would not log raw data; they may only log that an event passed through. I would also update the documentation to make it clear that anyone who wants to see what goes through should use LoggerSink. Configuration should be logged at validation time during startup.
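
The Source-side proposal can be sketched as follows, under stated assumptions: the class RawDataTracer, the flag logRawData and both logger names are hypothetical, and java.util.logging stands in for Flume's actual logging stack (FINEST plays the role of trace). Raw bytes are printed only when the per-source flag is on and the dedicated logger is at trace level; otherwise only the fact that an event arrived is logged.

```java
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

final class RawDataTracer {
    // Dedicated logger name (hypothetical) so raw data never rides on a component's normal logger.
    static final String RAW_LOGGER_NAME = "example.flume.rawdata";
    private static final Logger RAW = Logger.getLogger(RAW_LOGGER_NAME);
    private static final Logger LOG = Logger.getLogger("example.flume.source");

    private final boolean logRawData; // hypothetical per-source option, default false

    RawDataTracer(boolean logRawData) {
        this.logRawData = logRawData;
    }

    /** Returns the raw-data message that was logged, or null if no data was logged. */
    String onEvent(byte[] body) {
        // Always safe: records that an event passed through, without its content.
        LOG.finest("Event received, " + body.length + " bytes");
        if (logRawData && RAW.isLoggable(Level.FINEST)) {
            String msg = "Raw body: " + new String(body, StandardCharsets.UTF_8);
            RAW.finest(msg);
            return msg;
        }
        return null;
    }
}
```

With this shape, turning the root logger to trace is not sufficient on its own: the source must also have been configured with the flag.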

          --------------------------------------------------------------------------------
          flume-ng-auth                                 ---
            KerberosAuthenticator#167                   <- safe
          --------------------------------------------------------------------------------
          flume-ng-channel                              ---
            flume-file-channel                          ---
              JCEFileKeyProvider#111                    <- safe
              Log#335                                   <- safe
              FileChannel#276 #324                      <- safe
            flume-jdbc-channel                          ---
              JdbcChannelProviderImpl#98                <- fail properties
              JdbcChannelProviderImpl#261 #431          <- fail properties: jdbc url might include password
              DerbySchemaHandler#584 #770               <- safe
            flume-kafka-channel                         ---
              KafkaChannel#230 #253                     <- fail properties
              KafkaChannel#367 #383                     <- safe
              KafkaChannel#578                          <- safe
            flume-spillable-memory-channel              ---
              SpillableMemoryChannel#420 #425           <- safe
          --------------------------------------------------------------------------------
          flume-ng-clients                              ---
          --------------------------------------------------------------------------------
          flume-ng-configuration                        ---
            FlumeConfiguration#315 #372                 <- fail properties
            FlumeConfiguration#671                      <- safe
            FlumeConfiguration#927                      <- safe
          --------------------------------------------------------------------------------
          flume-ng-core                                 ---
            SyslogAvroEventSerializer#150               <- fail data: SyslogEvent.message gets logged
            SyslogAvroEventSerializer#171 #179          <- safe data: error logs only if date is malformed
            GangliaServer#224 #245                      <- fail data: although this might be only flume internal data
            LifecycleController#56                      <- safe
            LifecycleSupervisor#212 219 228 231 241 251 258 282 296 188 135 163 169 <- safe
            RegexExtractorInterceptor#144               <- safe
            AbstractRpcSink#287                         <- safe
            FailoverSinkProcessor#149                   <- safe
            LoadBalancingSinkProcessor#131              <- safe
            LoggerSink#95                               <- fail data: on purpose
            AvroSource#347                              <- fail data: log whole message
            ExecSource#457                              <- safe data: if execution has stderr then it will be error logged
            MultiportSyslogTCPSource#360                <- fail data: log whole message
            MultiportSyslogTCPSource#253 #264 #269      <- safe
            PollableSourceRunner#127                    <- safe
            ChannelProcessor#196 #226 #271 #298         <- safe
            BLOBHandler#70                              <- fail data: logs http request headers
          --------------------------------------------------------------------------------
          flume-ng-dist                                 ---
          --------------------------------------------------------------------------------
          flume-ng-embedded-agent                       ---
            EmbeddedAgent#155                           <- fail properties: printing all config
            EmbeddedAgent#249                           <- safe
          --------------------------------------------------------------------------------
          flume-ng-legacy-sources                       ---
          --------------------------------------------------------------------------------
          flume-ng-node                                     ---
            Application.java#100                            <- safe
            Application.java #107 #117 #127 #148 #175 #186  <- safe
            AbstractConfigurationProvider #116              <- safe
          --------------------------------------------------------------------------------
          flume-ng-sdk                                  ---
            LoadBalancingRpcClient#203                  <- safe
            FailoverRpcClient#268 #280                  <- safe
          --------------------------------------------------------------------------------
          flume-ng-sinks                                ---
            flume-dataset-sink                          ---
              DatasetSink#483 (URI)                     <- safe, Kite URIs don’t contain sensitive information 
            flume-hdfs-sink                             ---
              HDFSEventSink#163 #165                    <- safe
            flume-hive-sink                             ---
              HiveEndPoint has a URI field.             <- fail properties
                  Unfortunately it can contain private data
                  (the URI string may contain a password) and it is
                  logged excessively within this module
                  (appears in HiveSink#298 #342 #400 #403 #428,
                  HiveWriter#210 #319 #330 #337 #353 #365 #368 #407...).
                  HiveEndPoint is also attached to exception logs.
              HiveWriter#160                            <- safe data: log whole on parse error         
            flume-irc-sink                              ---
              IRCSink#73 #77                            <- safe data: log whole on error
            flume-ng-elasticsearch-sink                 ---
              ElasticSearchRestClient#136               <- safe data: only status response
            flume-ng-hbase-sink                         ---
              AsyncHBaseSink#641                        ?? async callback chain, exception gets logged. further investigation is needed
            flume-ng-kafka-sink                         ---
              KafkaSink#179                             <- fail data: log whole message
              KafkaSink#304                             <- fail properties
            flume-ng-morphline-solr-sink                ---
              MorphlineHandlerImpl#132                  <- safe data: log whole on process error
              BlobHandler#98 #113                       <- fail data: log http request headers
              MorphlineSink#88                          <- safe
              MorphlineSink#139                         <- fail data: logs event
          --------------------------------------------------------------------------------
          flume-ng-sources                              ---
            flume-jms-source                            ---
              JMSMessageConsumer#114                    <- safe
            flume-kafka-source                          ---
              KafkaSource#247                           <- fail data: log whole
              KafkaSource#392 #416                      <- safe
            flume-scribe-source                         ---
            flume-taildir-source                        ---
            flume-twitter-source                        ---
              TwitterSource#132                         <- safe
              TwitterSource#110-113                     <- fail properties
          --------------------------------------------------------------------------------
          flume-ng-tests                                ---
          --------------------------------------------------------------------------------
          
          mpercy Mike Percy added a comment -

          Hi Attila Simon, I skimmed over this report and your analysis looks quite thorough. I have the following comments:

          • On hiding Flume configuration properties: I think it makes sense to just disable printing those unless a global java -D property is set via the command line.
          • On source logging: I agree that adding a Flume configuration property to each source that might log data seems reasonable. In addition, logging that data at TRACE level seems reasonable.
          • On logging malformed data: I agree that logging "bad" data should be OK, especially if it blocks processing, since we need some way to communicate to administrators that the feed is messed up. This kind of safe data logging is necessary.

          Thanks for putting this together!

          Mike

          sati Attila Simon added a comment -

          early version

          sati Attila Simon added a comment -

          corrected style issues, added docs; compiles, site builds, all unit tests pass, distribution target handles the system properties as expected

          sati Attila Simon added a comment -

          remove garbage files from diff

          sati Attila Simon added a comment - - edited

          Changes made in the spirit of the discussion:

          --------------------------------------------------------------------------------
          flume-ng-channel                              ---
            flume-jdbc-channel                          ---
              JdbcChannelProviderImpl#98                <- fail properties <REMOVED>
              JdbcChannelProviderImpl#261 #431          <- fail properties: jdbc url might include password <KEPT><FOLLOWUP IN JIRA>
            flume-kafka-channel                         ---
              KafkaChannel#230 #253                     <- fail properties <REMOVED>
          --------------------------------------------------------------------------------
          flume-ng-configuration                        ---
            FlumeConfiguration#315 #372                 <- fail properties <DRIVE BY PROPERTY>
          --------------------------------------------------------------------------------
          flume-ng-core                                 ---
            SyslogAvroEventSerializer#150               <- fail data: SyslogEvent.message gets logged <DRIVE BY PROPERTY>
            GangliaServer#224 #245                      <- safe data: only flume component metrics data <KEPT>
            LoggerSink#95                               <- fail data: on purpose <KEPT>
            AvroSource#347                              <- fail data: log whole message <DRIVE BY PROPERTY>
            MultiportSyslogTCPSource#360                <- fail data: log whole message <DRIVE BY PROPERTY>
            BLOBHandler#70                              <- fail data: logs http request headers <DRIVE BY PROPERTY>
          --------------------------------------------------------------------------------
          flume-ng-embedded-agent                       ---
            EmbeddedAgent#155                           <- fail properties: printing all config <DRIVE BY PROPERTY>
          --------------------------------------------------------------------------------
          flume-ng-sinks                                ---
            flume-hive-sink                             ---
              HiveEndPoint has a URI field.             <- fail properties <KEPT><FOLLOWUP IN JIRA>
                  It may contain private data
                  (the URI string may contain a password) and it is
                  logged excessively within this module
                  (appears in HiveSink#298 #342 #400 #403 #428,
                  HiveWriter#210 #319 #330 #337 #353 #365 #368 #407...).
                  HiveEndPoint is also attached to exception logs.
            flume-ng-hbase-sink                         ---
              AsyncHBaseSink#641                        <- safe data: error details get logged in case of failure <KEPT>
            flume-ng-kafka-sink                         ---
              KafkaSink#179                             <- fail data: log whole message <REMOVED>
              KafkaSink#304                             <- fail properties <REMOVED>
            flume-ng-morphline-solr-sink                ---
              BlobHandler#98 #113                       <- fail data: log http request headers <DRIVE BY PROPERTY>
              MorphlineSink#139                         <- fail data: logs event <DRIVE BY PROPERTY>
          --------------------------------------------------------------------------------
          flume-ng-sources                              ---
            flume-kafka-source                          ---
              KafkaSource#247                           <- fail data: log whole <DRIVE BY PROPERTY>
            flume-twitter-source                        ---
              TwitterSource#110-113                     <- fail properties <REMOVED>
          --------------------------------------------------------------------------------
          
          sati Attila Simon added a comment -

          Follow up items:
          JdbcChannelProviderImpl#261 #431: the JDBC URL might include a password. Since the JDBC Channel allows the username and password to be specified as separate configuration parameters, I guess it is safe to leave these statements in place.

          In the flume-hive-sink module it is common to log HiveEndPoint instances (provided by an external library). The problem is that a HiveEndPoint contains a URI string pointing to the Hive metastore. I'm not a Hive expert, and after some searching I couldn't determine whether a security-aware configuration can avoid putting secrets into that URI. If it can, the logging can stay. If not, I would tackle this in a separate jira.

          mpercy Mike Percy added a comment -

          +1. I am about to commit the latest version from ReviewBoard modulo a comment I left there on RB.

          jira-bot ASF subversion and git services added a comment -

          Commit 3ad7d276462cf9a620888ca8dbc8541f0f02bbc1 in flume's branch refs/heads/trunk from Attila Simon
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=3ad7d27 ]

          FLUME-2954. Make raw data appearing in log messages explicit

          Flume has built-in functionality to log data flowing through, mainly for
          debugging purposes. This functionality appears in several places in the
          code base. Such functionality can raise security concerns in production
          environments where sensitive information might be ingested so it is
          crucial that enabling such functionality be as explicit as possible.

          This patch adds two system properties, one to enable logging of Flume
          configuration properties and one to enable logging of raw data. If they
          are not set, these items are never logged at any log4j logging level.

          (Attila Simon via Mike Percy)
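
The opt-in behaviour described in the commit message can be illustrated with a minimal Java sketch. It is not the code from the patch: the class name `RawDataLogGate`, the method names, and the property name `example.flume.log.rawdata` are hypothetical and chosen only for illustration. The actual property names are defined in LogPrivacyUtil.java and documented in FlumeUserGuide.rst. The idea shown is that the payload reaches a log message only when a dedicated system property is set to true, independently of the configured log level.

```java
public final class RawDataLogGate {
    // Hypothetical property name, used only in this sketch.
    public static final String RAW_DATA_PROP = "example.flume.log.rawdata";

    private RawDataLogGate() {}

    /** Returns true only if the system property is present and equals "true". */
    public static boolean allowRawData() {
        return Boolean.getBoolean(RAW_DATA_PROP);
    }

    /** Builds a log message; the event body is included only when the gate is open. */
    public static String describeEvent(int sizeBytes, String body) {
        if (allowRawData()) {
            return "Event received, size=" + sizeBytes + ", body=" + body;
        }
        // Default: metadata only, never the payload.
        return "Event received, size=" + sizeBytes;
    }

    public static void main(String[] args) {
        // Gate closed unless the JVM was started with the property set.
        System.out.println(describeEvent(5, "hello"));

        // Opening the gate explicitly, equivalent to passing -D on the command line.
        System.setProperty(RAW_DATA_PROP, "true");
        System.out.println(describeEvent(5, "hello"));
    }
}
```

In a real component the returned message would typically still be passed through a level check such as `logger.isDebugEnabled()`, so both the log level and the explicit property would have to be enabled before raw data appears in the log.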

          mpercy Mike Percy added a comment -

          Pushed to trunk. Thanks for the patch, Attila!

          jira-bot ASF subversion and git services added a comment -

          Commit 25e4bc6d80cf475862a1686fb2c3c97fcea27278 in flume's branch refs/heads/trunk from Attila Simon
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=25e4bc6 ]

          FLUME-2954. Make raw data appearing in log messages explicit

          Flume has built-in functionality to log data flowing through, mainly for
          debugging purposes. This functionality appears in several places in the
          code base. Such functionality can raise security concerns in production
          environments where sensitive information might be ingested so it is
          crucial that enabling such functionality be as explicit as possible.

          This patch adds two system properties, one to enable logging of Flume
          configuration properties and one to enable logging of raw data. If they
          are not set, these items are never logged at any log4j logging level.

          Reviewers: Balázs Donát Bessenyei, Denes Arvay, Mike Percy

          (Attila Simon via Mike Percy)

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build Flume-trunk-hbase-1 #200 (See https://builds.apache.org/job/Flume-trunk-hbase-1/200/)
          FLUME-2954. Make raw data appearing in log messages explicit (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=25e4bc6d80cf475862a1686fb2c3c97fcea27278)

          • (edit) flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
          • (edit) flume-ng-doc/sphinx/FlumeUserGuide.rst
          • (add) flume-ng-configuration/src/main/java/org/apache/flume/conf/LogPrivacyUtil.java
          • (edit) flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java
          • (edit) flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java
          • (edit) flume-ng-sources/flume-kafka-source/src/main/java/org/apache/flume/source/kafka/KafkaSource.java
          • (edit) conf/flume-env.ps1.template
          • (edit) flume-ng-channels/flume-jdbc-channel/src/main/java/org/apache/flume/channel/jdbc/impl/JdbcChannelProviderImpl.java
          • (edit) flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/MorphlineSink.java
          • (edit) flume-ng-core/src/main/java/org/apache/flume/source/MultiportSyslogTCPSource.java
          • (edit) flume-ng-sources/flume-twitter-source/src/main/java/org/apache/flume/source/twitter/TwitterSource.java
          • (edit) flume-ng-configuration/src/main/java/org/apache/flume/conf/FlumeConfiguration.java
          • (edit) flume-ng-core/src/main/java/org/apache/flume/source/http/BLOBHandler.java
          • (edit) conf/flume-env.sh.template
          • (edit) flume-ng-embedded-agent/src/main/java/org/apache/flume/agent/embedded/EmbeddedAgent.java
          • (edit) flume-ng-core/src/test/java/org/apache/flume/serialization/SyslogAvroEventSerializer.java
          • (edit) flume-ng-sinks/flume-ng-kafka-sink/src/main/java/org/apache/flume/sink/kafka/KafkaSink.java

            People

            • Assignee:
              sati Attila Simon
              Reporter:
              sati Attila Simon
            • Votes:
              0
              Watchers:
              5
