Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-2273

ElasticSearchSink: Add handling for header substitution in indexName

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      The ElasticSearchSink would be improved by allowing for header substitution in the indexName property.

      A use case is where the sink is an intermediate part of a chain and the index name is required to identify the message origin, at present it can only be a hardcoded value.

      The HDFS sink already supports header substitution so a similar format would maintain consistency.

      1. new_FLUME-2273-6.patch
        13 kB
        Satoshi Iijima
      2. new_FLUME-2273-5.patch
        13 kB
        Satoshi Iijima
      3. new_FLUME-2273-2.patch
        12 kB
        Satoshi Iijima
      4. new_FLUME-2273.patch
        5 kB
        Dipesh Patel
      5. FLUME-2273.patch
        6 kB
        Paul Merry

        Issue Links

          Activity

          Hide
          juhanic Juhani Connolly added a comment -

          Committed

          Show
          juhanic Juhani Connolly added a comment - Committed
          Hide
          otis Otis Gospodnetic added a comment -

          It looks like this patch was committed to trunk already, no? It's marked as open here in JIRA...

          Show
          otis Otis Gospodnetic added a comment - It looks like this patch was committed to trunk already, no? It's marked as open here in JIRA...
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in flume-trunk #636 (See https://builds.apache.org/job/flume-trunk/636/)
          FLUME-2273 - Add handling for header substitution in ElasticSearchSink (juhani_connolly: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=09472ba12278a0d3696b9d2e26d6d1b0d361c830)

          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchRestClient.java
          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/SimpleIndexNameBuilder.java
          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchIndexRequestBuilderFactory.java
          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchTransportClient.java
          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilder.java
          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSink.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchIndexRequestBuilderFactory.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in flume-trunk #636 (See https://builds.apache.org/job/flume-trunk/636/ ) FLUME-2273 - Add handling for header substitution in ElasticSearchSink (juhani_connolly: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=09472ba12278a0d3696b9d2e26d6d1b0d361c830 ) flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchRestClient.java flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/SimpleIndexNameBuilder.java flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/AbstractElasticSearchIndexRequestBuilderFactory.java flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/client/ElasticSearchTransportClient.java flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/TimeBasedIndexNameBuilder.java flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ElasticSearchSink.java flume-ng-doc/sphinx/FlumeUserGuide.rst flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchIndexRequestBuilderFactory.java
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 4ed64553d9d2e4a447dde0e4e3f6f4237a49075d in flume's branch refs/heads/flume-1.6 from Juhani Connolly
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=4ed6455 ]

          FLUME-2273 - Add handling for header substitution in ElasticSearchSink

          Satoshi Iijima via Juhani Connolly

          Show
          jira-bot ASF subversion and git services added a comment - Commit 4ed64553d9d2e4a447dde0e4e3f6f4237a49075d in flume's branch refs/heads/flume-1.6 from Juhani Connolly [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=4ed6455 ] FLUME-2273 - Add handling for header substitution in ElasticSearchSink Satoshi Iijima via Juhani Connolly
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 09472ba12278a0d3696b9d2e26d6d1b0d361c830 in flume's branch refs/heads/trunk from Juhani Connolly
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=09472ba ]

          FLUME-2273 - Add handling for header substitution in ElasticSearchSink

          Satoshi Iijima via Juhani Connolly

          Show
          jira-bot ASF subversion and git services added a comment - Commit 09472ba12278a0d3696b9d2e26d6d1b0d361c830 in flume's branch refs/heads/trunk from Juhani Connolly [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=09472ba ] FLUME-2273 - Add handling for header substitution in ElasticSearchSink Satoshi Iijima via Juhani Connolly
          Hide
          juhanic Juhani Connolly added a comment -

          +1 looks good to me. Committing after running unit tests

          Show
          juhanic Juhani Connolly added a comment - +1 looks good to me. Committing after running unit tests
          Hide
          iijima_satoshi Satoshi Iijima added a comment - - edited

          Thank you for your review, Edward! I have attached the final patch incorporating your modification for the doc.

          Show
          iijima_satoshi Satoshi Iijima added a comment - - edited Thank you for your review, Edward! I have attached the final patch incorporating your modification for the doc.
          Hide
          ejsarge Edward Sargisson added a comment -

          Hari or other commiters,
          If you take the final patch as submitted, make the small doc modification I've just published on the review board and commit all of that then we would be:

          +1

          Show
          ejsarge Edward Sargisson added a comment - Hari or other commiters, If you take the final patch as submitted, make the small doc modification I've just published on the review board and commit all of that then we would be: +1
          Hide
          iijima_satoshi Satoshi Iijima added a comment -

          Hi
          Deepak - your error is probably a issue of EventSerializer or elasticsearch.
          The indexType of the index has to have the same data structure in elasticsearch.
          http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_document_metadata.html
          Otherwise FLUME-2126 or a new unknown issue.

          Show
          iijima_satoshi Satoshi Iijima added a comment - Hi Deepak - your error is probably a issue of EventSerializer or elasticsearch. The indexType of the index has to have the same data structure in elasticsearch. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_document_metadata.html Otherwise FLUME-2126 or a new unknown issue.
          Hide
          deepakas1 Deepak Subhramanian added a comment -

          If I want to use the patch what are the options to pass in the ESSink configuration so that it take one of the header field and append in the indexname.I get JSON and xml data. Since ES automatically parse the JSON data as an object ,my xml messages are failing as both are created in the same index and ES validate new messages with the mapping created for JSON object. I like to have it in the same index. Also I recently noticed that LogstashSerializer is not generating timestamp fields when I tried to deploy the latest version of code. It creates two time stamp field with Flume1.4 version.

          Show
          deepakas1 Deepak Subhramanian added a comment - If I want to use the patch what are the options to pass in the ESSink configuration so that it take one of the header field and append in the indexname.I get JSON and xml data. Since ES automatically parse the JSON data as an object ,my xml messages are failing as both are created in the same index and ES validate new messages with the mapping created for JSON object. I like to have it in the same index. Also I recently noticed that LogstashSerializer is not generating timestamp fields when I tried to deploy the latest version of code. It creates two time stamp field with Flume1.4 version.
          Hide
          ejsarge Edward Sargisson added a comment -

          I think that the warning in the docs is sufficient.

          Whether or not this is an es defect or feature is up to them. I suspect they might consider it a feature and ask why somebody would accept user generated data into that header. If somebody does do that then they should add a filter.

          I think we're good.

          Show
          ejsarge Edward Sargisson added a comment - I think that the warning in the docs is sufficient. Whether or not this is an es defect or feature is up to them. I suspect they might consider it a feature and ask why somebody would accept user generated data into that header. If somebody does do that then they should add a filter. I think we're good.
          Hide
          otis Otis Gospodnetic added a comment -

          Sounds like it would be wise to ask on ES ML and/or open an issue.

          Show
          otis Otis Gospodnetic added a comment - Sounds like it would be wise to ask on ES ML and/or open an issue.
          Hide
          dipthegeezer Dipesh Patel added a comment -

          Hi

          I concur with Satoshi. We are heavy users of flume and the elasticsearch sink and we find that this is an issue with elasticsearch and not really flume. Indeed if you did not use the sink and posted directly to elasticsearch you would have the same issue.

          Dip

          Show
          dipthegeezer Dipesh Patel added a comment - Hi I concur with Satoshi. We are heavy users of flume and the elasticsearch sink and we find that this is an issue with elasticsearch and not really flume. Indeed if you did not use the sink and posted directly to elasticsearch you would have the same issue. Dip
          Hide
          iijima_satoshi Satoshi Iijima added a comment -

          I tried above scenario. There is no error and warn in both elasticsearch and flume and I could get the result by search API not specified indexType.
          Using search API specified indexType I got IndexMissingException with 404.

          If relative address and '/' of indexName or indexType have the security issue, this is the problem of elasticsearch not flume.
          All - What do you think about it?

          Show
          iijima_satoshi Satoshi Iijima added a comment - I tried above scenario. There is no error and warn in both elasticsearch and flume and I could get the result by search API not specified indexType. Using search API specified indexType I got IndexMissingException with 404. If relative address and '/' of indexName or indexType have the security issue, this is the problem of elasticsearch not flume. All - What do you think about it?
          Hide
          ejsarge Edward Sargisson added a comment -

          The scenario in my head is:

          • header substitution for indexType, indexName is set in the config
          • REST client in use.

          A malicious attacker then crafts an event where the indexType header is '../../protectedIndex/protectedType'. They can then write into an index they're not supposed to.

          I don't know if ES does relative addressing in its REST server; that's why I'm raising the question.

          Show
          ejsarge Edward Sargisson added a comment - The scenario in my head is: header substitution for indexType, indexName is set in the config REST client in use. A malicious attacker then crafts an event where the indexType header is '../../protectedIndex/protectedType'. They can then write into an index they're not supposed to. I don't know if ES does relative addressing in its REST server; that's why I'm raising the question.
          Hide
          dipthegeezer Dipesh Patel added a comment -

          Hi all

          Ditto for me too I've been reading this chain and I don't really understand what the security implications are. It would be good if you could explain it.

          Thanks

          Dip

          Show
          dipthegeezer Dipesh Patel added a comment - Hi all Ditto for me too I've been reading this chain and I don't really understand what the security implications are. It would be good if you could explain it. Thanks Dip
          Hide
          PMerry Paul Merry added a comment -

          Can you elaborate on a possible attack example and how any activity in Flume is causing an issue over that of Elasticsearch itself?

          I don't understand how this exposes anything other than potential underlying Elasticsearch security concerns.

          There is validation on the index names themselves, we have seen attempts to create indexes with Uppercase characters being rejected for example.

          Show
          PMerry Paul Merry added a comment - Can you elaborate on a possible attack example and how any activity in Flume is causing an issue over that of Elasticsearch itself? I don't understand how this exposes anything other than potential underlying Elasticsearch security concerns. There is validation on the index names themselves, we have seen attempts to create indexes with Uppercase characters being rejected for example.
          Hide
          iijima_satoshi Satoshi Iijima added a comment -

          The scope of problem depends on the use case.
          i.e it may be enough to customize EventSerializer to filter or convert the event that has something problematic.
          Edward, what case do you think should be supported in the Apache Flume community?

          Show
          iijima_satoshi Satoshi Iijima added a comment - The scope of problem depends on the use case. i.e it may be enough to customize EventSerializer to filter or convert the event that has something problematic. Edward, what case do you think should be supported in the Apache Flume community?
          Hide
          ejsarge Edward Sargisson added a comment -

          With the doc change I just added to the review board I am not +1 for this.

          Show
          ejsarge Edward Sargisson added a comment - With the doc change I just added to the review board I am not +1 for this.
          Hide
          iijima_satoshi Satoshi Iijima added a comment - - edited

          Updated patch.
          I added the note of possibility that header substituion can be used for as an attack in doc.
          I think this risk should be solved in other ticket if necessary.

          Show
          iijima_satoshi Satoshi Iijima added a comment - - edited Updated patch. I added the note of possibility that header substituion can be used for as an attack in doc. I think this risk should be solved in other ticket if necessary.
          Hide
          ejsarge Edward Sargisson added a comment -

          Once we've talked through a possible security issue, then yes.

          That issue is my wondering if the header substitution would open an attack vector to elasticsearch. I could be very easily convinced that it's not but I think we should discuss it.

          Show
          ejsarge Edward Sargisson added a comment - Once we've talked through a possible security issue, then yes. That issue is my wondering if the header substitution would open an attack vector to elasticsearch. I could be very easily convinced that it's not but I think we should discuss it.
          Hide
          hshreedharan Hari Shreedharan added a comment -

          Is that a +1?

          Show
          hshreedharan Hari Shreedharan added a comment - Is that a +1?
          Hide
          ejsarge Edward Sargisson added a comment -

          Reviewed.

          Show
          ejsarge Edward Sargisson added a comment - Reviewed.
          Hide
          iijima_satoshi Satoshi Iijima added a comment -

          Hi,
          Edward, thanks for commenting. I have already put this patch into Review Board last week. Please review it.

          Show
          iijima_satoshi Satoshi Iijima added a comment - Hi, Edward, thanks for commenting. I have already put this patch into Review Board last week. Please review it.
          Hide
          ejsarge Edward Sargisson added a comment -

          Hi,
          Thanks for contributing the patch and my apologies for taking so very long to look at this.

          Our process is to put contributions into the Apache Review Board for review and comment. Perhaps you could do this with this patch so that we could take a look at it?

          Many thanks!

          Show
          ejsarge Edward Sargisson added a comment - Hi, Thanks for contributing the patch and my apologies for taking so very long to look at this. Our process is to put contributions into the Apache Review Board for review and comment. Perhaps you could do this with this patch so that we could take a look at it? Many thanks!
          Hide
          iijima_satoshi Satoshi Iijima added a comment -

          new-FLUME-2273.patch has conflict with FLUME-2225 that already have been committed.
          I attached a new patch with testcase and doc. In this patch, header substituion of not only indexName but also indexType is supported.

          Show
          iijima_satoshi Satoshi Iijima added a comment - new- FLUME-2273 .patch has conflict with FLUME-2225 that already have been committed. I attached a new patch with testcase and doc. In this patch, header substituion of not only indexName but also indexType is supported.
          Hide
          dipthegeezer Dipesh Patel added a comment - - edited

          Hi

          Is there any news on this patch either way? Good or Bad? We'd like to get it into the code base if possible.

          Thanks

          Dip

          Show
          dipthegeezer Dipesh Patel added a comment - - edited Hi Is there any news on this patch either way? Good or Bad? We'd like to get it into the code base if possible. Thanks Dip
          Hide
          dipthegeezer Dipesh Patel added a comment -

          Hi

          There was a few problems with this patch in that the object kept hold of the index name and this could vary with each event. Please find a newer patch.

          Dip

          Show
          dipthegeezer Dipesh Patel added a comment - Hi There was a few problems with this patch in that the object kept hold of the index name and this could vary with each event. Please find a newer patch. Dip
          Hide
          PMerry Paul Merry added a comment -

          Patch attached, includes tests and doc update. This has been tested and proven working on a running 1.4.0 system.

          Show
          PMerry Paul Merry added a comment - Patch attached, includes tests and doc update. This has been tested and proven working on a running 1.4.0 system.

            People

            • Assignee:
              Unassigned
              Reporter:
              PMerry Paul Merry
            • Votes:
              5 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development