Flume
  1. Flume
  2. FLUME-1782

Elastic Search sink does not use UTC to determine the correct index to write to.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      The GUI for logs in ElasticSearch, Kibana, uses the utc date to determine which index to read for a search. The Flume ElasticSearch sink is using the local timezone to determine which index to write to. This means that events are being placed in the incorrect index and Kibana doesn't always find them.

      1. flume-1782.patch
        7 kB
        Edward Sargisson
      2. flume-1782-2.patch
        8 kB
        Edward Sargisson
      3. flume-1782-3.patch
        12 kB
        Edward Sargisson
      4. FLUME-1782-1.4.0.patch
        10 kB
        Edward Sargisson
      5. FLUME-1782-1.4.0-1.patch
        9 kB
        Edward Sargisson
      6. FLUME-1782-1.4.0-2.patch
        11 kB
        Edward Sargisson
      7. FLUME-1782-1.4.0-3.patch
        19 kB
        Edward Sargisson
      8. FLUME-1782-1.4.0-4.patch
        34 kB
        Edward Sargisson
      9. FLUME-1782-FLUME-1972-1.4.0-1.patch
        54 kB
        Edward Sargisson

        Issue Links

          Activity

          Mike Percy made changes -
          Fix Version/s v1.4.0 [ 12323372 ]
          Edward Sargisson made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Edward Sargisson added a comment -

          Thanks for your kind words, Mike. Yes this work item can now be closed.

          Show
          Edward Sargisson added a comment - Thanks for your kind words, Mike. Yes this work item can now be closed.
          Mike Percy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Hide
          Mike Percy added a comment -

          Just looked @ Review Board and there is a comment indicating that FLUME-2015 "is a better patch". So I am marking this one as "Won't Fix".

          Show
          Mike Percy added a comment - Just looked @ Review Board and there is a comment indicating that FLUME-2015 "is a better patch". So I am marking this one as "Won't Fix".
          Mike Percy made changes -
          Assignee Edward Sargisson [ ejsarge ]
          Hide
          Mike Percy added a comment -

          Edward your code reviews and contributions have been very valuable, I added you as a contributor and am assigning this JIRA to you.

          Do you want to move forward with this patch in light of the progress you and Tim have made over on FLUME-2015?

          Thanks,
          Mike

          Show
          Mike Percy added a comment - Edward your code reviews and contributions have been very valuable, I added you as a contributor and am assigning this JIRA to you. Do you want to move forward with this patch in light of the progress you and Tim have made over on FLUME-2015 ? Thanks, Mike
          Edward Sargisson made changes -
          Attachment FLUME-1782-FLUME-1972-1.4.0-1.patch [ 12579897 ]
          Hide
          Edward Sargisson added a comment -

          I have now incorporated the changes as requested and have attached FLUME-1782-FLUME-1972-1.4.0-1.patch. This code:

          • includes a new ElasticSearchEventSerializer2 with the new method.
          • allows on or other of the interfaces to be used
          • incorporates the FLUME-1972 patch because they overlap and cause merge conflicts. This patch has those conflicts resolved.
          Show
          Edward Sargisson added a comment - I have now incorporated the changes as requested and have attached FLUME-1782 - FLUME-1972 -1.4.0-1.patch. This code: includes a new ElasticSearchEventSerializer2 with the new method. allows on or other of the interfaces to be used incorporates the FLUME-1972 patch because they overlap and cause merge conflicts. This patch has those conflicts resolved.
          Mike Percy made changes -
          Link This issue relates to FLUME-2015 [ FLUME-2015 ]
          Edward Sargisson made changes -
          Attachment FLUME-1782-1.4.0-4.patch [ 12579586 ]
          Hide
          Edward Sargisson added a comment -

          New cumulative patch, FLUME-1782-1.4.0-4, which passes the provided timestamp in a method to a new serializer interface while keeping compatibility with implementations of the old interface.

          Show
          Edward Sargisson added a comment - New cumulative patch, FLUME-1782 -1.4.0-4, which passes the provided timestamp in a method to a new serializer interface while keeping compatibility with implementations of the old interface.
          Edward Sargisson made changes -
          Attachment FLUME-1782-1.4.0-3.patch [ 12579454 ]
          Hide
          Edward Sargisson added a comment -

          Modified proposed code so that events are not modified. Also use constants for timestamps in tests and use Google Guava Charsets.UTF_8. FLUME-1782-1.4.0-3 is now the single cumulative patch to use.

          Show
          Edward Sargisson added a comment - Modified proposed code so that events are not modified. Also use constants for timestamps in tests and use Google Guava Charsets.UTF_8. FLUME-1782 -1.4.0-3 is now the single cumulative patch to use.
          Hide
          Edward Sargisson added a comment -

          Hi Mike,
          To address point 2.
          > 2. In shouldSetAndWarnWhenNoHeaders() I don't think any Event should ever have null headers. Have you seen cases where that can happen?
          I've never seen null headers and I don't see how they could happen. I added the null check based on a review comment from Israel. Do we want the null check in or out?

          I will do the rework for points 3 and 4.

          Show
          Edward Sargisson added a comment - Hi Mike, To address point 2. > 2. In shouldSetAndWarnWhenNoHeaders() I don't think any Event should ever have null headers. Have you seen cases where that can happen? I've never seen null headers and I don't see how they could happen. I added the null check based on a review comment from Israel. Do we want the null check in or out? I will do the rework for points 3 and 4.
          Hide
          Edward Sargisson added a comment -

          Hi Mike,
          I've just taken a look at the new code based on your comment about not changing the event. The effect of this is a breaking change to the ElasticSearchEventSerializer interface so that the assigned timestamp can be passed along. Unless we happen to have an alternative to the concern you raise I think we're going to have to do that.
          (I had been trying to avoid making changes to that interface - that's why FLUME-1972 sets up a different interface. Maybe we should discuss joining those interfaces?)

          Show
          Edward Sargisson added a comment - Hi Mike, I've just taken a look at the new code based on your comment about not changing the event. The effect of this is a breaking change to the ElasticSearchEventSerializer interface so that the assigned timestamp can be passed along. Unless we happen to have an alternative to the concern you raise I think we're going to have to do that. (I had been trying to avoid making changes to that interface - that's why FLUME-1972 sets up a different interface. Maybe we should discuss joining those interfaces?)
          Hide
          Israel Ekpo added a comment -

          Cool,

          I did not notice you could do that.

          Good to know

          Show
          Israel Ekpo added a comment - Cool, I did not notice you could do that. Good to know
          Hide
          Mike Percy added a comment -

          Thanks Israel, I also just added a link to review board in this JIRA

          Show
          Mike Percy added a comment - Thanks Israel, I also just added a link to review board in this JIRA
          Mike Percy made changes -
          Remote Link This issue links to "Review Board (Web Link)" [ 12154 ]
          Hide
          Israel Ekpo added a comment -

          Hi Mike,

          He opened a code review here about 6 days ago.

          https://reviews.apache.org/r/10379/

          It was already in progress.

          I can copy your comments and paste it there.

          Show
          Israel Ekpo added a comment - Hi Mike, He opened a code review here about 6 days ago. https://reviews.apache.org/r/10379/ It was already in progress. I can copy your comments and paste it there.
          Hide
          Mike Percy added a comment -

          Hi Edward,
          Thanks for the patch! I have one concern about the approach and a few minor requests/questions.

          Concern: Since the sink is modifying the event itself, this can cause issues with memory channel rollback. i.e. if there is a channel rollback because ES is down, then any event processed when it comes back up will actually have the timestamp set by a previous run of the process() method. Depending on why you might want to have a local timestamp, that could be unexpected and/or desired.

          Instead of ensuring that the timestamp is set in the event itself at the sink, we could provide some other method of falling back to a local time in the sink itself if the event doesn't have a timestamp set.

          tl;dr: Don't modify the Event in the sink

          Other minor feedback:
          1. It would be great to post this onto Review Board @ https://reviews.apache.org/groups/Flume/ so that individual lines in the patch are easy to reference during review
          2. In shouldSetAndWarnWhenNoHeaders() I don't think any Event should ever have null headers. Have you seen cases where that can happen?
          3. Nit: can we use a constant for the values 1355364900011 and 1355364900079 that are used throughout this test? They are kinda magic numbers and it could use a little more explanation and/or ensuring they are used consistently by making them private static final variables.
          4. Nit: consider using Charsets.UTF_8 from Guava rather than Charset.forName("UTF-8") ... this is optional and not a big deal.

          Best,
          Mike

          Show
          Mike Percy added a comment - Hi Edward, Thanks for the patch! I have one concern about the approach and a few minor requests/questions. Concern: Since the sink is modifying the event itself, this can cause issues with memory channel rollback. i.e. if there is a channel rollback because ES is down, then any event processed when it comes back up will actually have the timestamp set by a previous run of the process() method. Depending on why you might want to have a local timestamp, that could be unexpected and/or desired. Instead of ensuring that the timestamp is set in the event itself at the sink, we could provide some other method of falling back to a local time in the sink itself if the event doesn't have a timestamp set. tl;dr: Don't modify the Event in the sink Other minor feedback: 1. It would be great to post this onto Review Board @ https://reviews.apache.org/groups/Flume/ so that individual lines in the patch are easy to reference during review 2. In shouldSetAndWarnWhenNoHeaders() I don't think any Event should ever have null headers. Have you seen cases where that can happen? 3. Nit: can we use a constant for the values 1355364900011 and 1355364900079 that are used throughout this test? They are kinda magic numbers and it could use a little more explanation and/or ensuring they are used consistently by making them private static final variables. 4. Nit: consider using Charsets.UTF_8 from Guava rather than Charset.forName("UTF-8") ... this is optional and not a big deal. Best, Mike
          Edward Sargisson made changes -
          Attachment FLUME-1782-1.4.0-2.patch [ 12578257 ]
          Hide
          Edward Sargisson added a comment -

          New cumulative patch with rework from review attached: FLUME-1782-1.4.0-2

          Show
          Edward Sargisson added a comment - New cumulative patch with rework from review attached: FLUME-1782 -1.4.0-2
          Edward Sargisson made changes -
          Attachment FLUME-1782-1.4.0-1.patch [ 12577839 ]
          Hide
          Edward Sargisson added a comment -

          FLUME-1782-1.4.0-1.patch is now the patch to use. It's based on flume-1.4 branch at commit 5b9d31f1ad228 (2013-04-07).

          Show
          Edward Sargisson added a comment - FLUME-1782 -1.4.0-1.patch is now the patch to use. It's based on flume-1.4 branch at commit 5b9d31f1ad228 (2013-04-07).
          Edward Sargisson made changes -
          Attachment FLUME-1782-1.4.0.patch [ 12577262 ]
          Hide
          Edward Sargisson added a comment -

          I have now updated and combined the patches. FLUME-1782-1.4.0.patch is now the only patch that should be applied to trunk. It contains the original fix to the defect as well as the user guide updates.

          I have decided not to include the changes Israel Ekpo suggested. I don't believe I did them correctly and I think they are more confusing than useful. A developer who needs that functionality can use the patches in this work item to do that if required.

          Show
          Edward Sargisson added a comment - I have now updated and combined the patches. FLUME-1782 -1.4.0.patch is now the only patch that should be applied to trunk. It contains the original fix to the defect as well as the user guide updates. I have decided not to include the changes Israel Ekpo suggested. I don't believe I did them correctly and I think they are more confusing than useful. A developer who needs that functionality can use the patches in this work item to do that if required.
          Hide
          Israel Ekpo added a comment -

          Cool Edward, No problem.

          Show
          Israel Ekpo added a comment - Cool Edward, No problem.
          Hide
          Edward Sargisson added a comment -

          Hi Israel,
          Well - the code is in the patch ready for when the maintainers are ready for it.

          Show
          Edward Sargisson added a comment - Hi Israel, Well - the code is in the patch ready for when the maintainers are ready for it.
          Hide
          Israel Ekpo added a comment -

          Hi Edward,

          I was not stating that it is the wisest thing to do.

          I just think it will be a good idea to be flexible with the timezones so that in the future if you need to use a different timezone other than UTC, it will just be a matter of changing the value.

          This can make future customization for other use cases easier.

          You can set UTC as the default timzeone but allow the option to set it to other ones if the uses wishes to do so.

          Show
          Israel Ekpo added a comment - Hi Edward, I was not stating that it is the wisest thing to do. I just think it will be a good idea to be flexible with the timezones so that in the future if you need to use a different timezone other than UTC, it will just be a matter of changing the value. This can make future customization for other use cases easier. You can set UTC as the default timzeone but allow the option to set it to other ones if the uses wishes to do so.
          Edward Sargisson made changes -
          Attachment flume-1782-3.patch [ 12573891 ]
          Hide
          Edward Sargisson added a comment - - edited

          Hi Israel,
          Well, if you changed the timezone then the output will not work with Kibana so this wouldn't be a terribly wise thing to do. However, I have written the code to do it and updated the docs with the details - as well as fixing capitalization and heading level issues. That code is now attached as flume-1782-3.patch which needs to be added on to flume-1782-2.patch.

          Show
          Edward Sargisson added a comment - - edited Hi Israel, Well, if you changed the timezone then the output will not work with Kibana so this wouldn't be a terribly wise thing to do. However, I have written the code to do it and updated the docs with the details - as well as fixing capitalization and heading level issues. That code is now attached as flume-1782-3.patch which needs to be added on to flume-1782-2.patch.
          Hide
          Israel Ekpo added a comment -

          It think it would be helpful if the user can configure what timezone or offset they wish to use.

          This way the logic is not tied only to UTC.

          You can use Eastern Time, Mountain Time, Pacific Time or whichever timezone you prefer.

          Any thoughts?

          Show
          Israel Ekpo added a comment - It think it would be helpful if the user can configure what timezone or offset they wish to use. This way the logic is not tied only to UTC. You can use Eastern Time, Mountain Time, Pacific Time or whichever timezone you prefer. Any thoughts?
          Edward Sargisson made changes -
          Attachment flume-1782-2.patch [ 12573741 ]
          Hide
          Edward Sargisson added a comment -

          New patch attached which will only warn about missing timestamps every 1000 times it detects the problem.

          Show
          Edward Sargisson added a comment - New patch attached which will only warn about missing timestamps every 1000 times it detects the problem.
          Hide
          Edward Sargisson added a comment -

          Hi Brock,
          Sorry for the delay in replying - I've had some family issues taking my attention.

          If the data doesn't have a timestamp then there's little point writing it to ElasticSearch as Kibana will not be able to query it from ElasticSearch. Kibana's querying expects there to be a timestamp.

          It's for this reason that it will warn you if it's not there.

          If you want I can make it only log once but I'm not sure if that's useful.

          Show
          Edward Sargisson added a comment - Hi Brock, Sorry for the delay in replying - I've had some family issues taking my attention. If the data doesn't have a timestamp then there's little point writing it to ElasticSearch as Kibana will not be able to query it from ElasticSearch. Kibana's querying expects there to be a timestamp. It's for this reason that it will warn you if it's not there. If you want I can make it only log once but I'm not sure if that's useful.
          Hide
          Brock Noland added a comment -

          Hi Edward,

          The patch looks good, I do wonder about the logger.warn() statement though. In a high throughput scenario, that could log a ton of data.

          Show
          Brock Noland added a comment - Hi Edward, The patch looks good, I do wonder about the logger.warn() statement though. In a high throughput scenario, that could log a ton of data.
          Edward Sargisson made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s v1.4.0 [ 12323372 ]
          Edward Sargisson made changes -
          Field Original Value New Value
          Attachment flume-1782.patch [ 12560880 ]
          Hide
          Edward Sargisson added a comment -

          Attached patch

          Show
          Edward Sargisson added a comment - Attached patch
          Hide
          Edward Sargisson added a comment -

          This failure will only happen if the server running the sink is not in the UTC timezone itself. I have a patch that I will contribute back once it has passed my own testing.

          Show
          Edward Sargisson added a comment - This failure will only happen if the server running the sink is not in the UTC timezone itself. I have a patch that I will contribute back once it has passed my own testing.
          Edward Sargisson created issue -

            People

            • Assignee:
              Edward Sargisson
              Reporter:
              Edward Sargisson
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development