Flume
  1. Flume
  2. FLUME-2333

HTTP source handler doesn't allow for responses

    Details

    • Type: Improvement Improvement
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Existing HTTP source handlers recieve events via a HTTPServletRequest. This works, but because the handler doesn't have access to the HTTPServletResponse, there is no ability to return a response. This makes it unsuitable for some sort of protocol that relies on bidirectional communication.

      My solution: In addition to the existing HTTPSource interface, I've added a BidirectionalHTTPSource interface that is provided the servlet response as a parameter. I've made some changes in the HTTP source allow for both types to co-exist, and my changes shouldn't affect anyone who is already using the existing interface.

      Also includes minor documentation updates to reflect this.

      Review: https://reviews.apache.org/r/18555/

      1. FLUME-2333-CUMULATIVE.diff
        40 kB
        Jeremy Karlson
      2. FLUME-2333-4.diff
        40 kB
        Gabriel Commeau
      3. FLUME-2333-3.diff
        46 kB
        Gabriel Commeau
      4. FLUME-2333-2.diff
        43 kB
        Gabriel Commeau
      5. FLUME-2333.diff
        37 kB
        Jeremy Karlson

        Activity

        Hide
        Jeremy Karlson added a comment -

        Hari Shreedharan:

        Is there anything else I need to do to this? I'd like to get it into 1.5 if possible, but if it needs further review or work, I'd like to keep that process moving along anyway.

        Thanks!

        Show
        Jeremy Karlson added a comment - Hari Shreedharan : Is there anything else I need to do to this? I'd like to get it into 1.5 if possible, but if it needs further review or work, I'd like to keep that process moving along anyway. Thanks!
        Hide
        Jeremy Karlson added a comment -

        Updated the review: https://reviews.apache.org/r/18555

        Personally, I'm pretty happy with Gabriel's work and I also think it's ready to go.

        Show
        Jeremy Karlson added a comment - Updated the review: https://reviews.apache.org/r/18555 Personally, I'm pretty happy with Gabriel's work and I also think it's ready to go.
        Hide
        Jeremy Karlson added a comment - - edited

        Made a couple of minor tweaks to the patch. This one is cumulative and replaces all other existing patches. FLUME-2333-CUMULATIVE.diff.

        Show
        Jeremy Karlson added a comment - - edited Made a couple of minor tweaks to the patch. This one is cumulative and replaces all other existing patches. FLUME-2333 -CUMULATIVE.diff.
        Hide
        Gabriel Commeau added a comment -

        I’m done with this patch: I modified the unit test (and added one test case), as well as the documentation.
        Please feel free to modify it further, but as far as I’m concerned, this is ready to go.

        Show
        Gabriel Commeau added a comment - I’m done with this patch: I modified the unit test (and added one test case), as well as the documentation. Please feel free to modify it further, but as far as I’m concerned, this is ready to go.
        Hide
        Gabriel Commeau added a comment -

        Added missing file

        Show
        Gabriel Commeau added a comment - Added missing file
        Hide
        Hari Shreedharan added a comment -

        I am planning to roll the RC later this week. This might need more discussion - not sure at this time if we can get in to 1.5

        Show
        Hari Shreedharan added a comment - I am planning to roll the RC later this week. This might need more discussion - not sure at this time if we can get in to 1.5
        Hide
        Jeremy Karlson added a comment -

        Hey, sorry for the delay. Unfortunately, I can't figure out the second patch... I think it's missing a file.

        If we can get this sorted out by Monday, do you think it will make 1.5?

        Show
        Jeremy Karlson added a comment - Hey, sorry for the delay. Unfortunately, I can't figure out the second patch... I think it's missing a file. If we can get this sorted out by Monday, do you think it will make 1.5?
        Hide
        Hari Shreedharan added a comment -

        Jeremy Karlson - Any update on this one?

        Show
        Hari Shreedharan added a comment - Jeremy Karlson - Any update on this one?
        Hide
        Gabriel Commeau added a comment -

        So here is my contribution. I updated the javadoc, but not the documentation (yet). Also, it’s worth a few additional unit tests, but again, let’s worry about that after we agree on the changes.

        Show
        Gabriel Commeau added a comment - So here is my contribution. I updated the javadoc, but not the documentation (yet). Also, it’s worth a few additional unit tests, but again, let’s worry about that after we agree on the changes.
        Hide
        Gabriel Commeau added a comment -

        As discussed on the mailing list, here are a few additional requirements for this ticket:

        • Add in the BidirectionalHTTPSourceHandler interface methods to handle the result of the servlet committing the batch of events to the channel: whether the commit is successful or throws an exception, the HTTP response comes from the BidirectionalHTTPSourceHandler.
        • Allow through configuration for many handlers (HTTPSourceHandler and the BidirectionalHTTPSourceHandler) to be added to different paths. In order not to break the backward compatibility, keep the “.handler” configuration parameter, and add to it “.handlers”, which basically takes a list of handlers.
        Show
        Gabriel Commeau added a comment - As discussed on the mailing list, here are a few additional requirements for this ticket: Add in the BidirectionalHTTPSourceHandler interface methods to handle the result of the servlet committing the batch of events to the channel: whether the commit is successful or throws an exception, the HTTP response comes from the BidirectionalHTTPSourceHandler. Allow through configuration for many handlers (HTTPSourceHandler and the BidirectionalHTTPSourceHandler) to be added to different paths. In order not to break the backward compatibility, keep the “.handler” configuration parameter, and add to it “.handlers”, which basically takes a list of handlers.
        Hide
        Hari Shreedharan added a comment -

        In this case, I am not talking about handlers in terms of servlet handlers, I am talking about HTTP Source Handler instances. The problem with the HttpResponse going into the HTTP Source Handler and that writing stuff before the source calls sendError() is that the data could already have been pushed out from the HttpResponse buffer (I believe that this is a buffered stream and the response can get pushed out, even without calling flush). I think the Flume source should call sendError() anyway, but should also allow the handler to be able to send a response. Also, nothing really stops the Source handler from calling flush (unless we wrap the response object in a FlumeHttpResponse class object that ignores flush calls).

        I am not a web developer, so my knowledge about HTTP servlets is pretty limited.

        Show
        Hari Shreedharan added a comment - In this case, I am not talking about handlers in terms of servlet handlers, I am talking about HTTP Source Handler instances. The problem with the HttpResponse going into the HTTP Source Handler and that writing stuff before the source calls sendError() is that the data could already have been pushed out from the HttpResponse buffer (I believe that this is a buffered stream and the response can get pushed out, even without calling flush). I think the Flume source should call sendError() anyway, but should also allow the handler to be able to send a response. Also, nothing really stops the Source handler from calling flush (unless we wrap the response object in a FlumeHttpResponse class object that ignores flush calls). I am not a web developer, so my knowledge about HTTP servlets is pretty limited.
        Hide
        Jeremy Karlson added a comment -

        I understand your concerns. I did think about the error conditions and considered something like what you proposed. In the end I submitted it as-is because of this:

        http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletResponse.html#sendError%28int%29

        My understanding of the servlet spec is that the handler can write anything it wants to the response stream. In the event sendError() is called, that response is discarded. (As long as flush() was not previously called on the stream, which I don't think they should be doing anyway.) So the transactional error cases end up doing this:

        • passes the input and output to the handler
        • assuming the events are well formed, handler generates a list of events and writes a success response to the output
        • source attempts to write events to the channel and fails
        • source calls sendError() on the output, which aborts any response the handler may have made and returns a HTTP error code

        In my mind, this is actually a feature - clearer separation of concerns. The handler is only concerned with receiving and generating events. It does't know, and doesn't have to know, that something went wrong in the channel writing. That is returned as a 500 (server error) or 503 (server unavailable), which I think actually makes sense... It's not an error internal to the handler protocol, it's something in the server (Flume).

        I suppose I should have mentioned the "don't call flush()" thing in the doc update.

        What do you think?

        Show
        Jeremy Karlson added a comment - I understand your concerns. I did think about the error conditions and considered something like what you proposed. In the end I submitted it as-is because of this: http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletResponse.html#sendError%28int%29 My understanding of the servlet spec is that the handler can write anything it wants to the response stream. In the event sendError() is called, that response is discarded. (As long as flush() was not previously called on the stream, which I don't think they should be doing anyway.) So the transactional error cases end up doing this: passes the input and output to the handler assuming the events are well formed, handler generates a list of events and writes a success response to the output source attempts to write events to the channel and fails source calls sendError() on the output, which aborts any response the handler may have made and returns a HTTP error code In my mind, this is actually a feature - clearer separation of concerns. The handler is only concerned with receiving and generating events. It does't know, and doesn't have to know, that something went wrong in the channel writing. That is returned as a 500 (server error) or 503 (server unavailable), which I think actually makes sense... It's not an error internal to the handler protocol, it's something in the server (Flume). I suppose I should have mentioned the "don't call flush()" thing in the doc update. What do you think?
        Hide
        Hari Shreedharan added a comment -

        I like the idea of this jira, but I am not entirely sure of having a custom handler. I'd prefer not move the transactional semantics out of the source - I don't want broken handlers causing data loss etc. Flexibility is good until custom code starts breaking guarantees.

        One of the reasons I did not make the handler bidrectional was that there was no way to tell the handler that the channel writes may have failed, and so the last message which the handler may have sent to the client may have been incorrect. Even now, you'd have to call the handler a 2nd time saying that things failed and the handler must send out an "amendment" to the previous message - which is why the handler is limited currently.

        One way around this would be to add a 2nd Servlet for bidirectional handlers, that:

        • passes the input to the handler, and gets a list of events
        • writes events to the channel
        • successful commit: calls a success method in the handler (which gets the HttpResponse object as a parameter)
        • failed commit: calls a failure method in the handler (also gets the HttpResponse object).
          This ensures that the handler can send the correct message based on successful or failed commit.

        Does that make sense?

        Show
        Hari Shreedharan added a comment - I like the idea of this jira, but I am not entirely sure of having a custom handler. I'd prefer not move the transactional semantics out of the source - I don't want broken handlers causing data loss etc. Flexibility is good until custom code starts breaking guarantees. One of the reasons I did not make the handler bidrectional was that there was no way to tell the handler that the channel writes may have failed, and so the last message which the handler may have sent to the client may have been incorrect. Even now, you'd have to call the handler a 2nd time saying that things failed and the handler must send out an "amendment" to the previous message - which is why the handler is limited currently. One way around this would be to add a 2nd Servlet for bidirectional handlers, that: passes the input to the handler, and gets a list of events writes events to the channel successful commit: calls a success method in the handler (which gets the HttpResponse object as a parameter) failed commit: calls a failure method in the handler (also gets the HttpResponse object). This ensures that the handler can send the correct message based on successful or failed commit. Does that make sense?

          People

          • Assignee:
            Jeremy Karlson
            Reporter:
            Jeremy Karlson
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development