Avro
  1. Avro
  2. AVRO-573

Need to include Handshake Response when handling a system error on the Java server side

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: java, spec
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      System error is not handling everything right now. Details in the comments.

      1. AVRO-573.patch
        8 kB
        Doug Cutting

        Issue Links

          Activity

          Hide
          Doug Cutting added a comment -

          I committed this.

          Show
          Doug Cutting added a comment - I committed this.
          Hide
          Jeff Hammerbacher added a comment -

          Hey Doug,

          I can confirm that this patch works nicely with the Python client currently in trunk. When throwing an unexpected error in the Java Responder, I get this error in my client:

          avro.ipc.AvroRemoteException: org.apache.avro.AvroRuntimeException: Not in union ["string",{"type":"error","name":"AIOError","namespace":"org.apache.hadoop.hbase.avro.generated","fields":[{"name":"message","type":"string"}]}]: java.lang.NullPointerException
          

          That looks correct to me. Great work!

          Later,
          Jeff

          Show
          Jeff Hammerbacher added a comment - Hey Doug, I can confirm that this patch works nicely with the Python client currently in trunk. When throwing an unexpected error in the Java Responder, I get this error in my client: avro.ipc.AvroRemoteException: org.apache.avro.AvroRuntimeException: Not in union [ "string" ,{ "type" : "error" , "name" : "AIOError" , "namespace" : "org.apache.hadoop.hbase.avro.generated" , "fields" :[{ "name" : "message" , "type" : "string" }]}]: java.lang.NullPointerException That looks correct to me. Great work! Later, Jeff
          Hide
          Doug Cutting added a comment -

          > blows away our old bbo. Our old bbo, however, had some useful stuff in it
          > I did a quick glance at the patch and don't see any change to ipc/Responder.java,

          That was changed in AVRO-578.

          http://svn.apache.org/viewvc?view=revision&revision=958149

          The handshake response data is now copied from the bbo before this, so this is no longer a potential problem.
          There were several other problems handling system errors that the tests exposed that I fix in the patch.

          Show
          Doug Cutting added a comment - > blows away our old bbo. Our old bbo, however, had some useful stuff in it > I did a quick glance at the patch and don't see any change to ipc/Responder.java, That was changed in AVRO-578 . http://svn.apache.org/viewvc?view=revision&revision=958149 The handshake response data is now copied from the bbo before this, so this is no longer a potential problem. There were several other problems handling system errors that the tests exposed that I fix in the patch.
          Hide
          Jeff Hammerbacher added a comment -

          Doug,

          The problem I turned up was in ipc/Responder.java, and was described by:

          Inside of the catch block where we handle system errors, however, I believe we're doing something bad: we're creating a new ByteBufferOutputStream bbo, which blows away our old bbo. Our old bbo, however, had some useful stuff in it that we've failed to recreate: the HandshakeResponse. Thus any client trying to read a system error appears SOL

          I did a quick glance at the patch and don't see any change to ipc/Responder.java, so I'm not sure if the new tests (which are great!) are hitting the same problem I'm seeing above. I'll get out my test environment and see what I find.

          Thanks,
          Jeff

          Show
          Jeff Hammerbacher added a comment - Doug, The problem I turned up was in ipc/Responder.java, and was described by: Inside of the catch block where we handle system errors, however, I believe we're doing something bad: we're creating a new ByteBufferOutputStream bbo, which blows away our old bbo. Our old bbo, however, had some useful stuff in it that we've failed to recreate: the HandshakeResponse. Thus any client trying to read a system error appears SOL I did a quick glance at the patch and don't see any change to ipc/Responder.java, so I'm not sure if the new tests (which are great!) are hitting the same problem I'm seeing above. I'll get out my test environment and see what I find. Thanks, Jeff
          Hide
          Doug Cutting added a comment -

          It seems that there were no tests for system errors in Java, and that they were broken. I'm not sure what I was proposing earlier or why, but here's a patch that adds java tests for system errors and fixes the bugs detected. With this applied, a remote system error is thrown locally as an AvroRuntimeException rather than an AvroRemoteException.

          Show
          Doug Cutting added a comment - It seems that there were no tests for system errors in Java, and that they were broken. I'm not sure what I was proposing earlier or why, but here's a patch that adds java tests for system errors and fixes the bugs detected. With this applied, a remote system error is thrown locally as an AvroRuntimeException rather than an AvroRemoteException.
          Hide
          Doug Cutting added a comment -

          > So in the client we'd skip the server protocol and hash reading

          Those are both already optional in the HandshakeResponse, so this should require no change.

          > look for a specific metadata key

          Only if you want to generate a pretty error message.

          > and then bail on trying to read the message as well (since we don't have a protocol)?

          This is in logic that's already handling each value of HandshakeResponse.match. Some cases already don't read the message (match=NONE). So that logic could add a new case to handle match=ERROR. Or, if that logic already throws an error when the match type is unrecognized then it would not need to be changed at all, unless you want a more descriptive error to be thrown.

          > Sounds okay-ish.

          Do you have an alternate proposal or suggested improvements?

          Show
          Doug Cutting added a comment - > So in the client we'd skip the server protocol and hash reading Those are both already optional in the HandshakeResponse, so this should require no change. > look for a specific metadata key Only if you want to generate a pretty error message. > and then bail on trying to read the message as well (since we don't have a protocol)? This is in logic that's already handling each value of HandshakeResponse.match. Some cases already don't read the message (match=NONE). So that logic could add a new case to handle match=ERROR. Or, if that logic already throws an error when the match type is unrecognized then it would not need to be changed at all, unless you want a more descriptive error to be thrown. > Sounds okay-ish. Do you have an alternate proposal or suggested improvements?
          Hide
          Jeff Hammerbacher added a comment -

          So in the client we'd skip the server protocol and hash reading, look for a specific metadata key, and then bail on trying to read the message as well (since we don't have a protocol)? Sounds okay-ish.

          Show
          Jeff Hammerbacher added a comment - So in the client we'd skip the server protocol and hash reading, look for a specific metadata key, and then bail on trying to read the message as well (since we don't have a protocol)? Sounds okay-ish.
          Hide
          Doug Cutting added a comment -

          > What would be the contents of the value portion?

          A system error message, in UTF-8. So if for some reason the handshake process itself generates, e.g., a NullPointerException, you'll get a HandhakeResponse with match=ERROR and "avro.error.message" set in the response's metadata to "NullPointerException". Make sense?

          Show
          Doug Cutting added a comment - > What would be the contents of the value portion? A system error message, in UTF-8. So if for some reason the handshake process itself generates, e.g., a NullPointerException, you'll get a HandhakeResponse with match=ERROR and "avro.error.message" set in the response's metadata to "NullPointerException". Make sense?
          Hide
          Jeff Hammerbacher added a comment -

          I don't think a match=ERROR should be required to include the server's protocol or hash.

          Yep that makes sense.

          We should probably specify a metadata key for an error message. Does that sound reasonable?

          What would be the contents of the value portion?

          Show
          Jeff Hammerbacher added a comment - I don't think a match=ERROR should be required to include the server's protocol or hash. Yep that makes sense. We should probably specify a metadata key for an error message. Does that sound reasonable? What would be the contents of the value portion?
          Hide
          Doug Cutting added a comment -

          I don't think a match=ERROR should be required to include the server's protocol or hash.
          Imagine a misconfigured server that cannot manage to locate its protocol for, e.g., a requested URL.
          Ideally this might throw an HTTP error, but it might also result in an NPE within Avro.
          It'd be nice if a client didn't hang or fail strangely in such cases.

          We should probably specify a metadata key for an error message. Does that sound reasonable?

          Show
          Doug Cutting added a comment - I don't think a match=ERROR should be required to include the server's protocol or hash. Imagine a misconfigured server that cannot manage to locate its protocol for, e.g., a requested URL. Ideally this might throw an HTTP error, but it might also result in an NPE within Avro. It'd be nice if a client didn't hang or fail strangely in such cases. We should probably specify a metadata key for an error message. Does that sound reasonable?
          Hide
          Jeff Hammerbacher added a comment -

          Hey Doug,

          That sounds reasonable, but could you complete the description of the match=ERROR response? Should it be the same as CLIENT (e.g. give a server protocol and hash)?

          Thanks,
          Jeff

          Show
          Jeff Hammerbacher added a comment - Hey Doug, That sounds reasonable, but could you complete the description of the match=ERROR response? Should it be the same as CLIENT (e.g. give a server protocol and hash)? Thanks, Jeff
          Hide
          Doug Cutting added a comment -

          We probably need a match=ERROR in HandshakeResponse. Arguably it's a back-compatible change, since it will cause applications to fail with an enum-out-of-bounds error rather than some other error.

          Here's a proposal:

          • move the handshake to it's own try block. in case of error, write a match=ERROR response
          • add mark() & reset() methods to ByteBufferOutputStream(), to save & restore its position
          • call bbo.mark() after the handshake response is written
          • call bbo.reset() in the catch clause before writing system error

          How's that sound?

          Show
          Doug Cutting added a comment - We probably need a match=ERROR in HandshakeResponse. Arguably it's a back-compatible change, since it will cause applications to fail with an enum-out-of-bounds error rather than some other error. Here's a proposal: move the handshake to it's own try block. in case of error, write a match=ERROR response add mark() & reset() methods to ByteBufferOutputStream(), to save & restore its position call bbo.mark() after the handshake response is written call bbo.reset() in the catch clause before writing system error How's that sound?
          Hide
          Jeff Hammerbacher added a comment -

          I'm not exactly sure what the correct behavior might be here; we may have blown up while processing the handshake, so we certainly don't want to try that again; should we just hardcode CLIENT as the match value?

          Show
          Jeff Hammerbacher added a comment - I'm not exactly sure what the correct behavior might be here; we may have blown up while processing the handshake, so we certainly don't want to try that again; should we just hardcode CLIENT as the match value?
          Hide
          Jeff Hammerbacher added a comment -

          Let's say, for example, that I have an Avro implementation of an Avro protocol that is misbehaving slightly. One of the message implementations declares that it throws, for example, an AIOError, but during some processing in that message, a RuntimeException (say an NPE) gets thrown. Now, this has never happened in my code, because I statically compile future states of the universe before shipping. But hypothetically.

          In ipc/Responder.java, on line 132, inside of the respond method, we entry a try/catch clause. Inside of this try/catch clause, we try calling whichever message is indicated by the request. If, in the body of that message, we throw an NPE, the catch clause will catch it and record the NPE as the "error" variable. Later, in line 149 of the same method, we have an if/else clause. We check to ensure the error is not null, and if it is not, we call writeError with the "error" variable as the object to be written. Since this object is of type NPE, which was not one of the errors declared by the message in the Avro protocol, we throw an error while trying to write the error, I think. Which is fine; we've now triggered a system error.

          Inside of the catch block where we handle system errors, however, I believe we're doing something bad: we're creating a new ByteBufferOutputStream bbo, which blows away our old bbo. Our old bbo, however, had some useful stuff in it that we've failed to recreate: the HandshakeResponse. Thus any client trying to read a system error appears SOL

          One thing I don't get, and I'm having trouble instrumenting to figure out why it is happening because I don't understand Maven, is that the logs write the "system error" twice; so somehow, for a single RPC, we're writing this system error log entry twice:

          10/06/10 22:21:59 WARN ipc.Responder: system error
          org.apache.avro.AvroRuntimeException: Not in union ["string",{"type":"error","name":"AIOError","namespace":"org.apache.hadoop.hbase.avro.generated","fields":[{"name":"message","type":"string"}]}]: java.lang.NullPointerException
          	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:340)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
          	at org.apache.avro.specific.SpecificResponder.writeError(SpecificResponder.java:81)
          	at org.apache.avro.ipc.Responder.respond(Responder.java:137)
          	at org.apache.avro.ipc.ResponderServlet.doPost(ResponderServlet.java:48)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
          	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
          	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
          	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          	at org.mortbay.jetty.Server.handle(Server.java:326)
          	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)
          	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
          	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
          	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
          	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
          10/06/10 22:21:59 WARN ipc.Responder: system error
          org.apache.avro.AvroRuntimeException: Not in union ["string",{"type":"error","name":"AIOError","namespace":"org.apache.hadoop.hbase.avro.generated","fields":[{"name":"message","type":"string"}]}]: java.lang.NullPointerException
          	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:340)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
          	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55)
          	at org.apache.avro.specific.SpecificResponder.writeError(SpecificResponder.java:81)
          	at org.apache.avro.ipc.Responder.respond(Responder.java:137)
          	at org.apache.avro.ipc.ResponderServlet.doPost(ResponderServlet.java:48)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
          	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
          	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
          	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
          	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
          	at org.mortbay.jetty.Server.handle(Server.java:326)
          	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
          	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)
          	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
          	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
          	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
          	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
          	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
          
          Show
          Jeff Hammerbacher added a comment - Let's say, for example, that I have an Avro implementation of an Avro protocol that is misbehaving slightly. One of the message implementations declares that it throws, for example, an AIOError, but during some processing in that message, a RuntimeException (say an NPE) gets thrown. Now, this has never happened in my code, because I statically compile future states of the universe before shipping. But hypothetically. In ipc/Responder.java, on line 132, inside of the respond method, we entry a try/catch clause. Inside of this try/catch clause, we try calling whichever message is indicated by the request. If, in the body of that message, we throw an NPE, the catch clause will catch it and record the NPE as the "error" variable. Later, in line 149 of the same method, we have an if/else clause. We check to ensure the error is not null, and if it is not, we call writeError with the "error" variable as the object to be written. Since this object is of type NPE, which was not one of the errors declared by the message in the Avro protocol, we throw an error while trying to write the error, I think. Which is fine; we've now triggered a system error. Inside of the catch block where we handle system errors, however, I believe we're doing something bad: we're creating a new ByteBufferOutputStream bbo, which blows away our old bbo. Our old bbo, however, had some useful stuff in it that we've failed to recreate: the HandshakeResponse. Thus any client trying to read a system error appears SOL One thing I don't get, and I'm having trouble instrumenting to figure out why it is happening because I don't understand Maven, is that the logs write the "system error" twice; so somehow, for a single RPC, we're writing this system error log entry twice: 10/06/10 22:21:59 WARN ipc.Responder: system error org.apache.avro.AvroRuntimeException: Not in union ["string",{"type":"error","name":"AIOError","namespace":"org.apache.hadoop.hbase.avro.generated","fields":[{"name":"message","type":"string"}]}]: java.lang.NullPointerException at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:340) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55) at org.apache.avro.specific.SpecificResponder.writeError(SpecificResponder.java:81) at org.apache.avro.ipc.Responder.respond(Responder.java:137) at org.apache.avro.ipc.ResponderServlet.doPost(ResponderServlet.java:48) at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) 10/06/10 22:21:59 WARN ipc.Responder: system error org.apache.avro.AvroRuntimeException: Not in union ["string",{"type":"error","name":"AIOError","namespace":"org.apache.hadoop.hbase.avro.generated","fields":[{"name":"message","type":"string"}]}]: java.lang.NullPointerException at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:340) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:55) at org.apache.avro.specific.SpecificResponder.writeError(SpecificResponder.java:81) at org.apache.avro.ipc.Responder.respond(Responder.java:137) at org.apache.avro.ipc.ResponderServlet.doPost(ResponderServlet.java:48) at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

            People

            • Assignee:
              Doug Cutting
              Reporter:
              Jeff Hammerbacher
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development