Thrift
  1. Thrift
  2. THRIFT-601

sending random data crashes thrift service

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.3
    • Component/s: Java - Library
    • Labels:

      Description

      Sending random data to a Java thrift service causes it to crash with extreme prejudice.

      dd if=/dev/urandom count=1 | nc $host 9160

      ... produces ...

      java.lang.OutOfMemoryError: Java heap space
      at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:296)
      at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:203)
      at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:615)
      at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:636)

      1. t601-p2.patch
        1 kB
        David Reiss
      2. thrift-601.patch
        2 kB
        Esteve Fernandez
      3. thrift-601-v2.patch
        0.6 kB
        Bryan Duxbury

        Issue Links

          Activity

          Hide
          David Reiss added a comment -

          Thrift is meant for for RPC between cooperating applications, not handling random data. That said, check out setReadLength in TBinaryProtocol.

          Show
          David Reiss added a comment - Thrift is meant for for RPC between cooperating applications, not handling random data. That said, check out setReadLength in TBinaryProtocol.
          Hide
          Bryan Duxbury added a comment -

          Or, use TNonblockingServer and set a max memory buffer size.

          Show
          Bryan Duxbury added a comment - Or, use TNonblockingServer and set a max memory buffer size.
          Hide
          David Reiss added a comment -

          You might actually need both, because you could have a small frame header to trick the server, then a large (fake) string size.

          Show
          David Reiss added a comment - You might actually need both, because you could have a small frame header to trick the server, then a large (fake) string size.
          Hide
          Jonathan Ellis added a comment -

          "Thrift is meant for for RPC between cooperating applications" is true but meaningless. It's easy to achieve the same result through human error (e.g. sufficiently incompatible protocols for the "same" service, or connecting to the wrong port on a machine running two thrift services, or ...). This is about being un-fragile, not protecting against malice.

          Show
          Jonathan Ellis added a comment - "Thrift is meant for for RPC between cooperating applications" is true but meaningless. It's easy to achieve the same result through human error (e.g. sufficiently incompatible protocols for the "same" service, or connecting to the wrong port on a machine running two thrift services, or ...). This is about being un-fragile, not protecting against malice.
          Hide
          David Reiss added a comment -

          "That said, check out setReadLength in TBinaryProtocol."

          Show
          David Reiss added a comment - "That said, check out setReadLength in TBinaryProtocol."
          Hide
          Jonathan Ellis added a comment -

          i did. it's not very useful.

          Show
          Jonathan Ellis added a comment - i did. it's not very useful.
          Hide
          David Reiss added a comment - - edited

          Neither is your description of the problem. To clarify, setting the maximum read size will prevent this exception. Why don't you think it is useful?

          Show
          David Reiss added a comment - - edited Neither is your description of the problem. To clarify, setting the maximum read size will prevent this exception. Why don't you think it is useful?
          Hide
          Jonathan Ellis added a comment -

          The problem is adequately described. But since you apparently want me to describe a solution too, here is what I would do given my limited understanding of the internals:

          • readMessageBegin needs to know what the range of possible message names are, and reject messages whose name length would be longer/shorter than the max/min possible.
          • string and bytes should be able to specify a max length, because one size does not fit all
          • string and byte max length should be typedef-able
          Show
          Jonathan Ellis added a comment - The problem is adequately described. But since you apparently want me to describe a solution too, here is what I would do given my limited understanding of the internals: readMessageBegin needs to know what the range of possible message names are, and reject messages whose name length would be longer/shorter than the max/min possible. string and bytes should be able to specify a max length, because one size does not fit all string and byte max length should be typedef-able
          Hide
          Esteve Fernandez added a comment -

          We agreed (well, kindof) to use the framed transport by default and leave the buffered one as optional. Does this happen using the framed transport too? It shouldn't, but I'm not sure. In any case, I'm with Eric and Jonathan here, this looks pretty serious to me.

          Show
          Esteve Fernandez added a comment - We agreed (well, kindof) to use the framed transport by default and leave the buffered one as optional. Does this happen using the framed transport too? It shouldn't, but I'm not sure. In any case, I'm with Eric and Jonathan here, this looks pretty serious to me.
          Hide
          Brandon Williams added a comment -

          Yes, this happens with either the framed or nonframed transport. I too find this issue to be pretty serious, it's far too easy to accidentally crash the server and it makes port monitoring difficult.

          Show
          Brandon Williams added a comment - Yes, this happens with either the framed or nonframed transport. I too find this issue to be pretty serious, it's far too easy to accidentally crash the server and it makes port monitoring difficult.
          Hide
          Esteve Fernandez added a comment -

          This is a very simple patch which just closes an incoming connection if the frame size is too big. It probably doesn't address all the implied issues, so be careful.

          As for adding limits to string, bytes, etc. I'm not against it, but I believe it belongs in a separate ticket, as it will affect all of the languages supported by Thrift.

          Show
          Esteve Fernandez added a comment - This is a very simple patch which just closes an incoming connection if the frame size is too big. It probably doesn't address all the implied issues, so be careful. As for adding limits to string, bytes, etc. I'm not against it, but I believe it belongs in a separate ticket, as it will affect all of the languages supported by Thrift.
          Hide
          Brandon Williams added a comment -

          +1, this patch works for me if I tweak DEFAULT_MAX_LENGTH to a sane value. The connection is closed, but the server does not crash.

          Show
          Brandon Williams added a comment - +1, this patch works for me if I tweak DEFAULT_MAX_LENGTH to a sane value. The connection is closed, but the server does not crash.
          Hide
          Jonathan Ellis added a comment -

          I don't know how valuable this is, though...

          (1) you can still crash the server with a bogus int anywhere but the frame size
          (2) it requres client-side support to warn users if they're trying to build a command that is too long for the frame size – and since the default is a no-op, the server needs to communicate what the actual limit is, if any, to the client first. is that protocol change worth it, given the limitations of (1)?

          Show
          Jonathan Ellis added a comment - I don't know how valuable this is, though... (1) you can still crash the server with a bogus int anywhere but the frame size (2) it requres client-side support to warn users if they're trying to build a command that is too long for the frame size – and since the default is a no-op, the server needs to communicate what the actual limit is, if any, to the client first. is that protocol change worth it, given the limitations of (1)?
          Hide
          Esteve Fernandez added a comment -

          You're right Jonathan, maybe the framed transports should advertise their maximum length? It's not perfect either, but it's a small step in the right direction

          Show
          Esteve Fernandez added a comment - You're right Jonathan, maybe the framed transports should advertise their maximum length? It's not perfect either, but it's a small step in the right direction
          Hide
          Juho Mäkinen added a comment -

          I have crashed an application using thrift as a server a few times by telnetting to a wrong port. This is the only case I know telnetting to a wrong port will cause a service crash and it's just stupid. Hopefully a sane fix can be found as soon as possible.

          Show
          Juho Mäkinen added a comment - I have crashed an application using thrift as a server a few times by telnetting to a wrong port. This is the only case I know telnetting to a wrong port will cause a service crash and it's just stupid. Hopefully a sane fix can be found as soon as possible.
          Hide
          James E. King, III added a comment -

          Would a simple port scan (open socket, close) cause a crash as well? This looks like it will be runtime implementation specific - are any others affected?

          Show
          James E. King, III added a comment - Would a simple port scan (open socket, close) cause a crash as well? This looks like it will be runtime implementation specific - are any others affected?
          Hide
          Bryan Duxbury added a comment -

          How about this patch? It makes sure that any kind of error while processing is caught and won't take the server down. As it was already coded, this was the objective, but OutOfMemoryErrors turn out not to be Exception descendants. Oops.

          Show
          Bryan Duxbury added a comment - How about this patch? It makes sure that any kind of error while processing is caught and won't take the server down. As it was already coded, this was the objective, but OutOfMemoryErrors turn out not to be Exception descendants. Oops.
          Hide
          David Reiss added a comment -

          I don't think catching OOM from the handler is a good idea. In C++, at least, we have found that it is rarely recoverable at the process level and keeping the process running in this state just prevents your system from recovering.

          Show
          David Reiss added a comment - I don't think catching OOM from the handler is a good idea. In C++, at least, we have found that it is rarely recoverable at the process level and keeping the process running in this state just prevents your system from recovering.
          Hide
          Jonathan Ellis added a comment -

          the same applies to the JVM in my experience wrt OOME.

          Show
          Jonathan Ellis added a comment - the same applies to the JVM in my experience wrt OOME.
          Hide
          Bryan Duxbury added a comment -

          Put that way, it's basically impossible to catch an OOM ever, and the only real solution is to change the way we allocate memory. You can already do this in some ways, but it seems cumbersome.

          We could impose a thrift-wide limit on the size of method names, which would help a bit. However, it wouldn't help in situations when the server genuinely runs out of memory, for instance as a result of a legitimately overlarge RPC call. Per-field limits don't really seem like a solution either, I think, because a lot of people are likely to set the limits to "unlimited".

          I believe that the Nonblocking server (as well as HsHa) won't really suffer from this problem, as long as you set a sane read buffer size. Maybe we should make the readbuffer setting mandatory? Or perhaps expand the readbuffer thing to the thread pool server?

          Show
          Bryan Duxbury added a comment - Put that way, it's basically impossible to catch an OOM ever, and the only real solution is to change the way we allocate memory. You can already do this in some ways, but it seems cumbersome. We could impose a thrift-wide limit on the size of method names, which would help a bit. However, it wouldn't help in situations when the server genuinely runs out of memory, for instance as a result of a legitimately overlarge RPC call. Per-field limits don't really seem like a solution either, I think, because a lot of people are likely to set the limits to "unlimited". I believe that the Nonblocking server (as well as HsHa) won't really suffer from this problem, as long as you set a sane read buffer size. Maybe we should make the readbuffer setting mandatory? Or perhaps expand the readbuffer thing to the thread pool server?
          Hide
          David Reiss added a comment -

          Sorry, I don't understand why setting a max read length on the TBinaryProtocol doesn't make this go away. Evernote (which contributed that feature) has exposed their Thrift server to the Internet, so I assume they are not getting OOMs all over the place.

          Show
          David Reiss added a comment - Sorry, I don't understand why setting a max read length on the TBinaryProtocol doesn't make this go away. Evernote (which contributed that feature) has exposed their Thrift server to the Internet, so I assume they are not getting OOMs all over the place.
          Hide
          David Reiss added a comment -

          Sorry, I wrote my response before I read yours. I guess I meant setting the read length on the binary protocol plus the patch to limit the frame size.

          Show
          David Reiss added a comment - Sorry, I wrote my response before I read yours. I guess I meant setting the read length on the binary protocol plus the patch to limit the frame size.
          Hide
          David Reiss added a comment -

          Bryan, I think that if we apply Esteve's patch and suggest that users set a sane max frame size (for the transport frame) and max message size (in TBinaryProtocol), that will solve the problem. I think we should do this. Do you agree?

          Show
          David Reiss added a comment - Bryan, I think that if we apply Esteve's patch and suggest that users set a sane max frame size (for the transport frame) and max message size (in TBinaryProtocol), that will solve the problem. I think we should do this. Do you agree?
          Hide
          Bryan Duxbury added a comment -

          Sounds OK to me.

          Show
          Bryan Duxbury added a comment - Sounds OK to me.
          Hide
          David Reiss added a comment -

          Okay. I've tested out this patch and it fixes the large frame problem for the thread pool server. There is still the problem of protocol-level problems (like the one in the bug report). It turns out that TBinaryProtocol.Factory doesn't have a way to call setReadLength on the protocol objects it creates, so I'll have to add that.

          Show
          David Reiss added a comment - Okay. I've tested out this patch and it fixes the large frame problem for the thread pool server. There is still the problem of protocol-level problems (like the one in the bug report). It turns out that TBinaryProtocol.Factory doesn't have a way to call setReadLength on the protocol objects it creates, so I'll have to add that.
          Hide
          David Reiss added a comment -

          This implements the change I mentioned. With this, the HsHa server and threadpool server appear to be noise-proof, including the case of a valid frame followed by noise.

          Show
          David Reiss added a comment - This implements the change I mentioned. With this, the HsHa server and threadpool server appear to be noise-proof, including the case of a valid frame followed by noise.
          Hide
          Bryan Duxbury added a comment -

          Looks good to me.

          Show
          Bryan Duxbury added a comment - Looks good to me.
          Hide
          David Reiss added a comment -

          Sorry, I messed up the commit message for r938206, which is also relevant to this issue.

          Show
          David Reiss added a comment - Sorry, I messed up the commit message for r938206, which is also relevant to this issue.
          Hide
          Ryan King added a comment -

          It appears that this doesn't cover every case. If you give it less garbage than the frame size (or what it things the frame size is) it will hang forever in the selector thread, which will stall the whole server.

          Show
          Ryan King added a comment - It appears that this doesn't cover every case. If you give it less garbage than the frame size (or what it things the frame size is) it will hang forever in the selector thread, which will stall the whole server.
          Hide
          Bryan Duxbury added a comment -

          @Ryan - I'm not sure I follow your comment. If you send a large frame size and then never provide the data, it's true that that socket will "hang", insofar as there won't ever be a timeout or whatnot, but it won't prevent the selector from servicing other sockets. The socket channel read methods never block - they'll only read as much as is available.

          Show
          Bryan Duxbury added a comment - @Ryan - I'm not sure I follow your comment. If you send a large frame size and then never provide the data, it's true that that socket will "hang", insofar as there won't ever be a timeout or whatnot, but it won't prevent the selector from servicing other sockets. The socket channel read methods never block - they'll only read as much as is available.

            People

            • Assignee:
              David Reiss
              Reporter:
              Eric Evans
            • Votes:
              5 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development