Uploaded image for project: 'Thrift'
  1. Thrift
  2. THRIFT-3805

Golang server susceptible to memory spike from malformed message

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.9.3
    • 0.10.0
    • Go - Library
    • None
    • OSX 10.10.5, Debian 7.10

    Description

      When a thrift server is started through the Go library, using the default TServerSocket, the processor from the generated files, and a SimpleServer2 (which uses BinaryProtocol), it listens on the created socket for incoming messages. However, if the incoming message is not from another Thrift server, and thus the data submitted over the connection does not conform to the Thrift message serialization format (for us, this was triggered by automated security scanners connecting to the exposed port and sending it data to see how it would react), it triggers a massive memory spike.

      This appears to happen because the BinaryProtocol expects the first 4 bytes of the file to indicate the size of the following message, interpreted as a signed Int32 in BigEndian format. If this value is positive, it reads that many bytes from the connection.

      However, to do this, it first allocates a slice of that many bytes. If the provided message is not a thrift serialization, this signed Int32 can be massive (we're typically seeing it have a value of 1-2 billion), causing it to allocate hundreds of megabytes of data. This slice is retained until the expected number of bytes are received (which generally doesn't happen for a malformed connection), or until the connection is closed from the client end. When either happen, the BinaryProtocol then type-casts the entire array to a String type, which causes a second allocation of equal size, before returning the message to be processed.

      Since this processing fails, both the byte slice and the string are discarded and reclaimed on the next garbage collection cycle, but in the mean type, it will have allocated an amount of memory equal to double the value of that Int32, in bytes. We're typically seeing this allocation exceed 800-1000 MB per variable (so 800 MB when the connection is received, and another 800-1000 MB when it's closed). The first allocation will persist as long as the connection is maintained, and other connections that are similarly malformed will cause concurrent allocation spikes, which can easily crash the server.

      To demonstrate this, one can make an extremely simple single-method thrift contract, start a server, then netcat to the host port and feed it at least 4 characters of data.

      To fix this, we implemented an iterative buffered read into the BinaryProtocol itself. Instead of allocating the entire array at once, if the message indicates it is larger than a certain size (we have it set at 32kb, but the actual value is largely arbitrary), it instead reads the message in increments of that size, and joins them using the standard library bytes.Buffer type. Here's a diff of our changes:

      https://gist.github.com/kaedys/53b8fd05690d5d8f202afa7878d6e3d5

      This fixes the issue of having a massive initial allocation due to a malformed request, while still allowing messages of arbitrary incoming size. It will likely marginally slow down the reading of extremely large messages (ones substantially larger than the constant defined), but the bytes.Buffer type is well optimized for growing itself as it is written to. I also realize that this only occurs with malformed requests to the server, but I would think a server package should be relatively hardened against malformed data being submitted to it.

      Attachments

        1. thrift_golang_memory_spike_fix.patch
          2 kB
          Michael Scott Leuthaeuser

        Activity

          People

            kaedys Michael Scott Leuthaeuser
            kaedys Michael Scott Leuthaeuser
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: