HBase / HBASE-19215

Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1, 1.2.6
    • Fix Version/s: 1.4.0, 1.3.2, 2.0.0-beta-1, 2.0.0, 1.2.7
    • Component/s: rpc
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Ran into an OOME on the client: java.lang.OutOfMemoryError: Direct buffer memory.
      When we encounter an unhandled exception during the channel write in RpcClientImpl

      checkIsOpen(); // Now we're checking that it didn't became idle in between.

      try {
        call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, call.param,
            cellBlock));
      } catch (IOException e) {


      we end up leaving the connection open. This becomes especially problematic when an unhandled exception occurs between writing the length of the request on the channel and subsequently writing the params and cellblocks:

      dos.write(Bytes.toBytes(totalSize));
      // This allocates a buffer that is the size of the message internally.
      header.writeDelimitedTo(dos);
      if (param != null) param.writeDelimitedTo(dos);
      if (cellBlock != null) dos.write(cellBlock.array(), 0, cellBlock.remaining());
      dos.flush();
      return totalSize;
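
      To make the failure mode concrete, here is a self-contained sketch (illustrative code, not HBase source) of what the server sees when the client dies between the length prefix and the body, and then retries on the same connection:

      ```java
      import java.io.ByteArrayInputStream;
      import java.io.ByteArrayOutputStream;
      import java.io.DataInputStream;
      import java.io.DataOutputStream;
      import java.io.IOException;
      import java.util.Arrays;

      public class TornWriteDemo {
        public static void main(String[] args) throws IOException {
          ByteArrayOutputStream wire = new ByteArrayOutputStream();
          DataOutputStream dos = new DataOutputStream(wire);

          // Request 1: the length prefix promises 8 bytes, but the client hit an
          // unhandled exception after writing only 2 bytes of the body.
          dos.writeInt(8);
          dos.write(new byte[] { 1, 2 });

          // Request 2: the retry, written to the SAME still-open connection.
          byte[] body2 = new byte[8];
          Arrays.fill(body2, (byte) 0x7F);
          dos.writeInt(8);
          dos.write(body2);

          // Server side: read a length, allocate a buffer of that size, fill it.
          DataInputStream in =
              new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
          int size1 = in.readInt();      // 8, as promised by request 1
          in.readFully(new byte[size1]); // swallows request 2's prefix + 2 body bytes
          // The next "length" is actually request 2's body bytes: 0x7F7F7F7F,
          // about 2 GB -- a buffer that can never be filled, so calls time out.
          int size2 = in.readInt();
          System.out.println(size1 + " " + size2); // prints "8 2139062143"
        }
      }
      ```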
      

      After reading the length, the region server allocates a ByteBuffer and expects it to be filled with data. However, when we encounter an exception during the param write, we release the write lock in RpcClientImpl but do not close the connection; the exception is handled in AbstractRpcClient.callBlockingMethod and the call is retried.

      The next client request to the same region server then writes to the same channel, but the server interprets those bytes as the remainder of the previous request and errors out during proto conversion, since the request is considered malformed (in the worst case this might even be misinterpreted as valid but wrong data). The remaining bytes of the current request (the current request's size exceeds the previous request's partially filled ByteBuffer) are then read, and the next four bytes are misinterpreted as the size of a new request; in my case this was in GBs. All subsequent client requests time out because that buffer is never completely filled.

      We should close the connection for any Throwable and not just IOException.
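
      The fix can be sketched as follows (illustrative code, not the actual patch; `Conn`, `FailingStream`, and the `closed` flag are assumptions standing in for HBase's connection teardown): catch Throwable, not just IOException, around the request write, and mark the connection closed so the retry goes out on a fresh channel:

      ```java
      import java.io.ByteArrayOutputStream;
      import java.io.DataOutputStream;
      import java.io.IOException;
      import java.io.OutputStream;

      public class CloseOnThrowable {
        // Simulates a stream that fails mid-write with an unchecked error,
        // standing in for the OutOfMemoryError seen during the cellblock write.
        static class FailingStream extends OutputStream {
          final ByteArrayOutputStream seen = new ByteArrayOutputStream();
          int budget;
          FailingStream(int budget) { this.budget = budget; }
          @Override public void write(int b) {
            if (budget-- <= 0) throw new RuntimeException("Direct buffer memory");
            seen.write(b);
          }
        }

        // Hypothetical connection wrapper; `closed` stands in for tearing the
        // socket down so the server discards the partial frame.
        static class Conn {
          final FailingStream out;
          boolean closed;
          Conn(FailingStream out) { this.out = out; }

          void sendRequest(byte[] body) throws IOException {
            DataOutputStream dos = new DataOutputStream(out);
            try {
              dos.writeInt(body.length); // length prefix, like totalSize
              dos.write(body);           // may fail partway through
              dos.flush();
            } catch (Throwable t) {      // Throwable, NOT just IOException
              closed = true;             // never reuse a channel with a torn frame
              throw new IOException("request failed, connection closed", t);
            }
          }
        }

        public static void main(String[] args) {
          // Allow the 4-byte prefix plus 2 body bytes before failing.
          Conn conn = new Conn(new FailingStream(6));
          try {
            conn.sendRequest(new byte[] { 1, 2, 3, 4, 5 });
          } catch (IOException e) {
            // The caller retries, but on a NEW connection.
          }
          System.out.println(conn.closed + " " + conn.out.seen.size()); // true 6
        }
      }
      ```

      The channel carried a prefix promising 5 bytes but only 2 arrived; because the connection is marked closed, the retry cannot be misread as the tail of the torn frame.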

        Attachments

        1. HBASE-19215-branch-1.3.patch
          2 kB
          Andrew Kyle Purtell
        2. HBASE-19215.branch-1.001.patch
          1 kB
          Abhishek Singh Chouhan


          People

          • Assignee: abhishek.chouhan (Abhishek Singh Chouhan)
          • Reporter: abhishek.chouhan (Abhishek Singh Chouhan)
          • Votes: 0
          • Watchers: 13
