[AVRO-1013] NettyTransceiver can hang after server restart - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.6.1
Fix Version/s: 1.6.2
Component/s: java
Labels:
None

Description

I ran into a very specific scenario today which can lead to NettyTransceiver hanging indefinitely:

1. Start up a NettyServer
2. Initialize a NettyTransceiver and SpecificRequestor
3. Execute an RPC to establish the connection/handshake with the server
4. Shut down the server
5. Immediately execute another RPC

After Step 4, NettyTransceiver will detect that the connection has been closed and call NettyTransceiver#disconnect(boolean, boolean, Throwable), which sets 'remote' to null, indicating to Requestor that the NettyTransceiver is now disconnected. However, if an RPC is executed just after the server has closed its socket (Step 5) and before disconnect() has been called, NettyTransceiver may still try to send this RPC because 'remote' has not yet been set to null. This race condition is normally harmless because NettyTransceiver#getChannel() will detect that the socket has been closed and then try to reestablish the connection. Unfortunately, in this scenario getChannel() blocks forever when it attempts to acquire the write lock, because the calling thread has acquired the read lock twice rather than once as getChannel() expects. The read lock is acquired once by transceive(List<ByteBuffer>, Callback<List<ByteBuffer>>) and again by writeDataPack(NettyDataPack).
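The locking behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the NettyTransceiver code: it assumes the lock is a java.util.concurrent.locks.ReentrantReadWriteLock and that the reconnect path releases the read lock once before requesting the write lock. The class name ReadLockUpgradeDemo and the method canUpgrade are hypothetical, and a timed tryLock stands in for the blocking lock() call so the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockUpgradeDemo {

    /**
     * Takes the read lock readHolds times on the current thread, releases it
     * once (as a reconnect path expecting a single hold would), then tries to
     * take the write lock. Returns true if the write lock was obtained.
     */
    public static boolean canUpgrade(int readHolds) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        for (int i = 0; i < readHolds; i++) {
            lock.readLock().lock();
        }
        // Release exactly one hold before asking for the write lock.
        lock.readLock().unlock();
        int remaining = readHolds - 1;

        boolean acquired = false;
        try {
            // A plain lock() would block forever while any read hold remains;
            // the timeout is only here to keep the demo from hanging.
            acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            if (acquired) {
                lock.writeLock().unlock();
            }
            for (int i = 0; i < remaining; i++) {
                lock.readLock().unlock();
            }
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("one read hold:  " + canUpgrade(1)); // true
        System.out.println("two read holds: " + canUpgrade(2)); // false
    }
}
```

With a single read hold the write lock is obtained after the release. With two holds one read hold is still outstanding, and because ReentrantReadWriteLock does not allow upgrading from a read lock to the write lock, the request can never succeed.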

The fix is fairly simple. The writeDataPack(NettyDataPack) method (which is private) no longer acquires the read lock; instead, its contract specifies that the read lock must be acquired before the method is called. This change prevents the read lock from being acquired more than once by any single thread. Another change is to have NettyTransceiver#isConnected() perform two checks instead of one: remote != null && isChannelReady(channel). This second change should allow NettyTransceiver to detect disconnect events more quickly.
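The shape of the fix can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: TransceiverSketch, send, writePack, simulateServerShutdown, and the channelOpen flag (a stand-in for isChannelReady(channel)) are all hypothetical names, and the real class deals with Netty channels rather than strings.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TransceiverSketch {
    private final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
    private final AtomicInteger sentCount = new AtomicInteger();

    // Hypothetical stand-ins for the 'remote' field and the channel state.
    private volatile Object remote = new Object();
    private volatile boolean channelOpen = true;

    /** Two checks instead of one: a remote is set AND the channel is usable. */
    public boolean isConnected() {
        return remote != null && channelOpen;
    }

    /**
     * Public entry point: the only place the read lock is taken.
     * Returns this thread's read hold count while sending, for illustration.
     */
    public int send(String pack) {
        stateLock.readLock().lock();
        try {
            writePack(pack);
            return stateLock.getReadHoldCount();
        } finally {
            stateLock.readLock().unlock();
        }
    }

    /**
     * Contract: the caller must already hold the read lock.
     * This method deliberately does not lock again.
     */
    private void writePack(String pack) {
        sentCount.incrementAndGet();
    }

    /** Simulates the server closing its socket before 'remote' is cleared. */
    public void simulateServerShutdown() {
        channelOpen = false;
    }

    public static void main(String[] args) {
        TransceiverSketch t = new TransceiverSketch();
        System.out.println(t.isConnected());  // true
        System.out.println(t.send("ping"));   // 1
        t.simulateServerShutdown();
        System.out.println(t.isConnected());  // false
    }
}
```

Because the private helper relies on its caller for locking, the sending thread never holds more than one read hold, and the second check in isConnected() reports the disconnect even while the stand-in 'remote' field is still non-null.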

Attachments

AVRO-1013.patch
30/Jan/12 00:44
3 kB
James Baldassari

People

Assignee: James Baldassari

Reporter: James Baldassari

Votes: 0

Watchers: 0

Dates

Created: 30/Jan/12 00:02

Updated: 15/Feb/12 00:46

Resolved: 08/Feb/12 03:42