[TS-3714] TS seems to stall between ua_begin and ua_first_read on some transactions resulting in high response times. - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.3.0
Fix Version/s: 6.0.0
Component/s: Core
Labels:
None

Description

An example slow log showing very high ua_first_read.

ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url: http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0 client state: 0 serv
er state: 9 ua_begin: 0.000 ua_first_read: 42.224 ua_read_header_done: 42.224 cache_open_rea
d_begin: 42.224 cache_open_read_end: 42.224 dns_lookup_begin: 42.224 dns_lookup_end: 42.224
server_connect: 42.224 server_first_read: 42.229 server_read_header_done: 42.229 server_clos
e: 42.229 ua_begin_write: 42.229 ua_close: 42.229 sm_finish: 42.229

Initially, I suspected that it might be caused by browser's connecting early before sending any bytes to TS. However, this seems to be happening too frequently and with unrealistically high delay between ua_begin and ua_first_read.

I suspect it's caused due to the code that disables the read temporarily before calling TXN_START hook and re-enables it after the API call out. The read disable is done via a 0-byte do_io_read on the client vc, but, the problem is that a valid mbuf is still passed. Based on what I am seeing, it's possible this results in actually enabling the read_vio all the way uptil ssl_read_from_net for instance (if there's a race condition and there were bytes already from the client resulting in an epoll read event), which would finally disable the read since, the ntodo (nbytes) is 0. However, this may result in the epoll event being lost until a new i/o happens from the client. I'm trying out further experiments to confirm the theory. In most cases, the read buffer already has some bytes by the time the http session and http sm is created, which makes it just work. But, if there's a slight delay in the receiving bytes after making a connection, the epoll mechanism should read it, but, due to the way the temporary read disable is being done, the event may be lost (this is coz, ATS uses the epoll edge triggered mode).

Some history on this issue -

This issue has been a problem for a long time and affects both http and https requests. When this issue was first reported, our router operations team eventually closed it indicating that disabling accept threads resolved it (yzlai also reported similar observations and conclusions). It's possible that the race condition window may be slightly reduced by disabling accept threads, but, to the overall performance and through put, accept threads are very important. I now notice that the issue still exists (regardless of whether or not accept threads are enabled/disabled) and am testing further to confirm the issue.

Attachments

Issue Links

is required by

TS-3794 Enhance post SSL handshake read race condition for SPDY and H2 scenarios.

Closed

relates to

TS-3442 Delay seen in handling the client's request while running with accept thread

Open

Activity

People

Assignee:: Sudheer Vinukonda

Reporter:: Sudheer Vinukonda

Votes:: 0 Vote for this issue

Watchers:: 11 Start watching this issue

Dates

Created:: 23/Jun/15 01:04

Updated:: 16/Apr/16 14:31

Resolved:: 29/Jul/15 00:16