Description
This ticket tracks the correct long-term fix for TS-4207.
The text below is pulled from a comment that summarizes the issue:
Leif Hedstrom, I have spent a decent amount of time on this while I was OOO on vacation the last couple of weeks. It seems that the root cause of this issue has always existed, and that by always storing the hostname (https://github.com/apache/trafficserver/commit/0e703e1e) we are just causing the issue to happen all the time.
To understand the issue I'll give a little background on how hostdb currently works. Basically hostdb is just a wrapper around a templated struct called MultiCache. MultiCache is "multi" not because it is templated, but because it has two types of storage (static: blocks, and dynamic: alloc). The static side of the cache can hold N HostDBInfo structs (the results of DNS queries). The dynamic side is used to store the round robin records and various strings associated with the record. The size of this dynamic space is defined as (N x estimated_heap_bytes_per_entry). The basic problem we are running into is that we are putting too much pressure on the dynamic heap, so the heap is getting reused while people still have references to items in that space.
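To make the failure mode concrete, here is a toy model of a fixed-size dynamic heap that wraps around when full. It is only a sketch: ToyHeap, store() and at() are hypothetical names, and the real MultiCache layout and allocation policy are more involved. It only illustrates how an offset handed out earlier can end up pointing at someone else's data once the heap is reused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Toy model only: names and layout are hypothetical, not the real MultiCache.
struct ToyHeap {
  std::vector<char> buf; // dynamic side: entries * estimated bytes per entry
  std::size_t next = 0;  // next free offset

  ToyHeap(std::size_t entries, std::size_t est_bytes_per_entry)
    : buf(entries * est_bytes_per_entry, '\0') {}

  // Copies s (including the null terminator) into the heap and returns its
  // offset. When the string does not fit in the remaining space, allocation
  // wraps to the start and silently overwrites older data.
  std::size_t store(const std::string &s) {
    const std::size_t need = s.size() + 1;
    assert(need <= buf.size()); // a single string must fit in the heap
    if (next + need > buf.size()) {
      next = 0; // heap is full: reuse space that may still be referenced
    }
    const std::size_t off = next;
    std::memcpy(buf.data() + off, s.c_str(), need);
    next += need;
    return off;
  }

  const char *at(std::size_t off) const { return buf.data() + off; }
};
```

With 2 entries and an estimate of 8 bytes each, the heap is 16 bytes. Storing two 10-byte hostnames back to back forces the second one to wrap, so the offset returned for the first hostname now reads back the second one.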
So, I've actually been working on rewriting MultiCache to allocate the entire required block at once (so we don't have this problem where the parent exists but not the children), but I'm not certain we want such a change to go into the 6.x branch (I'm willing to discuss it if we do). If we aren't comfortable with such a large change, I suggest just accounting for the hostname size in estimated_heap_bytes_per_entry as a stopgap solution. The maximum allowable hostname length is 253 (so 254 with the null terminator), but we could pick a smaller number (~120 or so seems more reasonable). Alternatively, you can increase the number of records in hostdb (and the size accordingly) to increase the dynamic heap size.
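The stopgap arithmetic can be sketched as follows. The function name, the constants' names and the sample per-entry estimate of 100 bytes are assumptions for illustration only; they are not taken from the Traffic Server source.

```cpp
#include <cstddef>

// Hypothetical names; the hostname limits are the ones quoted above.
constexpr std::size_t kMaxHostnameLen     = 253;                 // maximum hostname length
constexpr std::size_t kMaxHostnameStorage = kMaxHostnameLen + 1; // plus null terminator

// Dynamic heap size when each entry's estimate also budgets for a hostname.
constexpr std::size_t dynamic_heap_bytes(std::size_t entries,
                                         std::size_t est_bytes_per_entry,
                                         std::size_t hostname_allowance) {
  return entries * (est_bytes_per_entry + hostname_allowance);
}

// Example with an assumed estimate of 100 bytes per entry and 1000 entries:
// no allowance gives 100000 bytes, a 120-byte allowance gives 220000 bytes.
static_assert(dynamic_heap_bytes(1000, 100, 0) == 100000, "no hostname allowance");
static_assert(dynamic_heap_bytes(1000, 100, 120) == 220000, "120-byte allowance");
```

Raising the entry count instead scales the same product from the other side: doubling the entries doubles the heap without changing the per-entry estimate.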
TL;DR: almost done with the long-term solution, but I'm not sure we want to merge that into 6.x. Alternatively, we can do a simple workaround in 6.x (https://github.com/apache/trafficserver/pull/553)
Issue Links
- contains
  - TS-2403 Segfault when HostDB full (Closed)
- is related to
  - TS-4645 traffic_top doesn't start (Closed)
- is required by
  - TS-4207 Crash in HostDB, likely a regression from 5.x (Closed)
- relates to
  - TS-2403 Segfault when HostDB full (Closed)
  - TS-4232 Crash in HostDB,During debug message generating (Closed)
  - TS-4276 Segmentation fault when hostdb runs out of space (Closed)
  - TS-4278 HostDB sync causes active transactions to block for 100's of ms (Closed)
  - TS-4602 Cleanup HttpSM's references to HostDBInfo (Open)
  - TS-3166 HostDB Upgrade (Closed)
- links to