[ARTEMIS-2317] Long TTSP caused by Page::read using mmap read - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.0, 2.8.0
Fix Version/s: 2.8.1
Component/s: None
Labels: None

Description

Page::read uses a read-only mmap to read paged messages.
If the mapped regions being accessed are not in the OS page cache, the reads can trigger major page faults. A thread that faults inside a mapped read is still running Java code, so it cannot reach a safepoint until the fault has been served, which can lead to very long time-to-safepoint (TTSP) pauses (visible by enabling -XX:+PrintGCApplicationStoppedTime).
Such pauses can delay GC work significantly, much like long stop-the-world pauses, and can block the broker long enough that connected clients consider it dead or that the broker shuts itself down.

mmap reads were originally introduced so that Page::read would not have to allocate large direct ByteBuffers just to read a paged-message file in full. Reading those files in chunks while reusing a single read ByteBuffer would keep the allocation bounded and the number of read syscalls low, while also avoiding the long time-to-safepoint pauses.
An OS-level stall inside a JNI call (e.g. NIO FileChannel::read) does not delay safepoints: a thread executing native code is already treated as being at a safepoint, so the JVM does not have to wait for it.
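
The chunked approach described above could look roughly like the sketch below. It is an illustration only, not the code from the linked pull requests: the class name ChunkedPageReader, the ChunkConsumer callback, and the 64 KiB chunk size are all assumptions made for the example. The only real APIs used are the standard java.nio ones (FileChannel.open, FileChannel::read, ByteBuffer).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the Artemis code base.
final class ChunkedPageReader {

    // Assumed chunk size for illustration; the real value would need tuning.
    static final int CHUNK_SIZE = 64 * 1024;

    // Hypothetical callback that receives each chunk, positioned for reading.
    interface ChunkConsumer {
        void accept(ByteBuffer chunk) throws IOException;
    }

    /**
     * Reads the whole file through FileChannel::read, reusing one buffer
     * for every chunk instead of mapping the file or allocating a buffer
     * as large as the file. Returns the total number of bytes read.
     */
    static long readInChunks(Path file, ByteBuffer reusable, ChunkConsumer consumer)
            throws IOException {
        if (reusable.capacity() == 0) {
            throw new IllegalArgumentException("buffer must have a non-zero capacity");
        }
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            reusable.clear();
            // read() returns -1 at end of file; any blocking on disk I/O
            // happens inside the native call, not in Java code.
            while (channel.read(reusable) != -1) {
                reusable.flip();               // switch the buffer to draining mode
                total += reusable.remaining();
                consumer.accept(reusable);     // caller decodes messages from the chunk
                reusable.clear();              // reuse the same buffer for the next chunk
            }
        }
        return total;
    }
}
```

A caller would allocate the buffer once, for example with ByteBuffer.allocateDirect(ChunkedPageReader.CHUNK_SIZE), and pass it to every readInChunks call. A real implementation would also need to handle messages that straddle a chunk boundary, which this sketch leaves to the consumer.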

Issue Links

links to

GitHub Pull Request #2633

GitHub Pull Request #2646

People

Assignee: Unassigned

Reporter: Francesco Nigro

Votes: 0

Watchers: 2

Dates

Created: 23/Apr/19 14:39

Updated: 16/May/19 15:12

Resolved: 16/May/19 15:12

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

3h 10m