Description
An OutOfMemoryError (OOME) occurs in Ozone integration tests. The heap limit is currently -Xmx2g, but increasing it does not help.
[INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
Error: java.lang.OutOfMemoryError: Java heap space
Error: Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
java.lang.OutOfMemoryError: Java heap space
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
    at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:750)
Ozone registers a JMX reporter (this is not new):
MetricRegistries.global()
    .addReporterRegistration(MetricsReporting.jmxReporter(), MetricsReporting.stopJmxReporter());
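The registration above is made on a process-wide singleton, so it outlives every server a test class starts and closes. The following is a hypothetical model of that pattern, not the Ratis API: the class name, method signatures, and callback types are assumptions for illustration. It pairs each add with a matching remove, which is the capability the RATIS-1741 title ("Add a removeReporterRegistration method") describes.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical model of a process-wide reporter registry; NOT the Ratis MetricRegistries API.
public final class ReporterRegistry {
  // Process-wide singleton: its registrations outlive every server that is started and closed.
  private static final ReporterRegistry GLOBAL = new ReporterRegistry();

  public static ReporterRegistry global() {
    return GLOBAL;
  }

  // One (onCreate, onRemove) callback pair, e.g. "start a JMX reporter" / "stop it".
  private static final class Registration {
    final Consumer<String> onCreate;
    final Consumer<String> onRemove;

    Registration(Consumer<String> onCreate, Consumer<String> onRemove) {
      this.onCreate = Objects.requireNonNull(onCreate);
      this.onRemove = Objects.requireNonNull(onRemove);
    }
  }

  private final List<Registration> registrations = new CopyOnWriteArrayList<>();

  public void addReporterRegistration(Consumer<String> onCreate, Consumer<String> onRemove) {
    registrations.add(new Registration(onCreate, onRemove));
  }

  // Symmetric removal, matched by callback identity, so a caller can undo its own registration.
  public boolean removeReporterRegistration(Consumer<String> onCreate, Consumer<String> onRemove) {
    for (Registration r : registrations) { // iterates a snapshot, so removing here is safe
      if (r.onCreate == onCreate && r.onRemove == onRemove) {
        return registrations.remove(r);
      }
    }
    return false;
  }

  // Called when a metric registry is created: every registered reporter starts on it.
  public void onRegistryCreated(String registryName) {
    for (Registration r : registrations) {
      r.onCreate.accept(registryName);
    }
  }

  // Called when a metric registry is removed: every registered reporter stops on it.
  public void onRegistryRemoved(String registryName) {
    for (Registration r : registrations) {
      r.onRemove.accept(registryName);
    }
  }

  public int registrationCount() {
    return registrations.size();
  }
}
```

In this model, a test suite that adds a registration in setup would remove the same callback pair in teardown, so the singleton does not accumulate one registration per test class.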
Based on the heap dump and the test log, SegmentedRaftLogWorker instances are still retained by JmxMBeanServer after close(), so they cannot be garbage-collected.
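This kind of retention can be reproduced with nothing but the JDK's JMX API. In the hypothetical sketch below (the class names and the 1 MiB buffer are illustrative, not taken from Ratis or Ozone), registering an object as an MBean gives the platform MBeanServer a strong reference to it. If close() does not call unregisterMBean, the object and its buffer stay reachable after close, analogous to what the heap dump shows.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxRetentionDemo {
  // Standard MBean interface: JMX requires it to be public and named "<ImplClass>MBean".
  public interface WorkerMBean {
    int getBufferBytes();
  }

  // Hypothetical worker that owns a large buffer; loosely modeled on the issue, not Ratis code.
  public static class Worker implements WorkerMBean {
    private final byte[] buffer = new byte[1 << 20]; // 1 MiB, kept small for the demo
    private final MBeanServer server;
    private final ObjectName name;

    private Worker(MBeanServer server, ObjectName name) {
      this.server = server;
      this.name = name;
    }

    // Registering hands the MBeanServer a strong reference to this worker (and its buffer).
    static Worker start(MBeanServer server, String id) throws JMException {
      Worker w = new Worker(server, new ObjectName("demo:type=Worker,id=" + id));
      server.registerMBean(w, w.name);
      return w;
    }

    @Override
    public int getBufferBytes() {
      return buffer.length;
    }

    // With unregister == false this mimics the leak: the worker is "closed",
    // but the MBeanServer still references it, so the GC cannot reclaim the buffer.
    void close(boolean unregister) throws JMException {
      if (unregister && server.isRegistered(name)) {
        server.unregisterMBean(name);
      }
    }
  }

  // Returns whether the worker is still registered (hence still reachable) after close().
  static boolean registeredAfterClose(boolean unregisterOnClose) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    try {
      Worker w = Worker.start(server, unregisterOnClose ? "clean" : "leaky");
      w.close(unregisterOnClose);
      boolean registered = server.isRegistered(w.name);
      if (registered) {
        server.unregisterMBean(w.name); // demo cleanup so the method can run again
      }
      return registered;
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("without unregister: still registered = " + registeredAfterClose(false));
    System.out.println("with unregister: still registered = " + registeredAfterClose(true));
  }
}
```

Running main prints true for the leaky path and false when close() unregisters the MBean.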
The leak is probably not new, but its effect is much worse now because SegmentedRaftLogWorker recently gained a shared buffer (RATIS-1717), which every retained worker keeps alive.
Relevant setting (custom value): raft.server.log.appender.buffer.byte-limit = 33554432 (32 MiB)
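A rough estimate suggests why raising the heap limit does not help much. Assuming (this is not confirmed by the heap dump details above) that each retained SegmentedRaftLogWorker keeps exactly one heap buffer of the configured byte limit, and ignoring all other heap usage, the number of leaked workers that fill a 2 GiB heap is:

```latex
\[
N_{\max} \approx \frac{\text{Xmx}}{\text{byte-limit}}
  = \frac{2\,\text{GiB}}{33554432\,\text{B}}
  = \frac{2^{31}\,\text{B}}{2^{25}\,\text{B}}
  = 2^{6} = 64
\]
```

Under that assumption, doubling -Xmx only doubles the number of leaked workers the heap can absorb, so a test class that keeps starting and closing Raft servers would still run out of memory, just later.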
See the attached screenshot for the GC root path.
Attachments
Issue Links
- relates to
  - RATIS-1717 Perf: Use global serialize buf to avoid temp buf (Resolved)
  - RATIS-1741 Add a removeReporterRegistration method (Resolved)
- links to