Ratis / RATIS-1743

Memory leak in SegmentedRaftLogWorker due to metrics

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.4.1
    • Fix Version/s: 3.0.0, 2.4.1
    • Component/s: server
    • Labels: None

    Description

      An OutOfMemoryError (OOME) occurs in Ozone integration tests. The heap is currently limited to 2 GB (Xmx=2g), but increasing it does not help.

      [INFO] Running org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
      Error:  java.lang.OutOfMemoryError: Java heap space
      Error:  Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 426.774 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestStorageContainerManagerHA
      
      java.lang.OutOfMemoryError: Java heap space
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.lambda$new$4(SegmentedRaftLogWorker.java:223)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$603/1771708635.get(Unknown Source)
      	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.write(SegmentedRaftLogOutputStream.java:101)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$WriteLog.execute(SegmentedRaftLogWorker.java:568)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:320)
      	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$$Lambda$595/1626598428.run(Unknown Source)
      	at java.lang.Thread.run(Thread.java:750)
      

      Ozone registers a JMX reporter (this is not new).

      Based on the heap dump and test log, SegmentedRaftLogWorker instances are retained by JmxMBeanServer after close().

      The problem is probably not new, but its effect is much worse now, because SegmentedRaftLogWorker recently got a shared buffer (RATIS-1717).
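
      The retention pattern described above can be illustrated with a minimal sketch that uses only JDK JMX classes. It is not the Ratis code: JmxLeakSketch, Worker, WorkerMBean and the unregisterOnClose flag are hypothetical names, and the assumption is simply that an object registered with an MBeanServer stays strongly referenced (buffer included) until it is unregistered.

```java
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class JmxLeakSketch {
  /** Standard MBean interface; JMX expects the name <ClassName>MBean. */
  public interface WorkerMBean {
    int getBufferSize();
  }

  /** Hypothetical stand-in for a log worker that owns a large buffer. */
  public static class Worker implements WorkerMBean, AutoCloseable {
    private final byte[] buffer;
    private final MBeanServer server;
    private final ObjectName name;
    private final boolean unregisterOnClose;

    Worker(MBeanServer server, String id, int bufferSize, boolean unregisterOnClose)
        throws Exception {
      this.buffer = new byte[bufferSize];
      this.server = server;
      this.name = new ObjectName("sketch:type=Worker,id=" + id);
      this.unregisterOnClose = unregisterOnClose;
      // From here on the MBeanServer holds a strong reference to this worker.
      server.registerMBean(this, name);
    }

    @Override
    public int getBufferSize() {
      return buffer.length;
    }

    ObjectName objectName() {
      return name;
    }

    @Override
    public void close() throws Exception {
      // Without this step the server keeps the worker (and its buffer) reachable.
      if (unregisterOnClose) {
        server.unregisterMBean(name);
      }
    }
  }

  /** Returns true if the worker is still registered (hence retained) after close(). */
  static boolean stillRegisteredAfterClose(boolean unregisterOnClose) throws Exception {
    MBeanServer server = MBeanServerFactory.newMBeanServer();
    Worker worker = new Worker(server, "w1", 1024, unregisterOnClose);
    worker.close();
    return server.isRegistered(worker.objectName());
  }

  public static void main(String[] args) throws Exception {
    System.out.println("retained without unregister: " + stillRegisteredAfterClose(false));
    System.out.println("retained with unregister:    " + stillRegisteredAfterClose(true));
  }
}
```

      In this sketch the worker that skips unregistration remains registered after close(), which is the kind of reference that would show up as a GC root path through the MBean server in a heap dump.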

      config in Ozone
      raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
      

      See screenshot for GC root.

      CC Tsz-wo Sze, Song Ziyang

          People

            Assignee: Tsz-wo Sze (szetszwo)
            Reporter: Attila Doroszlai (adoroszlai)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 4.5h