Hadoop YARN / YARN-7386

Duplicate Strings in various places in Yarn memory


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Using jxray (www.jxray.com) I've analyzed a Yarn RM heap dump obtained in a big cluster. The tool uncovered several sources of memory waste. One problem is duplicate strings:

      Total strings   Unique strings   Duplicate values   Overhead
      361,506         86,672           5,928              22,886K (7.6%)
      

      They are spread across a number of locations. The biggest source of waste is the following reference chain:

      
      7,416K (2.5%), 31292 / 62% dup strings (499 unique), 31292 dup backing arrays:
      ↖{j.u.HashMap}.values
      ↖org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.environment
      ↖org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.amContainer
      ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.submissionContext
      ↖{java.util.concurrent.ConcurrentHashMap}.values
      ↖org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext.applications
      ↖org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.activeServiceContext
      ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor.rmContext
      ↖Java Local@3ed9ef820 (org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor)
      

      However, there are also many others, mostly strings in protocol buffer or protocol buffer builder objects. I plan to get rid of at least the worst offenders by inserting String.intern() calls. String.intern() used to consume memory in PermGen and did not scale well until roughly the early JDK 7 releases, but it has improved greatly since then, and I've used it many times without any issues.
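
      The idea of inserting String.intern() calls can be illustrated with a minimal sketch. This is not the actual YARN patch: the class EnvInterner and its method internAll are hypothetical names invented for this example, and a plain HashMap stands in for the environment map held by ContainerLaunchContextPBImpl. It only shows that equal strings held by separate maps collapse to one shared instance once interned.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hadoop.
public final class EnvInterner {
    private EnvInterner() {}

    /** Returns a copy of env in which every non-null key and value is interned. */
    public static Map<String, String> internAll(Map<String, String> env) {
        Map<String, String> result = new HashMap<>(Math.max(16, env.size() * 2));
        for (Map.Entry<String, String> e : env.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            // String.intern() returns the canonical instance for equal content,
            // so duplicates across maps end up sharing one object.
            result.put(key == null ? null : key.intern(),
                       value == null ? null : value.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two applications submit equal environments; new String(...) forces
        // distinct objects, mimicking strings deserialized separately.
        Map<String, String> app1 = new HashMap<>();
        app1.put("JAVA_HOME", new String("/usr/lib/jvm/java-8"));
        Map<String, String> app2 = new HashMap<>();
        app2.put("JAVA_HOME", new String("/usr/lib/jvm/java-8"));

        // Before interning: equal content, different objects.
        System.out.println(app1.get("JAVA_HOME") == app2.get("JAVA_HOME"));

        // After interning: both maps reference the same String instance.
        Map<String, String> i1 = internAll(app1);
        Map<String, String> i2 = internAll(app2);
        System.out.println(i1.get("JAVA_HOME") == i2.get("JAVA_HOME"));
    }
}
```

      Interning at the point where the strings enter a long-lived structure (here, when the map is built) keeps the change local. The trade-off is one lookup in the JVM string table per string stored.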

          People

            Assignee: Misha Dmitriev (misha@cloudera.com)
            Reporter: Misha Dmitriev (misha@cloudera.com)
            Votes: 0
            Watchers: 6
