From 068778ba2ea2b30a80e33d314bdd4da3fb9aed88 Mon Sep 17 00:00:00 2001
From: Biju Nair
Date: Tue, 28 May 2019 21:49:55 -0400
Subject: [PATCH] HBASE-15898 Document G1 GC Recommendations

---
 src/main/asciidoc/_chapters/performance.adoc | 201 +++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git src/main/asciidoc/_chapters/performance.adoc src/main/asciidoc/_chapters/performance.adoc
index 866779ca78..bc7a8c94fc 100644
--- src/main/asciidoc/_chapters/performance.adoc
+++ src/main/asciidoc/_chapters/performance.adoc
@@ -122,6 +122,207 @@ Robert Yokota used an automated testing framework called link:https://aphyr.com/
 [[gc]]
 === The Garbage Collector and Apache HBase
+[[g1gc]]
+==== G1 Garbage Collector Best Practices
+
+If you use Java 1.7 or higher, the G1 garbage collector is available to you. The
+G1 garbage collector is designed to keep pauses short while maintaining high
+throughput on hosts with a large number of processors and a lot of memory.
+
+The G1 design divides the Java heap into equal-sized regions and assigns each
+region a generational role based on the age of the objects it holds. The regions
+for the youngest objects are called _Eden_, followed by _survivor_ and _tenured_.
+The number of regions assigned to the Eden, survivor and tenured generation roles
+can change dynamically. The collector focuses on regions that contain the fewest
+live objects (in other words, the most "garbage", hence the name "garbage-first"),
+and processes those first, reclaiming the most memory for the least copying work.
+
+Objects are compacted during collection by evacuation (copying them to unused
+regions), which reduces fragmentation. Evacuation of multiple regions occurs in
+parallel for maximum throughput, and multiple regions can ideally be compacted
+into a single target region. The CMS (Concurrent Mark Sweep) collector does
+not compact when processing old generation objects, and thus has no strategy to
+address fragmentation, apart from a stop-the-world full GC.
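+
+The "garbage-first" ordering described above can be illustrated with a toy model.
+The following Java sketch is not HotSpot's implementation. The `Region` class, the
+`selectCollectionSet` method and its copy budget (standing in for the pause time
+goal) are all hypothetical. The model assumes that evacuation cost is roughly
+proportional to the live data copied. Under that assumption, the regions with the
+least live data free the most space per unit of work. Regions above a live-data
+threshold are skipped, similar in spirit to `-XX:G1MixedGCLiveThresholdPercent`.
+
+```java
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+
+public class GarbageFirstSketch {
+    // Hypothetical region model: an id and the bytes of live data it holds.
+    static final class Region {
+        final int id;
+        final long liveBytes;
+
+        Region(int id, long liveBytes) {
+            this.id = id;
+            this.liveBytes = liveBytes;
+        }
+    }
+
+    // Returns the ids of the regions to evacuate, least-live first.
+    // Regions whose live data exceeds liveThresholdPercent of the region size
+    // are skipped. Selection stops once copying the next region would exceed
+    // copyBudgetBytes (a stand-in for the pause time goal).
+    static List<Integer> selectCollectionSet(List<Region> regions, long regionBytes,
+                                             int liveThresholdPercent, long copyBudgetBytes) {
+        List<Region> candidates = new ArrayList<>();
+        for (Region r : regions) {
+            if (r.liveBytes * 100 <= regionBytes * liveThresholdPercent) {
+                candidates.add(r);
+            }
+        }
+        // Fewest live bytes = most garbage = cheapest to evacuate.
+        candidates.sort(Comparator.comparingLong(r -> r.liveBytes));
+
+        List<Integer> chosen = new ArrayList<>();
+        long copied = 0;
+        for (Region r : candidates) {
+            if (copied + r.liveBytes > copyBudgetBytes) {
+                break; // remaining candidates hold even more live data
+            }
+            copied += r.liveBytes;
+            chosen.add(r.id);
+        }
+        return chosen;
+    }
+
+    public static void main(String[] args) {
+        long mb = 1024L * 1024L;
+        List<Region> regions = Arrays.asList(
+            new Region(0, 30 * mb), new Region(1, 2 * mb),
+            new Region(2, 10 * mb), new Region(3, 31 * mb));
+        // 32 MB regions, 85% live threshold, 20 MB copy budget: prints [1, 2]
+        System.out.println(selectCollectionSet(regions, 32 * mb, 85, 20 * mb));
+    }
+}
+```
+
+In this example, regions 0 and 3 are mostly live and are never chosen. Region 1
+is evacuated before region 2 because it holds less live data.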
+
+[[g1_internals]]
+===== G1 Collector Internals
+
+This section provides some details to help explain the <>.
+If you already understand these details, you can go straight there.
+
+The G1 collector typically divides the heap into around 2000 regions. All regions
+have the same size, a power of two between 1 MB and 32 MB. References into each
+region are tracked (in the region's _remembered set_), so that regions can be
+collected independently, without scanning the entire heap.
+
+In general, the strategy for GC tuning with any collector is to avoid full
+garbage collections, which are single-threaded (in CMS, and in G1 before JDK 10)
+and suspend all application threads. This is often referred to as a
+"stop-the-world" or STW pause. Full GCs occur when the heap is 95% full or an
+allocation request cannot be fulfilled. A full STW pause is more likely with CMS
+than with G1, because a CMS heap is more likely to become fragmented, leaving no
+contiguous free area large enough for a large allocation.
+
+The earlier collectors also divide the heap into Eden, survivor and old generation
+chunks, but each of these is a contiguous section that is static in size. Before
+Java 8, the JVM also has a section for "permanent objects" (permgen). This depends
+on the JVM version rather than the collector: permgen exists with G1 on Java 7 as
+well, and Java 8 replaced it with Metaspace for all collectors. G1 additionally
+allocates regions for "humongous" objects, defined as objects whose size is at
+least half the region size. These humongous objects are treated like
+old-generation objects and are never defragmented except during a full GC.
+
+Young generation collection is similar in both CMS and G1. Live objects
+from Eden and existing survivor spaces are copied into a new survivor space. If
+they have aged sufficiently, they are copied to the old-generation space instead.
+Eden and the original survivor space are empty when this phase is complete. With
+G1, collection performance is tracked and used to adjust the Eden and survivor
+sizes to meet pause time goals.
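+
+G1 chooses the region size automatically unless `-XX:G1HeapRegionSize` is set.
+The following Java sketch approximates the default heuristic as found in the
+JDK 8 HotSpot sources. Treat it as a sketch, not a specification. It averages
+the initial and maximum heap sizes and divides by a target of 2048 regions. It
+then rounds down to a power of two and clamps the result to the 1 MB to 32 MB
+range. It also checks whether an allocation is humongous, which in HotSpot means
+at least half a region. The class and method names are hypothetical.
+
+```java
+public class G1RegionSizing {
+    static final long MB = 1024L * 1024L;
+    static final long MIN_REGION_BYTES = 1 * MB;
+    static final long MAX_REGION_BYTES = 32 * MB;
+    static final long TARGET_REGION_COUNT = 2048;
+
+    // Approximate default region size when -XX:G1HeapRegionSize is not set.
+    static long regionSize(long initialHeapBytes, long maxHeapBytes) {
+        long averageHeap = (initialHeapBytes + maxHeapBytes) / 2;
+        long size = Math.max(averageHeap / TARGET_REGION_COUNT, MIN_REGION_BYTES);
+        size = Long.highestOneBit(size); // round down to a power of two
+        return Math.min(Math.max(size, MIN_REGION_BYTES), MAX_REGION_BYTES);
+    }
+
+    // An allocation of at least half a region is treated as humongous.
+    static boolean isHumongous(long objectBytes, long regionBytes) {
+        return objectBytes >= regionBytes / 2;
+    }
+
+    public static void main(String[] args) {
+        long gb = 1024 * MB;
+        for (long heap : new long[] {4 * gb, 6 * gb, 128 * gb}) {
+            long region = regionSize(heap, heap);
+            System.out.printf("%d GB heap: %d MB regions, %d regions, humongous >= %d KB%n",
+                heap / gb, region / MB, heap / region, region / 2 / 1024);
+        }
+    }
+}
+```
+
+With this heuristic, the 128 GB heap used in the tuning table below gets 32 MB
+regions (4096 of them). Any single allocation of 16 MB or more then becomes humongous.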
+
+Old generation collection starts when heap occupancy reaches the specified limit
+(`-XX:InitiatingHeapOccupancyPercent`, 45% by default for G1). The old-generation
+initial-mark phase first marks the objects directly reachable from the roots
+(including those reached via the young generation). With G1, this phase is
+piggy-backed onto a young GC pause. Next, the object graph is traversed
+concurrently from these roots, and all reachable objects are marked. In the case
+of G1, this phase can also be interrupted by young GCs. Although concurrent, this
+phase must complete in a timely manner. Otherwise, an allocation request that
+cannot be satisfied results in a full STW GC.
+
+Next, because the application may have updated references during the concurrent
+phase, marking is finished in a short STW "re-mark" phase. G1 uses a more
+efficient algorithm (snapshot-at-the-beginning, or SATB) for this purpose.
+
+Finally, CMS runs a concurrent sweep phase, in which dead objects are collected
+(and coalesced if contiguous). G1 has no sweep phase. Instead, live objects in the
+chosen subset of "least-live" regions are copied during the following young GCs
+(called _mixed_ GCs, because they collect old regions as well). This works as a
+form of "incremental" processing, since multiple distinct evacuations can occur
+before the next marking cycle.
+
+For even more details, see Oracle's
+link:http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html[G1 Garbage Collector tutorial].
+
+[[g1_tuning_recommendations]]
+===== G1 Collector Tuning Recommendations
+
+====== Whether to Move to G1
+
+If your application has been tuned for the CMS collector, full garbage
+collections may occur seldom or never. In such situations, most allocations
+might be occurring in the young generation, and fragmentation may not be a
+problem in the old generation. Thus, pause times may already be consistently
+short. In this case, you won't benefit much from G1.
+
+On the other hand, if you do encounter long GC pauses with CMS, consider moving to
+G1. The focus of G1 is to provide more predictable, uniform and smaller pauses.
+However, the price can be lower throughput and additional collection overhead.
+This additional overhead may be a reasonable trade-off for more consistent
+collection behavior.
+
+Larger (and newer) applications that have not been extensively tuned may not
+have addressed runtime memory fragmentation during development. In such
+applications, the rate at which objects are allocated, or the size of the
+objects, typically varies significantly even at steady state. The CMS collector
+has no mechanism to address fragmentation except the mechanism of last resort,
+an inefficient single-threaded full GC. The G1 collector handles this situation
+much better.
+
+Finally, another case where the G1 collector might help is when the JVM heap is
+both large and well-utilized. Newer hardware often ships with 96 GB of memory or
+more. The CMS overhead to manage objects in such heaps increases
+disproportionately, whereas G1 can focus on well-defined subsets of the heap at
+a time.
+
+It is also important to use a current version of the JDK. Later JDK 7 updates
+significantly reduce G1 collection pause times.
+
+====== Practical Tuning Advice
+
+.G1 Garbage Collector Flags
+
+[options="header", cols="m,d"]
+|===
+| Flag | Description
+| -XX:+UseG1GC | enable the G1 collector
+
+| -Xms128g -Xmx128g
+| specify the desired amount of heap memory to be managed. This
+  example uses 128 GB.
+
+| -XX:+UnlockExperimentalVMOptions
+| required to set the experimental options in this table
+  (`G1MixedGCLiveThresholdPercent`, `G1NewSizePercent` and `G1MaxNewSizePercent`)
+
+| -XX:MaxGCPauseMillis=100
+
+| the default target GC pause is 200 ms, which may be too high
+  for applications like HBase
+
+| -XX:-ResizePLAB
+| use statically-sized thread-local promotion allocation
+  buffers, which are used when evacuating young-gen objects.
+  Resizing incurs a communication cost among GC threads and also
+  results in less consistent GC performance.
+
+| -XX:+ParallelRefProcEnabled
+| enables multi-threaded reference processing during collection
+  for better performance, especially during the re-mark phase
+
+| -XX:+AlwaysPreTouch
+| touch (and thereby zero) every heap page during JVM initialization
+  rather than on first use during application execution; this lengthens
+  startup but helps avoid large GC spikes during allocations just after
+  application startup
+
+| -XX:ParallelGCThreads=(8 + (num. of logical processors - 8) * 5/8)
+| recommended setting by Oracle for hosts with more than 8 logical
+  processors; with 8 or fewer, use one thread per logical processor
+
+| -XX:ConcGCThreads=(ParallelGCThreads / 3)
+| increase from the default of about 1/4 of ParallelGCThreads, for faster
+  concurrent marking, to ensure that this phase completes in a timely manner
+
+| -XX:G1HeapWastePercent=3
+| lower from 10; this ensures mixed GCs are not aborted because
+  the collector decides there is insufficient garbage to proceed
+
+| -XX:InitiatingHeapOccupancyPercent=35 (for large heaps > 100 GB)
+| decrease from 45 to start concurrent marking earlier for larger
+  heap sizes
+
+| -XX:G1MixedGCLiveThresholdPercent=85
+| raise from 65 (the default) so that old regions with at least 15%
+  garbage are included in mixed collections; with the default, fewer
+  regions are included, which can degrade performance
+
+| -XX:G1NewSizePercent=3
+| reduce the minimum young generation size from 5% of the heap for
+  smaller young GC pause times; use 2 for 64 GB heaps, 1 for heaps >= 100 GB
+
+| -XX:G1MaxNewSizePercent=10
+| lower the maximum young generation size from 60% of the heap for
+  smaller young GC pause times
+|===
+
+For best performance, invoke an explicit garbage collection when the
+application is running in a non-peak mode, such as at night, when demand may
+be low. It also helps to monitor the GC logs and system utilization, and tune
+the settings iteratively. See <>.
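+
+The two thread-count formulas in the table can be computed for a given host. The
+following Java sketch is a hypothetical helper, not part of HBase or the JDK. It
+assumes three things:
+
+* Hosts with 8 or fewer logical processors use one parallel GC thread per
+  processor, matching the JVM's own default.
+* `ConcGCThreads` is rounded down but kept at 1 or more.
+* `Runtime.getRuntime().availableProcessors()` reports the logical processors
+  available to the JVM.
+
+```java
+public class G1ThreadFlags {
+    // ParallelGCThreads: the table's formula for hosts with more than 8 logical
+    // processors; one thread per processor otherwise (assumption, matching the
+    // JVM default).
+    static int parallelGcThreads(int logicalProcessors) {
+        if (logicalProcessors <= 8) {
+            return logicalProcessors;
+        }
+        return 8 + (logicalProcessors - 8) * 5 / 8; // integer arithmetic rounds down
+    }
+
+    // ConcGCThreads: ParallelGCThreads / 3, rounded down, but at least 1.
+    static int concGcThreads(int parallelGcThreads) {
+        return Math.max(1, parallelGcThreads / 3);
+    }
+
+    public static void main(String[] args) {
+        // Use the processor count given on the command line, else this host's.
+        int cpus = args.length > 0 ? Integer.parseInt(args[0])
+                                   : Runtime.getRuntime().availableProcessors();
+        int parallel = parallelGcThreads(cpus);
+        System.out.println("-XX:ParallelGCThreads=" + parallel);
+        System.out.println("-XX:ConcGCThreads=" + concGcThreads(parallel));
+    }
+}
+```
+
+For example, running it with the argument `32` prints `-XX:ParallelGCThreads=23`
+and `-XX:ConcGCThreads=7`. You can add these to the JVM options alongside the
+other flags in the table.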
+
+[[g1_logging]]
+====== G1 Logging
+
+G1 can log detailed operational statistics, such as when each phase starts, pause
+times, and any problems encountered. The following JVM flags enable this output on
+JDK 7 and 8. JDK 9 and later replace these GC logging flags (all but
+`-XX:+PrintFlagsFinal`) with unified logging (`-Xlog:gc*`). Generating this output
+adds a small overhead, but the data can be invaluable in tuning the collector for
+the specific use case.
+
+----
+    -XX:+PrintFlagsFinal
+    -XX:+PrintGCDateStamps
+    -XX:+PrintGCTimeStamps
+    -XX:+PrintGCDetails
+    -XX:+PrintReferenceGC
+    -XX:+PrintAdaptiveSizePolicy
+----
+
 [[gcpause]]
 ==== Long GC pauses

-- 
2.20.1 (Apple Git-117)