[PIVOT-778] Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0, 2.0.1
Fix Version/s: 2.0.1
Component/s: wtk
Labels:
- DisplayHost
- caching
- gc
- paint
- performance
- repaint

Description

We are writing a kind of game that continually calls Component.repaint at 60 FPS. We noticed excessive CPU usage, even though the amount of painting done by our component (in an overridden Panel.paint) is very small. The profiler pointed us to the paintVolatileBuffered method in DisplayHost. What that method does is:

1. It obtains a fresh BufferedImage whose size equals the current clip region. For a full-screen game this can be about 1280x1024, i.e. 1.3 Mpix x 4 bytes/pixel = 5.2 MB of raw data, allocated from a memory region that is probably cold (not in the L2 cache).
2. It then calls the actual paint on that buffered image (touching at least 5.2 MB again).
3. It then copies the image to the onscreen buffer (copying the 5.2 MB once more).
4. If the GC kicks in after steps 1 and 3, it has to move the BufferedImage in memory to compact the young generation (touching the 5.2 MB a fourth time).

The whole process means allocating 5.2 MB from cold memory for every frame and touching about 20 MB per frame.
At 60 FPS this adds up to an allocation rate of ~300 MB/s and a memory throughput of ~1.2 GB/s. It also makes the GC go crazy.
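
The figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and it assumes 4 bytes per pixel and four passes over the buffer per frame, as in the list above.

```java
import java.util.Locale;

public class RepaintCost {
    /** Size in bytes of one offscreen buffer. */
    static long bufferBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Bytes per second when the buffer is walked `passes` times per frame. */
    static long bytesPerSecond(long bufferBytes, int passes, int fps) {
        return bufferBytes * passes * fps;
    }

    public static void main(String[] args) {
        long buffer = bufferBytes(1280, 1024, 4);        // 5,242,880 bytes
        long allocRate = bytesPerSecond(buffer, 1, 60);  // one allocation per frame
        long touchRate = bytesPerSecond(buffer, 4, 60);  // allocate, paint, copy, GC move

        System.out.printf(Locale.ROOT, "buffer:     %.1f MB%n", buffer / 1e6);
        System.out.printf(Locale.ROOT, "allocation: %.1f MB/s%n", allocRate / 1e6);
        System.out.printf(Locale.ROOT, "throughput: %.2f GB/s%n", touchRate / 1e9);
    }
}
```

The exact values come out slightly above the rounded numbers quoted in the description, since 1280 x 1024 is a little more than 1.3 Mpix.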

We have found that caching the buffer between the subsequent paint calls improves performance a lot:

<code>
/** Stores the prepared offscreen buffer */
private BufferedImage bufferedImage;

/**
 * Attempts to paint the display using an offscreen buffer.
 *
 * @param graphics
 * The source graphics context.
 *
 * @return
 * <tt>true</tt> if the display was painted using the offscreen
 * buffer; <tt>false</tt>, otherwise.
 */
private boolean paintBuffered(Graphics2D graphics) {
    boolean painted = false;

    // Paint the display into an offscreen buffer
    GraphicsConfiguration gc = graphics.getDeviceConfiguration();
    java.awt.Rectangle clipBounds = graphics.getClipBounds();

    // Reuse the cached buffer unless it is missing or too small for the clip
    if (bufferedImage == null ||
        bufferedImage.getWidth() < clipBounds.width ||
        bufferedImage.getHeight() < clipBounds.height) {
        bufferedImage = gc.createCompatibleImage(clipBounds.width, clipBounds.height,
            Transparency.OPAQUE);
    }

    if (bufferedImage != null) {
        Graphics2D bufferedImageGraphics = (Graphics2D)bufferedImage.getGraphics();
        bufferedImageGraphics.setClip(0, 0, clipBounds.width,
...
</code>
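
Since the snippet above is cut off, here is a self-contained sketch of the same caching idea outside of Pivot. It is not the DisplayHost code: the class `CachedBuffer`, its methods and the `painter` callback are hypothetical names, and it uses a plain `BufferedImage` instead of `GraphicsConfiguration.createCompatibleImage` so that it runs without a display. Because the cached image may be larger than the current clip, only the clip-sized region is copied to the target.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.function.Consumer;

public class CachedBuffer {
    private BufferedImage image;   // kept between paint calls
    private int allocations;       // counts how often a new image was needed

    /** Returns the cached image, reallocating only if it is missing or too small. */
    BufferedImage acquire(int width, int height) {
        if (image == null || image.getWidth() < width || image.getHeight() < height) {
            int w = (image == null) ? width : Math.max(image.getWidth(), width);
            int h = (image == null) ? height : Math.max(image.getHeight(), height);
            image = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            allocations++;
        }
        return image;
    }

    int allocations() {
        return allocations;
    }

    /**
     * Paints the clip region into the cached buffer, then copies that region
     * to the target. The painter is assumed to fill the whole clip, because
     * the buffer still holds pixels from the previous frame.
     */
    void paint(Graphics2D target, Rectangle clip, Consumer<Graphics2D> painter) {
        BufferedImage buffer = acquire(clip.width, clip.height);
        Graphics2D g = buffer.createGraphics();
        try {
            g.setClip(0, 0, clip.width, clip.height);
            g.translate(-clip.x, -clip.y);   // painter works in target coordinates
            painter.accept(g);
        } finally {
            g.dispose();
        }
        // Copy only the clip-sized part of the (possibly larger) buffer
        target.drawImage(buffer,
            clip.x, clip.y, clip.x + clip.width, clip.y + clip.height,
            0, 0, clip.width, clip.height, null);
    }

    public static void main(String[] args) {
        CachedBuffer cache = new CachedBuffer();
        BufferedImage screen = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);
        Graphics2D target = screen.createGraphics();
        Rectangle clip = new Rectangle(10, 20, 100, 80);
        for (int frame = 0; frame < 60; frame++) {
            cache.paint(target, clip, g -> {
                g.setColor(Color.RED);
                g.fillRect(0, 0, 200, 200);
            });
        }
        target.dispose();
        System.out.println("allocations: " + cache.allocations());
    }
}
```

In this sketch the buffer only ever grows, which matches the size check in the snippet from the issue; shrinking it again is a separate policy decision.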

Advantages:
1. It avoids the costly allocation of a large object from a memory region that is possibly not cached.
2. After a few repaints the GC moves this object to the tenured generation, so the young generation collector becomes much more efficient (longer intervals between runs).
3. The image probably stays in the L2 or L3 cache most of the time, which saves memory bandwidth and speeds up painting.

Disadvantages:
1. It holds on to memory that is probably not needed all the time, i.e. when the app doesn't have to repaint anything large. However, this cost is almost completely outweighed by the excessive GC overhead caused by continuously recreating the offscreen buffered image.
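
One possible way to soften this disadvantage, which is not part of the proposed change, is to hold the cached image through a `SoftReference` so that the JVM may reclaim it when memory gets tight. The sketch below is only an illustration under that assumption; `SoftBufferCache` is a hypothetical name, and how eagerly soft references are cleared depends on the JVM and its settings.

```java
import java.awt.image.BufferedImage;
import java.lang.ref.SoftReference;

public class SoftBufferCache {
    // The GC is allowed to clear this reference under memory pressure
    private SoftReference<BufferedImage> ref = new SoftReference<>(null);

    /** Returns the cached image, or a new one if it was reclaimed or is too small. */
    BufferedImage acquire(int width, int height) {
        BufferedImage image = ref.get();
        if (image == null || image.getWidth() < width || image.getHeight() < height) {
            image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ref = new SoftReference<>(image);
        }
        return image;
    }

    /** Drops the buffer explicitly, e.g. when the app knows it has gone idle. */
    void release() {
        ref.clear();
    }

    public static void main(String[] args) {
        SoftBufferCache cache = new SoftBufferCache();
        BufferedImage first = cache.acquire(1280, 1024);
        BufferedImage second = cache.acquire(1280, 1024);
        System.out.println("reused: " + (first == second));
    }
}
```

The trade-off is that a reclaimed buffer has to be allocated again on the next repaint, so this only helps when large repaints are occasional rather than continuous.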

Anyway, we observed about a 2-4x performance increase from this simple change. Running at 60 FPS now uses only about 25% of the CPU for painting, and the rest is available for the application logic (AI, etc.). Previously, 60 FPS was probably the most we could get out of a 2.2 GHz Core2Duo. Of course, this change won't affect "business applications" that don't do animations etc.

People

Assignee: Noel Grandin

Reporter: Piotr Kolaczkowski

Votes: 0

Watchers: 0

Dates

Created: 27/Jul/11 10:25

Updated: 19/Nov/11 08:36

Resolved: 19/Nov/11 08:36