Pivot / PIVOT-778

Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered

    Details

      Description

      We are writing a kind of game that continually calls the Component.repaint method, at 60 FPS. We noticed excessive CPU usage, although the amount of painting actually done by our component (in an overridden Panel.paint) is very small. The profiler pointed us to the paintVolatileBuffered method in DisplayHost. What you are doing there is:

      1. obtain a fresh BufferedImage the size of the current clip region; for a full-screen game this can be about 1280x1024. That is 1.3 Mpix x 4 bytes/pixel = 5.2 MB of raw data, allocated from a memory region that is probably cold (not in the L2 cache)
      2. then you call the actual paint on that buffered image (this touches at least 5.2 MB again)
      3. then you copy it to the onscreen buffer (which means copying the 5.2 MB yet again)
      4. if the GC kicks in between steps 1 and 3, it has to move the BufferedImage in memory to compact the young generation (touching the 5.2 MB a fourth time)

      The whole process means allocating 5.2 MB from cold memory for each frame and touching about 20 MB per frame.
      At 60 FPS this amounts to an allocation rate of ~300 MB/s and a memory throughput of about 1.2 GB/s. It also makes the GC go crazy.
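
      The figures above can be checked with a few lines of arithmetic. This is only a rough sketch: the class and method names are invented for illustration, and the four "passes" over the buffer correspond to the four steps listed above (allocate, paint, copy, possible GC move).

```java
/** Back-of-the-envelope cost of allocating a fresh offscreen buffer for every frame. */
class BufferCostEstimate {
    /** Assumption: 4 bytes per pixel, as in the description above. */
    static final int BYTES_PER_PIXEL = 4;

    /** Raw size of one offscreen buffer in bytes. */
    static long bytesPerBuffer(int width, int height) {
        return (long) width * height * BYTES_PER_PIXEL;
    }

    /** Bytes allocated per second when a new buffer is created for every frame. */
    static long allocationBytesPerSecond(int width, int height, int fps) {
        return bytesPerBuffer(width, height) * fps;
    }

    /** Bytes touched per second when each frame passes over the buffer several times. */
    static long touchedBytesPerSecond(int width, int height, int fps, int passes) {
        return bytesPerBuffer(width, height) * passes * fps;
    }

    public static void main(String[] args) {
        int width = 1280, height = 1024, fps = 60, passes = 4;
        System.out.println("bytes per buffer:     " + bytesPerBuffer(width, height));
        System.out.println("allocated per second: " + allocationBytesPerSecond(width, height, fps));
        System.out.println("touched per second:   " + touchedBytesPerSecond(width, height, fps, passes));
    }
}
```

      The exact byte counts come out slightly above the rounded values quoted in the description, which is consistent with "about 5.2 MB", "~300 MB/s" and "1.2 GB/s".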

      We have found that caching the buffer between the subsequent paint calls improves performance a lot:

      <code>
      /** Stores the prepared offscreen buffer */
      private BufferedImage bufferedImage;

      /**
       * Attempts to paint the display using an offscreen buffer.
       *
       * @param graphics
       * The source graphics context.
       *
       * @return
       * <tt>true</tt> if the display was painted using the offscreen
       * buffer; <tt>false</tt>, otherwise.
       */
      private boolean paintBuffered(Graphics2D graphics) {
          boolean painted = false;

          // Paint the display into an offscreen buffer
          GraphicsConfiguration gc = graphics.getDeviceConfiguration();
          java.awt.Rectangle clipBounds = graphics.getClipBounds();
          if (bufferedImage == null ||
              bufferedImage.getWidth() < clipBounds.width ||
              bufferedImage.getHeight() < clipBounds.height) {
              bufferedImage = gc.createCompatibleImage(clipBounds.width, clipBounds.height,
                  Transparency.OPAQUE);
          }

          if (bufferedImage != null) {
              Graphics2D bufferedImageGraphics = (Graphics2D)bufferedImage.getGraphics();
              bufferedImageGraphics.setClip(0, 0, clipBounds.width,
              ...
      </code>
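
      Since the snippet above is truncated, here is a self-contained sketch of the same reuse-or-grow idea. It is not the Pivot code: CachedOffscreenBuffer is a hypothetical helper, and a plain BufferedImage created with `new` stands in for gc.createCompatibleImage(...) so that the example runs without a display.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/**
 * Hypothetical helper (not part of Pivot): keeps one offscreen buffer and
 * reallocates it only when a larger clip region is requested.
 */
class CachedOffscreenBuffer {
    private BufferedImage buffer;
    private int allocations;

    /** Returns a buffer of at least width x height pixels, reusing the cached one if it fits. */
    BufferedImage acquire(int width, int height) {
        if (buffer == null || buffer.getWidth() < width || buffer.getHeight() < height) {
            if (buffer != null) {
                buffer.flush(); // release resources held by the old, too-small buffer
            }
            // Assumption: a plain RGB image stands in for gc.createCompatibleImage(...)
            buffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            allocations++;
        }
        return buffer;
    }

    /** Number of buffers allocated so far. */
    int allocationCount() {
        return allocations;
    }

    public static void main(String[] args) {
        CachedOffscreenBuffer cache = new CachedOffscreenBuffer();
        for (int frame = 0; frame < 60; frame++) {
            BufferedImage image = cache.acquire(640, 480);
            Graphics2D g = image.createGraphics();
            try {
                // The cached buffer may be larger than the clip, so restrict painting to it
                g.setClip(0, 0, 640, 480);
                g.setColor(Color.BLACK);
                g.fillRect(0, 0, 640, 480);
            } finally {
                g.dispose();
            }
        }
        System.out.println("allocations: " + cache.allocationCount());
    }
}
```

      Because the cached image can be bigger than the requested region, the caller has to set the clip (as the original snippet does) and copy only the clip-sized part to the screen.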

      Advantages:
      1. it avoids the costly allocation of a large object from a memory region that is possibly not cached
      2. after a few repaints the GC promotes this object to the tenured generation, so the young generation collector becomes much more efficient (longer intervals between runs)
      3. the image probably stays in the L2 or L3 cache most of the time, which saves memory bandwidth and speeds up painting

      Disadvantages:
      1. it holds on to some memory that is probably not needed all the time, e.g. when the app doesn't need to repaint anything large; however, this cost is almost completely outweighed by the excessive GC overhead caused by continuously recreating the offscreen buffered image

      Anyway, we observed about a 2-4x performance increase from this simple change: when running at 60 FPS it now uses only about 25% of the CPU for painting, and the rest can be used by the application logic (AI, etc.). Previously, 60 FPS was probably the most we could get out of a Core2Duo 2.2 GHz. Of course, this change won't affect "business applications" that don't do animations etc.

        Activity

        Chris Bartlett added a comment -

        Piotr - This looks interesting, but it would really help if you could supply an example that can be used as a simple benchmark. Something that paints with the current method and then with the optimised method.

        Greg Brown added a comment -

        Way back, I believe we were doing something like this but switched to the current approach to save on memory. It may be reasonable to offer a switch to toggle this behavior.

        But I'm wondering why you need to repaint the entire 1280x1024 frame every time. Is it not possible for you to only update the dirty region of your frame during your animation?
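
        For readers unfamiliar with the dirty-region approach mentioned here, a minimal sketch follows. DirtyRegion is a hypothetical helper, not a Pivot API: it accumulates the union of the changed rectangles so that only their bounding box needs to be passed to a repaint call that accepts a rectangle.

```java
import java.awt.Rectangle;

/** Hypothetical dirty-region tracker (not a Pivot API). */
class DirtyRegion {
    private Rectangle dirty; // null means nothing needs repainting

    /** Records that the given rectangle has changed since the last repaint. */
    void markDirty(int x, int y, int width, int height) {
        Rectangle r = new Rectangle(x, y, width, height);
        dirty = (dirty == null) ? r : dirty.union(r);
    }

    /** Returns the bounding box of all changes and resets the tracker; null if nothing changed. */
    Rectangle takeDirtyBounds() {
        Rectangle result = dirty;
        dirty = null;
        return result;
    }

    public static void main(String[] args) {
        DirtyRegion region = new DirtyRegion();
        region.markDirty(10, 10, 20, 20);
        region.markDirty(50, 40, 10, 10);
        System.out.println("repaint only: " + region.takeDirtyBounds());
    }
}
```

        This only helps when the changes are localized; if the animation covers the whole component, the bounding box is the whole component and nothing is saved.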

        Noel Grandin added a comment -

        We could always use a WeakReference to allow the GC to collect it if necessary.

        But yes, it sounds like Piotr's application should be using partial repaints.
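
        A rough sketch of the WeakReference idea, under the same assumptions as the description's snippet (reuse the buffer if it is large enough, otherwise allocate a new one). WeaklyCachedBuffer is a hypothetical name, and a plain BufferedImage is used so the example runs without a display.

```java
import java.awt.image.BufferedImage;
import java.lang.ref.WeakReference;

/** Hypothetical helper: caches the offscreen buffer without preventing its collection. */
class WeaklyCachedBuffer {
    private WeakReference<BufferedImage> ref = new WeakReference<>(null);

    /** Returns a buffer of at least width x height pixels, reusing the cached one if it survived. */
    BufferedImage acquire(int width, int height) {
        // Taking a strong reference here keeps the image alive while the caller paints
        BufferedImage image = ref.get();
        if (image == null || image.getWidth() < width || image.getHeight() < height) {
            image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ref = new WeakReference<>(image);
        }
        return image;
    }

    public static void main(String[] args) {
        WeaklyCachedBuffer cache = new WeaklyCachedBuffer();
        BufferedImage first = cache.acquire(640, 480);
        BufferedImage second = cache.acquire(320, 240);
        System.out.println("reused: " + (first == second));
    }
}
```

        One caveat: a weakly referenced object can be cleared at any garbage collection once no strong reference remains, so between frames the buffer may be lost often. A SoftReference, which is normally cleared only under memory pressure, is a possible alternative; that is a suggestion of this sketch, not something proposed in the thread.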

        Greg Brown added a comment -

        Actually, since it is a volatile image, a weak reference might not even be necessary. The graphics system will reclaim any memory consumed by a volatile image buffer if it needs to.

        Sandro Martini added a comment -

        So, given the large speedup that Piotr reports, what do you think we should do:
        use a workaround in application code to reduce the repaint area, move this issue to 2.1, and implement a flag for the desired behavior there? Or something else?

        Sandro Martini added a comment -

        Piotr, could you post a minimal sample here (and maybe a minimal benchmark), so we can see what to do (in 2.0.1 if possible)? Otherwise we will have to move it to 2.1 because of timing constraints.

        If possible (and this still has to be verified), I'm thinking of keeping the current behavior as the default, but adding a startup flag to enable your pipeline as an "alternative". What do you (and others) think?
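
        One way such a startup flag could look is a JVM system property that defaults to off. This is purely illustrative: the property name below is invented for the sketch and is not an actual Pivot setting.

```java
/** Hypothetical opt-in switch for the cached offscreen buffer. */
class BufferCachingFlag {
    /** Assumption: property name invented for this sketch; Pivot defines no such property here. */
    static final String PROPERTY = "example.cacheOffscreenBuffer";

    /** True only if the JVM was started with -Dexample.cacheOffscreenBuffer=true. */
    static boolean isCachingEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("cache offscreen buffer: " + isCachingEnabled());
    }
}
```

        With this shape the current behavior stays the default, and an application that animates large areas opts in on the command line.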

        Piotr Kołaczkowski added a comment -

        "But I'm wondering why you need to repaint the entire 1280x1024 frame every time. Is it not possible for you to only update the dirty region of your frame during your animation?"

        I'm not repainting the entire frame, only one of the components, which unfortunately is usually one of the biggest visible on the screen. The animations span the whole component, so it is not possible to make it "repaint less". I think the idea of leaving the current behaviour as the default and adding a flag to enable "caching" of the image would be OK.

        Piotr Kołaczkowski added a comment -

        BTW: I recently checked how Swing does this. And I was right: they cache the VolatileImages, and even the API documentation of VolatileImage shows example code that assumes caching (see the example code here: http://download.oracle.com/javase/6/docs/api/java/awt/image/VolatileImage.html)

        Part of the relevant Swing code:

        <code>
        public Image getVolatileOffscreenBuffer(Component c,
                int proposedWidth, int proposedHeight) {
            RepaintManager delegate = getDelegate(c);
            if (delegate != null) {
                return delegate.getVolatileOffscreenBuffer(c, proposedWidth, proposedHeight);
            }

            // If the window is non-opaque, it's double-buffered at peer's level
            Window w = (c instanceof Window) ? (Window)c : SwingUtilities.getWindowAncestor(c);
            if (!AWTAccessor.getWindowAccessor().isOpaque(w)) {
                Toolkit tk = Toolkit.getDefaultToolkit();
                if ((tk instanceof SunToolkit) && (((SunToolkit)tk).needUpdateWindow())) {
                    return null;
                }
            }

            GraphicsConfiguration config = c.getGraphicsConfiguration();
            if (config == null) {
                config = GraphicsEnvironment.getLocalGraphicsEnvironment().
                    getDefaultScreenDevice().getDefaultConfiguration();
            }

            Dimension maxSize = getDoubleBufferMaximumSize();
            int width = proposedWidth < 1 ? 1 :
                (proposedWidth > maxSize.width ? maxSize.width : proposedWidth);
            int height = proposedHeight < 1 ? 1 :
                (proposedHeight > maxSize.height ? maxSize.height : proposedHeight);
            VolatileImage image = volatileMap.get(config); // <-- HERE they get the cached image for the current config
            if (image == null || image.getWidth() < width ||
                    image.getHeight() < height) {
                if (image != null) {
                    image.flush();
                }
                image = config.createCompatibleVolatileImage(width, height);
                volatileMap.put(config, image); // <-- and HERE they cache the new image for future use
            }
            return image;
        }
        </code>

        The caching is very important, because there is absolutely no guarantee that the VolatileImage is really created in VRAM. I noticed that on my system, for example, VolatileImages are created on the heap and are not hardware accelerated (checked by inspecting isAccelerated() on the image's ImageCapabilities). Therefore, if they are not cached, they put excessive stress on the GC, which is what I observed at the beginning.
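
        A check of this kind can be sketched as follows, using the standard VolatileImage.getCapabilities() and ImageCapabilities.isAccelerated() calls. AccelerationCheck and describe are hypothetical names, and the 256x256 size is arbitrary; on a headless machine the sketch only prints a notice, since there is no screen device to query.

```java
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.ImageCapabilities;
import java.awt.image.VolatileImage;

/** Hypothetical diagnostic: reports whether a VolatileImage is hardware accelerated. */
class AccelerationCheck {
    /** Summarises the capability flags of an image as text. */
    static String describe(ImageCapabilities caps) {
        return (caps.isAccelerated() ? "accelerated" : "not accelerated")
            + (caps.isTrueVolatile() ? ", true volatile" : ", not true volatile");
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("headless environment: no screen device to query");
            return;
        }
        GraphicsConfiguration config = GraphicsEnvironment.getLocalGraphicsEnvironment()
            .getDefaultScreenDevice().getDefaultConfiguration();
        VolatileImage image = config.createCompatibleVolatileImage(256, 256);
        try {
            System.out.println("VolatileImage is " + describe(image.getCapabilities()));
        } finally {
            image.flush(); // release any native resources held by the image
        }
    }
}
```

        If this reports "not accelerated", the image most likely lives in ordinary memory, and recreating it for every frame costs the same as recreating a BufferedImage.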

        Anyway, why the VolatileImages are not accelerated on my system is an interesting question, but it is a separate issue, not related to Pivot. The drivers are up to date and the dxdiag utility reports that acceleration is enabled, but the hardware checks of the D3D Java2D pipeline fail...

        Noel Grandin added a comment -

        I modified ApplicationContext to cache the VolatileImage and checked in the change in rev 1185040.
        Could you please test with this and see if it helps your situation?

        Sandro Martini added a comment -

        Hi Noel, thanks for looking at it.

        Sandro Martini added a comment -

        Reassigned to 2.0.1 to see whether we can get some enhancement into 2.0.1, and maybe put a long-term solution into 2.1 afterwards (if needed).

        Piotr Kołaczkowski added a comment -

        It is ok with the version from the repository, you can close it as fixed.

        Chris Bartlett added a comment -

        Piotr - Which version did you test with? r1186817 is the latest of 3 commits relating to this issue.

        Author: noelgrandin
        Date: Thu Oct 20 14:03:31 2011
        New Revision: 1186817

        (See this dev mailing list thread for more info)
        http://apache-pivot-developers.417237.n3.nabble.com/Re-svn-commit-r1185040-pivot-trunk-wtk-src-org-apache-pivot-wtk-ApplicationContext-java-tp3436848p3436848.html

        Sandro Martini added a comment -

        The latest version of the fix (committed) from Noel probably solves this completely, but a critical fix like this needs more testing, so it has been moved to 2.1 (and needs to be reverted from the committed sources).

        Noel, can you attach a patch with the related changes here (so that anyone who wants to patch Pivot 2.0.1 can do so)?

        Thank you very much,
        Sandro

        Piotr Kołaczkowski added a comment -

        Works for me in the latest trunk.


          People

          • Assignee: Noel Grandin
          • Reporter: Piotr Kołaczkowski