Commons Imaging
  1. Commons Imaging
  2. IMAGING-66

proposed enhancement reduces load time for some image files by 40 percent

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:

      Tested in Windows, Linux, MacOS

      Description

      I have identified an enhancement that reduces the time required to load TIFF image by 40 percent. I have tested a modified version of Sanselan under Windows, Linux, and MacOS with consistent savings on each platform. Additionally, I suspect that this technique may be applicable to other areas of the Sanselan code base, including more popular image formats supported by Sanselan such as JPEG, PNG, etc.

      I propose to add the relevant code changes to the Sanselan code base. Once these modifications are in place, there would be an opportunity for others to look at the pro's and cons' of applying the techniques to other data formats.

      The Enhancement

      To load an image from a TIFF file, Sanselan performs extensive data processing in order to obtain RGB values for the pixels in the output image. The code for that processing appears to be well written and efficient. Once the RGB value are obtained, they are stored in a Java BufferedImage using a call to the setRGB() method.
      Unfortunately, setRGB() is an extremely inefficient method. A much, much better approach is to store the data into an integer array and defer the creation of the buffered image until all information for the image has been collected. Java has a nice (though somewhat obscure) API that lets memory in an integer array be transferred directly to a BufferedImage so that the system does not have to allocate additional memory for this procedure (a very nice feature when dealing with huge images). This change virtually eliminated the overhead for transferring data to images, which accounted for 40 percent of the time required to load images. For TIFF files, this was a reasonable approach because the TiffImageParser class always loads 4-byte image and the getGrayscaleBufferedImage() method is never used. I have not investigated the code for the other renders, but some refinement might be needed for the one-byte grayscale images.

      Steps to Integration

      In sanselan.common, a new class called ImagePrep was created. ImagePrep carries a width, height, and an integer array for storing pixels. It provides its own setRGB() method which looks just like the one in BufferedImage. Finally, it provides a method called getBufferedImage() which creates a BufferedImage from its internal the integer array when the processing is complete.

      In the TiffImageParser classes, data is read from input stream and transferred to pixel values in a series of classes known as PhotometricInterpreters. These were modified to operate on ImagePrep objects rather than BufferedImage objects. The DataReader and TiffImageParser classes were modified to pass ImagePrep objects into the photometric interpreters rather than using BufferedImages.

      At the very last step, before passing its result back to the calling method (the Sanselan main class, etc.), the TiffImageParser used the ImagePrep.getBufferedImage() to convert the result to the expected form.

        Activity

        Hide
        Gary Lucas added a comment -

        Here is an attachment of the pre-release version of the ImagePrep class. I am providing this for purposes of discussion.

        Show
        Gary Lucas added a comment - Here is an attachment of the pre-release version of the ImagePrep class. I am providing this for purposes of discussion.
        Hide
        Gary Lucas added a comment -

        The attached patch adds logic to enhance loading speed of TIFF images by about 40 percent. I tested it using the unit tests bundled with the code distribution. Javadoc was added for new features. The logic applied here could potentially be extended to other image formats, but right now only applies to TIFF.

        Show
        Gary Lucas added a comment - The attached patch adds logic to enhance loading speed of TIFF images by about 40 percent. I tested it using the unit tests bundled with the code distribution. Javadoc was added for new features. The logic applied here could potentially be extended to other image formats, but right now only applies to TIFF.
        Hide
        Sebb added a comment -

        The attachment ImagePrep.java dated 28/Sep/11 has "ASF granted license" = No, so we cannot use it.
        However, the same class is present in the patch dated 04/Oct/11 so perhaps the file is obsolete.

        Either delete the file, or delete and re-attach with grant to ASF if it is still required.

        The patch appears to contain more than just the loading speed fix. There is a change to BinaryFileFunctions.java which appears to be a separate bug-fix.

        There are also various spacing and other changes which make it quite hard to review the actual changes.
        It's a lot easier for committers if patches are restricted to the minimum required to implement the change.

        The package.html additions don't appear to be directly relevant, and the header ought to be the standard AL header.

        Show
        Sebb added a comment - The attachment ImagePrep.java dated 28/Sep/11 has "ASF granted license" = No, so we cannot use it. However, the same class is present in the patch dated 04/Oct/11 so perhaps the file is obsolete. Either delete the file, or delete and re-attach with grant to ASF if it is still required. The patch appears to contain more than just the loading speed fix. There is a change to BinaryFileFunctions.java which appears to be a separate bug-fix. There are also various spacing and other changes which make it quite hard to review the actual changes. It's a lot easier for committers if patches are restricted to the minimum required to implement the change. The package.html additions don't appear to be directly relevant, and the header ought to be the standard AL header.
        Hide
        Gary Lucas added a comment -

        BinaryFileFunctions.java was an accident, I had already submitted it under Sanselan-54 and did not realize that my Subversion client had picked it up. I also apologize for the space changes, which were likewise operator error.

        You are, of course, absolutely correct in your assessment that he package.html files are not relevant to the trouble report. I thought it would at least be a start to remedying the lack of documentation in the Sanselan code if I added Javadoc to those packages that I was working on. However, I can see an argument for segregrating code changes and documentation. If there is a policy that we do not add documentation and code changes in the same patch, then I will certainly not make that mistake in the future. Please feel free to remove them if you feel they are not helpful.

        Show
        Gary Lucas added a comment - BinaryFileFunctions.java was an accident, I had already submitted it under Sanselan-54 and did not realize that my Subversion client had picked it up. I also apologize for the space changes, which were likewise operator error. You are, of course, absolutely correct in your assessment that he package.html files are not relevant to the trouble report. I thought it would at least be a start to remedying the lack of documentation in the Sanselan code if I added Javadoc to those packages that I was working on. However, I can see an argument for segregrating code changes and documentation. If there is a policy that we do not add documentation and code changes in the same patch, then I will certainly not make that mistake in the future. Please feel free to remove them if you feel they are not helpful.
        Hide
        Sebb added a comment -

        It's not policy to segregate documentation and code patches, however in this case the doc patches are not directly related to the issue.
        If you wish to provide Javadoc patches, please raise a separate JIRA for these.

        Each patch should deal with a single issue, and should be the minimum needed to solve the issue; any docs in the patch should relate only to the issue being addressed.

        Otherwise it makes reviewing the patch harder, and if the patch is applied with spurious changes, it makes reviewing the commit message harder as well.
        A single patch may be seen by lots of people, so any extra work trying to understand it may be multiplied many times.

        [This is getting rather off-topic for this issue; probably ought to be documented elsewhere]

        Show
        Sebb added a comment - It's not policy to segregate documentation and code patches, however in this case the doc patches are not directly related to the issue. If you wish to provide Javadoc patches, please raise a separate JIRA for these. Each patch should deal with a single issue, and should be the minimum needed to solve the issue; any docs in the patch should relate only to the issue being addressed. Otherwise it makes reviewing the patch harder, and if the patch is applied with spurious changes, it makes reviewing the commit message harder as well. A single patch may be seen by lots of people, so any extra work trying to understand it may be multiplied many times. [This is getting rather off-topic for this issue; probably ought to be documented elsewhere]
        Hide
        Gary Lucas added a comment -

        Since I am claiming some pretty large performance gains with this change
        (and also advocating it as a possible solution for other data formats
        such as PNG and JPEG) I thought it appropriate to describe how I got
        the numbers I posted.

        I test mostly under Windows with some Linux. Since a modern OS is a
        noisy test environment, I try to run these tests when there are as
        few additional things as possible running on the system.

        My test methods always repeat the test procedure in a loop
        that performs the same operation multiple times. I disregard
        the timing results from the first few iterations of the loop
        because (a) the initial operations reflect the timing overhead
        for class loaders and the JIT compiler and (b) because
        the Windows file cache pretty much guarantees that the
        second time a program loads a file takes a lot less time
        than the first.

        The Sanselan.getBufferedImage(File) method has logic to
        determine which of its "image parsers" to use based on the
        file extension and other criteria. To eliminate this process
        from the time measurements, I always create an instance
        of the image parser specifically.

        Between iterations, I also explicitly put any objects
        created by the test program out of scope and
        run the garbage collector. This approach is intended to
        reduce the probability of significant garbage collection
        running during the main operation that I am trying to time.

        I have been testing with larger files (multiple megapixels)
        so the resolution and noisiness of the system clock does
        not affect the timing.

        Here's a snippet of code I use for testing

        public void testImageLoadTime(File file)
        throws ImageReadException, IOException
        {
        long nPixel = 0;
        long sumTimeMS = 0;
        int nTests = 0;
        for (int i = 0; i < 5; i++) {

        // load the image
        // Sanselan has logic to pick the right parser for the
        // image format based on the file extension. But, to isolate
        // performace costs to specific functionality, we bypass
        // all of that and create an instance of a specific parser
        HashMap params = new HashMap();
        TiffImageParser tiffImageParser = new TiffImageParser();
        ByteSourceFile byteSource = new ByteSourceFile(file);
        // record the start time
        long time0 = System.nanoTime();
        BufferedImage bImage =
        tiffImageParser.getBufferedImage(byteSource, params);
        // record the completion time
        long time1 = System.nanoTime();
        // compute difference and print elapsed time
        // for accumulated statistics, ignore the first two trials
        long deltaMS = (time1 - time0) / 1000000;
        if (i > 2)

        { nTests++; sumTimeMS += deltaMS; }

        nPixel = bImage.getWidth() * bImage.getHeight();
        System.out.println("time (ms) =" + deltaMS);
        // put all relevant objects out-of-scope and
        // run garbage collector
        bImage = null;
        tiffImageParser = null;
        byteSource = null;
        Runtime.getRuntime().gc();
        }
        System.out.println("Number of pixels " + nPixel);
        System.out.println("Average load time (ms) "
        + ((double) sumTimeMS / nTests));
        }

        Show
        Gary Lucas added a comment - Since I am claiming some pretty large performance gains with this change (and also advocating it as a possible solution for other data formats such as PNG and JPEG) I thought it appropriate to describe how I got the numbers I posted. I test mostly under Windows with some Linux. Since a modern OS is a noisy test environment, I try to run these tests when there are as few additional things as possible running on the system. My test methods always repeat the test procedure in a loop that performs the same operation multiple times. I disregard the timing results from the first few iterations of the loop because (a) the initial operations reflect the timing overhead for class loaders and the JIT compiler and (b) because the Windows file cache pretty much guarantees that the second time a program loads a file takes a lot less time than the first. The Sanselan.getBufferedImage(File) method has logic to determine which of its "image parsers" to use based on the file extension and other criteria. To eliminate this process from the time measurements, I always create an instance of the image parser specifically. Between iterations, I also explicitly put any objects created by the test program out of scope and run the garbage collector. This approach is intended to reduce the probability of significant garbage collection running during the main operation that I am trying to time. I have been testing with larger files (multiple megapixels) so the resolution and noisiness of the system clock does not affect the timing. Here's a snippet of code I use for testing public void testImageLoadTime(File file) throws ImageReadException, IOException { long nPixel = 0; long sumTimeMS = 0; int nTests = 0; for (int i = 0; i < 5; i++) { // load the image // Sanselan has logic to pick the right parser for the // image format based on the file extension. But, to isolate // performace costs to specific functionality, we bypass // all of that and create an instance of a specific parser HashMap params = new HashMap(); TiffImageParser tiffImageParser = new TiffImageParser(); ByteSourceFile byteSource = new ByteSourceFile(file); // record the start time long time0 = System.nanoTime(); BufferedImage bImage = tiffImageParser.getBufferedImage(byteSource, params); // record the completion time long time1 = System.nanoTime(); // compute difference and print elapsed time // for accumulated statistics, ignore the first two trials long deltaMS = (time1 - time0) / 1000000; if (i > 2) { nTests++; sumTimeMS += deltaMS; } nPixel = bImage.getWidth() * bImage.getHeight(); System.out.println("time (ms) =" + deltaMS); // put all relevant objects out-of-scope and // run garbage collector bImage = null; tiffImageParser = null; byteSource = null; Runtime.getRuntime().gc(); } System.out.println("Number of pixels " + nPixel); System.out.println("Average load time (ms) " + ((double) sumTimeMS / nTests)); }
        Hide
        Damjan Jovanovic added a comment -

        This is a good idea, but this is a big change that I don't want to make just before a release.

        Show
        Damjan Jovanovic added a comment - This is a good idea, but this is a big change that I don't want to make just before a release.
        Hide
        Damjan Jovanovic added a comment -

        Hi Lucas

        I couldn't use your patch as is - it's big and messy, and makes unrelated code and formatting changes. But commit 1293543, which improves TIFF reading performance, is largely based on parts of your patch, and you were given credit for it.

        As for the other issues you try to fix:

        • The "AlphaMask | (r<<16) | (g<<8) | b;" change. AlphaMask is static final - it should be inlined at compile time, so you gain nothing performance-wise by factoring it out. JVMs should be able to optimize (b << 0) into just b, and even if they can't at the moment, is saving 1 millisecond per megapixel really worth it? If so, please file a separate bug with that patch rebased against the latest SVN.
        • The palette is meant to have 16 bits per color. If some badly written apps are writing only the lowest 8 bits, or writing in the wrong byte order, do other TIFF readers like libtiff also correct this error?

        Otherwise thank you for your contribution, it will be in the 1.0 release.

        Show
        Damjan Jovanovic added a comment - Hi Lucas I couldn't use your patch as is - it's big and messy, and makes unrelated code and formatting changes. But commit 1293543, which improves TIFF reading performance, is largely based on parts of your patch, and you were given credit for it. As for the other issues you try to fix: The "AlphaMask | (r<<16) | (g<<8) | b;" change. AlphaMask is static final - it should be inlined at compile time, so you gain nothing performance-wise by factoring it out. JVMs should be able to optimize (b << 0) into just b, and even if they can't at the moment, is saving 1 millisecond per megapixel really worth it? If so, please file a separate bug with that patch rebased against the latest SVN. The palette is meant to have 16 bits per color. If some badly written apps are writing only the lowest 8 bits, or writing in the wrong byte order, do other TIFF readers like libtiff also correct this error? Otherwise thank you for your contribution, it will be in the 1.0 release.
        Hide
        Gary Lucas added a comment -

        Hi Damjan,

        I'm not emotionally attached to most of my changes, and am willing to yield to your judgement on most matters.

        The only one that I have any comment on is the issue about the 16 byte palette. The issue is whether Sanselan should include logic to handle a mistake in a data palette format. The question is how common is the mistake. I myself don't know. I've noticed that Microsoft Picture and Fax viewer handles files that have the 8-byte palettes, but I have no insights into what drove that particular design change.

        The only other thing I would disagree with is the characterization of my code changes as messy. Overreaching? Overly agressive? Reckless? Okay. But messy... say it ain't so

        g.


        Gary W. Lucas, Senior Software Engineer
        Sonalysts, Inc
        215 Parkway North
        Waterford, CT 06320
        (860) 326-3682

        Show
        Gary Lucas added a comment - Hi Damjan, I'm not emotionally attached to most of my changes, and am willing to yield to your judgement on most matters. The only one that I have any comment on is the issue about the 16 byte palette. The issue is whether Sanselan should include logic to handle a mistake in a data palette format. The question is how common is the mistake. I myself don't know. I've noticed that Microsoft Picture and Fax viewer handles files that have the 8-byte palettes, but I have no insights into what drove that particular design change. The only other thing I would disagree with is the characterization of my code changes as messy. Overreaching? Overly agressive? Reckless? Okay. But messy... say it ain't so g. — Gary W. Lucas, Senior Software Engineer Sonalysts, Inc 215 Parkway North Waterford, CT 06320 (860) 326-3682
        Hide
        Damjan Jovanovic added a comment -

        Of course it isn't so . It's just that the formatting changes make the patch seem larger and more complex than it really is.

        Do you have a sample image with that particular palette problem that I can look at and maybe add to the tests?

        Show
        Damjan Jovanovic added a comment - Of course it isn't so . It's just that the formatting changes make the patch seem larger and more complex than it really is. Do you have a sample image with that particular palette problem that I can look at and maybe add to the tests?
        Hide
        Gary Lucas added a comment -

        Damjan,

        So far, all the samples I've located have copyrights or are proprietary. I'm still looking.

        From a purely personal motivation, it looks like the system that produces the TIFF files that I'm required to process is going to be end-of-lifed pretty soon. So I won't be pressing for the feature if nobody else sees a need.

        Gary

        Show
        Gary Lucas added a comment - Damjan, So far, all the samples I've located have copyrights or are proprietary. I'm still looking. From a purely personal motivation, it looks like the system that produces the TIFF files that I'm required to process is going to be end-of-lifed pretty soon. So I won't be pressing for the feature if nobody else sees a need. Gary
        Hide
        Gary Lucas added a comment -

        Damjan,

        I see that you've integrated the most important change from this proposal into the main trunk.

        There is still a significant improvement to be realized in terms of the photometric interpreter. I took your comments to heart about me overloading my last submission with too many changes. So I have implemented a new version of the interpreter that is derived from the current code trunk and focuses on only the relevant changes. I would like to submit it.

        Would it make sense to close out this tracker item and have me enter a new item that addresses just the change to the photometric interpreter? I am ready to submit the new patch as soon as I hear from you.

        Gary

        Show
        Gary Lucas added a comment - Damjan, I see that you've integrated the most important change from this proposal into the main trunk. There is still a significant improvement to be realized in terms of the photometric interpreter. I took your comments to heart about me overloading my last submission with too many changes. So I have implemented a new version of the interpreter that is derived from the current code trunk and focuses on only the relevant changes. I would like to submit it. Would it make sense to close out this tracker item and have me enter a new item that addresses just the change to the photometric interpreter? I am ready to submit the new patch as soon as I hear from you. Gary
        Hide
        Damjan Jovanovic added a comment -

        Gary

        You can do either, I don't really mind. I look forward to your patch.

        Damjan

        Show
        Damjan Jovanovic added a comment - Gary You can do either, I don't really mind. I look forward to your patch. Damjan
        Hide
        Gary Lucas added a comment -

        Damjan,

        I entered a new Tracker Item (#75) for the change to the photometric interpreter. That completes the last of the changes I originally posted for this item. So I think you can close or reject this item as you see fit.

        Gary

        Show
        Gary Lucas added a comment - Damjan, I entered a new Tracker Item (#75) for the change to the photometric interpreter. That completes the last of the changes I originally posted for this item. So I think you can close or reject this item as you see fit. Gary
        Hide
        Gary Lucas added a comment -

        Since Damjan applied the change I proposed to all the TIFF parsing related routines and a number of classes related to other formats, I am closing this issue. There are other places where the BufferedImage setRGB method is still used, but those are probably best treated as a separate tracker item.

        The performance enhancements described under tracker item 69 are a separate issue.

        Show
        Gary Lucas added a comment - Since Damjan applied the change I proposed to all the TIFF parsing related routines and a number of classes related to other formats, I am closing this issue. There are other places where the BufferedImage setRGB method is still used, but those are probably best treated as a separate tracker item. The performance enhancements described under tracker item 69 are a separate issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Gary Lucas
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 48h
              48h
              Remaining:
              Remaining Estimate - 48h
              48h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development