Details

      Description

      The path handling code in spi-commons shows quite often in thread dumps and profiling results, as the current implementation does quite a bit of repetitive allocating and copying of path element arrays. We should be able to streamline and simplify the path handling code by only tracking the latest path element and a reference to the parent path. To do this efficiently we may need to adjust some of the Path and PathFactory method declarations (that currently assume element array -based paths) also in the SPI.

      1. TransientManyChildNodesTest-2.png
        32 kB
        Jukka Zitting
      2. TransientManyChildNodesTest.png
        32 kB
        Jukka Zitting

        Activity

        Jukka Zitting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        angela made changes -
        Fix Version/s 2.2.0 [ 12314844 ]
        angela made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Jukka Zitting [ jukkaz ]
        Resolution Fixed [ 1 ]
        Hide
        angela added a comment -

        i think this issue should have been resolved, shouldn't it?

        Show
        angela added a comment - i think this issue should have been resolved, shouldn't it?
        Jukka Zitting made changes -
        Attachment TransientManyChildNodesTest-2.png [ 12454754 ]
        Hide
        Jukka Zitting added a comment -

        I've now gotten rid of the separate Element objects. The AbstractPath class now implements both the Path and Element interfaces (the latter properly only when dealing with a single-element path).

        I also improved the getElements() method to avoid the O(n^2) behaviour.

        The attached plot shows the current performance improvements (about 20% on large transient operations) that are mostly due to these changes. There's still room for additional improvements, but I believe the biggest wins have already been achieved here.

        Show
        Jukka Zitting added a comment - I've now gotten rid of the separate Element objects. The AbstractPath class now implements both the Path and Element interfaces (the latter properly only when dealing with a single-element path). I also improved the getElements() method to avoid the O(n^2) behaviour. The attached plot shows the current performance improvements (about 20% on large transient operations) that are mostly due to these changes. There's still room for additional improvements, but I believe the biggest wins have already been achieved here.
        Jukka N committed 997384 (17 files)
        Reviews: none

        JCR-2744: Avoid element arrays in PathImpl

        Get rid of separate Path.Element objects by making AbstractPath implement both Path and Path.Element

        jackrabbit trunk
        Jukka N committed 997303 (41 files)
        Reviews: none

        JCR-2744: Avoid element arrays in PathImpl

        Avoid using Path.getNameElement() where possible

        jackrabbit trunk
        Jukka N committed 997227 (14 files)
        Reviews: none

        JCR-2744: Avoid element arrays in PathImpl

        Use relative paths instead of path elements in event states

        Jukka N committed 997062 (1 file)
        Reviews: none

        JCR-2744: Avoid element arrays in PathImpl

        Correct handling of a special case in computeRelativePath()

        Hide
        Michael Dürig added a comment -

        I'm in favor of further optimizing Path.Element or even trying to get rid of it when it turns out to be feasible. Since getElements() is called from quite many places, we might gain some quick insight into the impact of further optimization from changing the method to an iterative implementation and running the performance test suite.

        Also - if it turn out to be beneficial performance wise - I'd prefer to have an iterative implementation of getElements() in 2.2 if further optimizations won't make it into the release.

        Show
        Michael Dürig added a comment - I'm in favor of further optimizing Path.Element or even trying to get rid of it when it turns out to be feasible. Since getElements() is called from quite many places, we might gain some quick insight into the impact of further optimization from changing the method to an iterative implementation and running the performance test suite. Also - if it turn out to be beneficial performance wise - I'd prefer to have an iterative implementation of getElements() in 2.2 if further optimizations won't make it into the release.
        Jukka Zitting made changes -
        Field Original Value New Value
        Attachment TransientManyChildNodesTest.png [ 12454450 ]
        Hide
        Jukka Zitting added a comment -

        The attached graph shows already some performance improvement that I believe is mostly due to these changes. I see some gains also in other benchmarks, but it's hard to say whether it's this or some of the other recent improvements at work.

        Re: getElements(); I'm actually hoping to get rid of this method entirely, which is why I so far haven't put too much effort into optimizing it.

        Going even further, I believe we could (and should) merge the Path and Path.Element interfaces to further drop the number of objects we need to instantiate and track.

        Show
        Jukka Zitting added a comment - The attached graph shows already some performance improvement that I believe is mostly due to these changes. I see some gains also in other benchmarks, but it's hard to say whether it's this or some of the other recent improvements at work. Re: getElements(); I'm actually hoping to get rid of this method entirely, which is why I so far haven't put too much effort into optimizing it. Going even further, I believe we could (and should) merge the Path and Path.Element interfaces to further drop the number of objects we need to instantiate and track.
        Hide
        Michael Dürig added a comment -

        This looks very good to me. Do you have some performance figures already?

        However I think that RelativePath#getElements() need to improve. The current implementations recursively calls getElements() on the parent and adds its own element to the returned array. To do so the returned array needs to be copied. Copying the array once for each recursive invocation effectively results in O(n^2) run time characteristics. I suggest to change the recursive implementation into an iterative one which collects the name elements of all ancestors of the current path into a pre allocated array.

        Show
        Michael Dürig added a comment - This looks very good to me. Do you have some performance figures already? However I think that RelativePath#getElements() need to improve. The current implementations recursively calls getElements() on the parent and adds its own element to the returned array. To do so the returned array needs to be copied. Copying the array once for each recursive invocation effectively results in O(n^2) run time characteristics. I suggest to change the recursive implementation into an iterative one which collects the name elements of all ancestors of the current path into a pre allocated array.
        Jukka Zitting created issue -

          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Jukka Zitting
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development