Jackrabbit Oak / OAK-80

Implement batched writing for KernelNodeStore

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1
    • Fix Version/s: 0.3
    • Component/s: core
    • Labels:
      None

      Description

      Currently KernelNodeStore and KernelNodeStateBuilder directly apply every operation on the content tree to the private branch of the Microkernel. There have been some concerns re. performance hits due to network latency in the case where the Microkernel is not co-located.

      I suggest adding batching capabilities so that operations are only written through to the Microkernel once certain limits are reached.

        Activity

        Michael Dürig added a comment -

        Fixed at revision 1334059

        Thomas Mueller added a comment -

        >> the long term goal
        > ... Anyway, I don't see how that's relevant to this issue.

        The question is how you would build an efficient oak-core remoting if every Node.setProperty requires a TCP/IP roundtrip because it goes through oak-core and down to oak-mk.

        But I see the advantage to use the MicroKernel API if that simplifies the implementation, and allows large transactions (that don't fit in memory).

        > let's just forget about pushing this below the MK interface for now.

        OK, just want to make sure we are on the same page.

        Jukka Zitting added a comment -

        > the long term goal

        Ah yes, you're right. Anyway, I don't see how that's relevant to this issue.

        > the implementation in the MK would be quite complicated

        The complexity is more or less the same regardless of whether we implement this above or below the MK interface.

        Anyway, Michael made a good point above, so let's just forget about pushing this below the MK interface for now.

        Thomas Mueller added a comment -

        >> remoting between oak-jcr and oak-core
        > I don't think we have current plans for that

        I was of the opinion that this is the long-term goal (allowing other programming languages such as PHP to use the oak-core API), and that the spi-based remoting was only a temporary solution until oak-core remoting is available. If that's not the plan, then I wonder why we would need a separation between oak-jcr and oak-core.

        Stefan Guggisberg added a comment -

        FWIW:

        > Personally I'd rather see such batching happening under the MicroKernel API where the implementation actually knows about things like whether network roundtrips are involved and what kind of batch sizes are most useful.

        I imagine that the implementation in the MK would be quite complicated (due to the MVCC contract), whereas supporting batched writes in the upper layer is probably not that difficult.

        Jukka Zitting added a comment -

        OK, makes sense.

        Michael Dürig added a comment -

        Yes, this would be another option. However, I'd rather push NodeStore down and make it the Microkernel API. In that respect, batching is in the right place here.

        Anyway, having batching capabilities at that level does not hurt since it can easily be disabled.

        Jukka Zitting added a comment -

        > remoting between oak-jcr and oak-core

        I don't think we have current plans for that. Do we need it (instead of the existing spi-based remoting we already have in Jackrabbit 2.x)?

        Thomas Mueller added a comment -

        If we want to support an efficient remoting between the oak-jcr (the 'upper bun') and oak-core (the 'patty'), then I guess it makes sense to keep the transient space within oak-jcr, at least for changes that fit in memory. That means oak-mk (the 'lower bun') wouldn't be involved for such changes, until session.save(). And session.save() would (in the best case) only result in one call to oak-core.

        Unless the 'upper bun' includes its own MicroKernel implementation for the transient space.
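
        A minimal sketch of the transient-space idea described in this comment, assuming changes fit in memory. The names CoreEndpoint and TransientSession are hypothetical and are not part of oak-jcr or oak-core; the sketch only illustrates that property changes stay local until save(), which then makes a single call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote oak-core endpoint; not a real Oak interface. */
interface CoreEndpoint {
    void apply(Map<String, String> changes);
}

/** Keeps property changes in a local transient map until save() is called. */
final class TransientSession {
    private final CoreEndpoint core;
    private final Map<String, String> transientChanges = new LinkedHashMap<>();

    TransientSession(CoreEndpoint core) {
        this.core = core;
    }

    /** Purely local: no call to the endpoint happens here. */
    void setProperty(String path, String value) {
        transientChanges.put(path, value);
    }

    /** Sends all pending changes in one call, then clears the transient space. */
    void save() {
        if (transientChanges.isEmpty()) {
            return;
        }
        core.apply(new LinkedHashMap<>(transientChanges));
        transientChanges.clear();
    }
}
```

        With this shape, repeated setProperty calls on the same path collapse into one entry, and the number of calls to the endpoint depends on how often save() is invoked rather than on the number of property changes.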

        Jukka Zitting added a comment -

        Personally I'd rather see such batching happening under the MicroKernel API where the implementation actually knows about things like whether network roundtrips are involved and what kind of batch sizes are most useful.

        Michael Dürig added a comment -

        Revision 1333010 adds new implementations for NodeStateBuilder and NodeStore named KernelNodeStateBuilder2 and KernelNodeStore2. These implementations keep the transient space partially in memory and batch it back to the Microkernel as soon as the commit size (jsop string) exceeds 1024 characters.

        The new implementations are not yet wired to the rest of oak-core. I will do that in subsequent commits and remove the old implementations.
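
        The size-triggered batching described above can be sketched roughly as follows. This is not the code from revision 1333010: CommitSink and BatchingCommitBuffer are hypothetical names, the sink stands in for whatever call writes to the Microkernel branch, and the 1024-character limit is simply passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the write path to the Microkernel; not the real MicroKernel API. */
interface CommitSink {
    void commit(String jsop);
}

/** Buffers jsop fragments in memory and writes them through once a size limit is exceeded. */
final class BatchingCommitBuffer {
    private final CommitSink sink;
    private final int limit; // maximum number of buffered characters before a flush
    private final StringBuilder pending = new StringBuilder();

    BatchingCommitBuffer(CommitSink sink, int limit) {
        this.sink = sink;
        this.limit = limit;
    }

    /** Records one operation; writes through only when the batch grows past the limit. */
    void append(String jsopFragment) {
        pending.append(jsopFragment);
        if (pending.length() > limit) {
            flush();
        }
    }

    /** Sends everything that is pending in a single call; does nothing when the buffer is empty. */
    void flush() {
        if (pending.length() == 0) {
            return;
        }
        sink.commit(pending.toString());
        pending.setLength(0);
    }
}

/** Small usage example: many operations, few calls to the sink. */
final class BatchingDemo {
    static List<String> run(int operations, int limit) {
        List<String> commits = new ArrayList<>();
        BatchingCommitBuffer buffer = new BatchingCommitBuffer(commits::add, limit);
        for (int i = 0; i < operations; i++) {
            buffer.append("^\"/a/p" + i + "\":1 "); // illustrative jsop-like fragment
        }
        buffer.flush(); // write out the remainder, as a final save would
        return commits;
    }
}
```

        Setting the limit to 0 makes every append write through immediately, which is one way batching at this level could be switched off.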


          People

          • Assignee: Michael Dürig
          • Reporter: Michael Dürig