Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.2
    • Component/s: None
    • Labels:
      None

      Description

      Index aggregates could contain other index aggregates. JR should be able to handle a complete hierarchy of aggregates.

      I'm working on a patch.

      1. JCR-2989_v3.patch
        40 kB
        Alex Parvulescu
      2. JCR-2989_v2.patch
        10 kB
        Alex Parvulescu
      3. JCR-2989.patch
        7 kB
        Alex Parvulescu

        Activity

        Hide
        Alex Parvulescu added a comment -

        Fixed in revision 1182367 and 1182368.

        Show
        Alex Parvulescu added a comment - Fixed in revision 1182367 and 1182368.
        Hide
        Alex Parvulescu added a comment -

        take 3, action!

        Updated the patch.
        Now the same type node aggregates are not included by default.
        Followed Jukka's advice and included a 'recursive' flag on the aggregate definition (default is false).
        I also added 'recursiveLimit' to control the number of levels up the indexing should go (default 100 / 0 for infinite recusion).

        This is how the config looks like now:
        <aggregate primaryType="nt:folder" recursive="true" recursiveLimit="10">

        So basically, now we are sure that the aggregates will not just sneak up on you

        For more info, you can also look at the tests.

        Show
        Alex Parvulescu added a comment - take 3, action! Updated the patch. Now the same type node aggregates are not included by default. Followed Jukka's advice and included a 'recursive' flag on the aggregate definition (default is false). I also added 'recursiveLimit' to control the number of levels up the indexing should go (default 100 / 0 for infinite recusion). This is how the config looks like now: <aggregate primaryType="nt:folder" recursive="true" recursiveLimit="10"> So basically, now we are sure that the aggregates will not just sneak up on you For more info, you can also look at the tests.
        Hide
        Jukka Zitting added a comment -

        The recursive behaviour I'm worried about is due to the way aggregates are built by the indexer. For example say that you have a subtree of 1m nt:folders organized in three levels (100 child nodes for each non-leaf folder). Now, given the above example aggregate definition, whenever any one of these folders gets updated, the search index will also trigger an update of the index entry of the top-level folder. The SearchIndex.createDocument() call would then have to recursively traverse the entire subtree of 1m folders just to build the correct new index entry. That's a performance nightmare just waiting to happen.

        Recursive aggregates can be a pretty useful tool and I'd certainly like to see them fully supported, but I'd like to avoid a situation where it's too easy to shoot yourself in the foot with careless configuration.

        Show
        Jukka Zitting added a comment - The recursive behaviour I'm worried about is due to the way aggregates are built by the indexer. For example say that you have a subtree of 1m nt:folders organized in three levels (100 child nodes for each non-leaf folder). Now, given the above example aggregate definition, whenever any one of these folders gets updated, the search index will also trigger an update of the index entry of the top-level folder. The SearchIndex.createDocument() call would then have to recursively traverse the entire subtree of 1m folders just to build the correct new index entry. That's a performance nightmare just waiting to happen. Recursive aggregates can be a pretty useful tool and I'd certainly like to see them fully supported, but I'd like to avoid a situation where it's too easy to shoot yourself in the foot with careless configuration.
        Hide
        Alex Parvulescu added a comment -

        Attached a reworked patch.

        It includes a soft limit for recursive aggregates.
        After talking with others about this, I've decided to put the limit at 100, this should give us plenty of room to breathe.
        So for the same type nodes, it only goes 100 (same type) parents up, then it stops updating.

        The trick is that the limit is upward, because the way aggregation works. You start with a child node that has been updated, and go up it's hierarchy.

        The patch also includes the tiny optimisation I mentioned earlier, if it runs into a known parent, it will stop, as the entire hierarchy is already on the update list.

        Show
        Alex Parvulescu added a comment - Attached a reworked patch. It includes a soft limit for recursive aggregates. After talking with others about this, I've decided to put the limit at 100, this should give us plenty of room to breathe. So for the same type nodes, it only goes 100 (same type) parents up, then it stops updating. The trick is that the limit is upward, because the way aggregation works. You start with a child node that has been updated, and go up it's hierarchy. The patch also includes the tiny optimisation I mentioned earlier, if it runs into a known parent, it will stop, as the entire hierarchy is already on the update list.
        Hide
        Alex Parvulescu added a comment -

        Yes, on update the structure will be traversed up, so the parents will be updated. For each change, you'll have as many updates as there are levels of inclusion.

        But I'm not sure how big this problem is. I'd say it depends on the number of levels you hierarchy has, and the number of fields a node has indexed. But you can easily abuse both

        I'll tweak the code a bit so that it tries to optimize the upward update. Basically when checking a node's root aggregate, if this is already on the update list, stop there, as all its parents are already included in this update.

        I agree on having a soft rule in place for recursive aggregates (lets say max 5 levels deep), but I think having a recursive=true rule doesn't help the average copy-paste-from-the-wiki user
        (and by the way that includes me too, as the docs are not excellent

        Show
        Alex Parvulescu added a comment - Yes, on update the structure will be traversed up, so the parents will be updated. For each change, you'll have as many updates as there are levels of inclusion. But I'm not sure how big this problem is. I'd say it depends on the number of levels you hierarchy has, and the number of fields a node has indexed. But you can easily abuse both I'll tweak the code a bit so that it tries to optimize the upward update. Basically when checking a node's root aggregate, if this is already on the update list, stop there, as all its parents are already included in this update. I agree on having a soft rule in place for recursive aggregates (lets say max 5 levels deep), but I think having a recursive=true rule doesn't help the average copy-paste-from-the-wiki user (and by the way that includes me too, as the docs are not excellent
        Hide
        Jukka Zitting added a comment -

        I'm a bit worried about the possibility of accidentally creating huge aggregates with this. For example, will the following create index aggregates of an entire folder structure, traversed recursively whenever any of the folders gets modified?

        <aggregate primaryType="nt:folder">
        <include primaryType="nt:folder">*</include>
        </aggregate>

        It might be good to either prevent recursive aggregates (only allow different aggregate definitions to be embedded) or to require some explicit recursive="true" annotation on such aggregate definitions to make sure the user knows what they're doing.

        Show
        Jukka Zitting added a comment - I'm a bit worried about the possibility of accidentally creating huge aggregates with this. For example, will the following create index aggregates of an entire folder structure, traversed recursively whenever any of the folders gets modified? <aggregate primaryType="nt:folder"> <include primaryType="nt:folder">*</include> </aggregate> It might be good to either prevent recursive aggregates (only allow different aggregate definitions to be embedded) or to require some explicit recursive="true" annotation on such aggregate definitions to make sure the user knows what they're doing.
        Hide
        Alex Parvulescu added a comment -

        attached support for embedded index aggregates (fix + test)

        any comments?

        Show
        Alex Parvulescu added a comment - attached support for embedded index aggregates (fix + test) any comments?

          People

          • Assignee:
            Alex Parvulescu
            Reporter:
            Alex Parvulescu
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development