[OAK-1236] Query: optimize for sling's i18n support - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13
Component/s: query
Labels: None

Description

There are some performance issues with the query used by Sling's internationalization support [0].

The query for a specific locale looks like the following:

//element(*,mix:language)[@jcr:language='en']//element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)

This turns into a join and it looks like it cannot properly leverage the index on the left side to filter out content on the right side of the join.

I'm going to use a standard CQ setup for the following analysis.

The left side of the join is quite efficient with a property index:

//element(*,mix:language)[@jcr:language='en']
/libs/foundation/components/search/i18n/en
/libs/foundation/components/mobilefooter/i18n/en
/libs/commerce/components/search/i18n/en
/libs/cq/searchpromote/components/pagination/i18n/en

This is a fast query, so far so good.

The trouble begins when running the right side:

//element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)

As far as I can see, the biggest issue here is that the second query doesn't leverage the information from the left side of the join. This affects the overall query time in two ways.

First, it doesn't know that we're only looking for 'en', so the query traverses all the existing translations in all languages (up to 91k rows). It fetches those 91k rows each time and filters for English only at a later phase.

Second, it appears to run the query once for each left-side hit, in our case 4 times, making the first issue 4 times worse.
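The combined cost can be illustrated with a small Python sketch. It is not Oak code: the function name, the sample paths, and the in-memory lists are hypothetical stand-ins, and it only assumes, as described above, that the unfiltered right side is re-read once per left-side hit.

```python
# Hypothetical model of the behaviour described above, not Oak's join code.

def is_descendant(path, ancestor):
    """True if `path` lies strictly below `ancestor` in the tree."""
    return path.startswith(ancestor.rstrip("/") + "/")

def naive_join(left_paths, right_rows):
    """Re-scan every right-side row once per left-side hit, counting reads."""
    rows_read = 0
    matches = []
    for root in left_paths:
        for path in right_rows:
            rows_read += 1  # the row is fetched even if it is in another language
            if is_descendant(path, root):
                matches.append((root, path))
    return matches, rows_read

# Made-up sample data: two 'en' language roots, messages in three languages.
left = ["/libs/a/i18n/en", "/libs/b/i18n/en"]
right = [
    "/libs/a/i18n/en/hello",
    "/libs/a/i18n/de/hello",
    "/libs/b/i18n/en/bye",
    "/libs/b/i18n/fr/bye",
]

matches, rows_read = naive_join(left, right)
print(len(matches), rows_read)  # 2 8

# With the numbers from this issue: 4 left-side hits, about 91k right-side rows.
print(4 * 91_000)  # 364000
```

Under these assumptions the number of rows read grows with the product of the two sides, even though only the descendants of the 'en' roots are wanted.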

[0] http://sling.apache.org/site/internationalization-support.html

Issue Links

depends upon: OAK-1150 NodeType index: don't index all primary and mixin types (Open)

Activity

Thomas Mueller added a comment - 28/Nov/13 11:12

Do you know how many messages there are with language = 'en'?

The full query

//element(*,mix:language)[@jcr:language='en']
  //element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)

is converted to the SQL-2 statement:

select b.[jcr:path] as [jcr:path], b.[jcr:score] as [jcr:score], 
  b.[sling:key] as [sling:key], b.[sling:message] as [sling:message] 
from [mix:language] as a 
inner join [sling:Message] as b 
on isdescendantnode(b, a) 
where a.[jcr:language] = 'en' 
and b.[sling:message] is not null

As far as I know, the left-hand side (selector a) is using an index, and the right-hand side (selector b) is evaluated by traversing all child nodes of the result of a, and then checking whether sling:message is not null. The alternative I see would be to use an index on selector b (the index on sling:message), then traverse all those nodes and check for each node whether one of its parent nodes has jcr:language = 'en'. But I don't currently see an easy way to use both indexes at the same time.
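The two evaluation strategies described in this comment can be sketched in Python on a toy in-memory repository. Everything here is hypothetical (the `REPO` dictionary, the function names, the sample paths); the dictionary scans merely stand in for index lookups and subtree traversals and say nothing about how Oak implements either.

```python
import posixpath

# Hypothetical toy repository: path -> properties.
REPO = {
    "/libs/app/i18n/en": {"jcr:language": "en"},
    "/libs/app/i18n/en/hello": {"sling:key": "hello", "sling:message": "Hello"},
    "/libs/app/i18n/de": {"jcr:language": "de"},
    "/libs/app/i18n/de/hello": {"sling:key": "hello", "sling:message": "Hallo"},
}

def ancestors(path):
    """Yield each ancestor path of `path`, nearest first, ending at '/'."""
    while path not in ("/", ""):
        path = posixpath.dirname(path)
        yield path

def via_language_roots(repo, language):
    """Strategy 1: look up selector a first, then walk each root's subtree."""
    roots = [p for p, props in repo.items()
             if props.get("jcr:language") == language]  # stands in for the index on a
    hits = []
    for root in roots:
        prefix = root + "/"
        for path, props in repo.items():  # stands in for a subtree traversal
            if path.startswith(prefix) and "sling:message" in props:
                hits.append(path)
    return sorted(hits)

def via_message_index(repo, language):
    """Strategy 2: look up selector b first, then check each node's ancestors."""
    hits = []
    for path, props in repo.items():  # stands in for the index on sling:message
        if "sling:message" not in props:
            continue
        if any(repo.get(a, {}).get("jcr:language") == language
               for a in ancestors(path)):
            hits.append(path)
    return sorted(hits)

print(via_language_roots(REPO, "en"))  # ['/libs/app/i18n/en/hello']
print(via_message_index(REPO, "en"))   # ['/libs/app/i18n/en/hello']
```

Both functions return the same rows; they differ in which side drives the evaluation and therefore in how many nodes each has to touch.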

Jukka Zitting added a comment - 28/Nov/13 16:08

The join engine in Jackrabbit 2.x would handle the query by first executing the left side of the join:

SELECT a.[jcr:path] FROM [mix:language] AS a WHERE a.[jcr:language] = 'en'

So far it's equivalent to what Oak does. But the right side is then handled more efficiently, by using the left-side results to rewrite it to:

SELECT b.[jcr:path] FROM [sling:Message] AS b 
WHERE b.[sling:message] IS NOT NULL AND
    (ISDESCENDANTNODE(b, '/libs/foundation/components/search/i18n/en') OR
     ISDESCENDANTNODE(b, '/libs/foundation/components/mobilefooter/i18n/en') OR
     ISDESCENDANTNODE(b, '/libs/commerce/components/search/i18n/en') OR
     ISDESCENDANTNODE(b, '/libs/cq/searchpromote/components/pagination/i18n/en'))

Finally, the results of the two sides are merged back together. I would suggest that we do something similar in Oak.
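The rewrite step described in this comment can be sketched in Python as plain string building. The function name `rewrite_right_side` is hypothetical, this is not Jackrabbit's implementation, and doubling single quotes to escape a path inside the string literal is an assumption of this sketch.

```python
def rewrite_right_side(left_paths):
    """Build the right-side SQL-2 query restricted to the left-side result paths."""
    if not left_paths:
        raise ValueError("no left-side results, nothing to join against")
    conditions = " OR\n     ".join(
        # Assumption: a single quote in a path is escaped by doubling it.
        "ISDESCENDANTNODE(b, '{}')".format(path.replace("'", "''"))
        for path in left_paths
    )
    return (
        "SELECT b.[jcr:path] FROM [sling:Message] AS b\n"
        "WHERE b.[sling:message] IS NOT NULL AND\n"
        "    (" + conditions + ")"
    )

# The four left-side hits listed in the issue description.
left_paths = [
    "/libs/foundation/components/search/i18n/en",
    "/libs/foundation/components/mobilefooter/i18n/en",
    "/libs/commerce/components/search/i18n/en",
    "/libs/cq/searchpromote/components/pagination/i18n/en",
]
print(rewrite_right_side(left_paths))
```

For a large left-side result the generated condition list grows accordingly, so a real implementation would probably need to buffer and batch the paths; that detail is left out here.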

Tobias Bocanegra added a comment - 28/Nov/13 16:13

Would it be faster to just search for all language roots and then traverse the subtree instead of querying it?

Thomas Mueller added a comment - 04/Dec/13 08:55

I wonder what would happen if there is no index on the mixin type sling:Message? Wouldn't that make the query fast?

Alex Deparvu added a comment - 06/Dec/13 15:08

Funnily enough, I think the following two statements have the same effect:

Would it be faster to just search for all language roots and then traverse the subtree instead of querying it?

and

I wonder what would happen if there is no index on the mixin type sling:Message? Wouldn't that make the query fast?

I've tested this (and fixed OAK-1269 in the process) and it looks like it would solve this issue: removing the node type index for sling:Message causes a traversal, which has minimal impact compared to the original issue.

More broadly, I agree with Jukka that we should look into applying an optimization similar to the Jackrabbit one: buffer the left-side results and push the intermediate values to the right side of the join as a filter. This could be tracked in a dedicated issue.

This issue is now a matter of index configuration, which is outside the indexing code, so I will mark it as resolved soon if nobody objects.

Alex Deparvu added a comment - 13/Dec/13 09:01

Bulk close for the 0.13 release.

People

Assignee: Alex Deparvu

Reporter: Alex Deparvu

Votes: 0

Watchers: 4

Dates

Created: 28/Nov/13 10:05

Updated: 13/Dec/13 09:01

Resolved: 09/Dec/13 20:27

Jackrabbit Oak