Description
The query used by Sling's internationalization support [0] has performance issues.
The query for a specific locale looks like the following:
//element(*,mix:language)[@jcr:language='en']//element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)
This turns into a join, and the query engine does not appear to use the index on the left side to filter out content on the right side of the join.
I'm going to use a standard CQ setup for the following analysis.
The left side of the join is quite efficient with a property index:
//element(*,mix:language)[@jcr:language='en']
/libs/foundation/components/search/i18n/en
/libs/foundation/components/mobilefooter/i18n/en
/libs/commerce/components/search/i18n/en
/libs/cq/searchpromote/components/pagination/i18n/en
This query is fast; so far so good.
The trouble begins when running the right side:
//element(*,sling:Message)[@sling:message]/(@sling:key|@sling:message)
As far as I can see, the biggest issue is that the second query doesn't use the information from the left side of the join. This affects the overall query time in two ways:
- First, it doesn't know that we're only looking for 'en', so the query traverses all existing translations in all languages (up to 91k rows). It fetches those 91k rows each time and only filters for English at a later phase.
- Second, it appears to run the query once for each left-side hit (4 times in our case), which makes the first issue 4 times worse.
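The two effects described above can be illustrated with a toy model. This is only a sketch of the suspected behaviour, not Oak's actual join code: `naive_join`, the sample paths, and the row layout are hypothetical. It re-scans every right-side row for each left-side hit and filters by path afterwards, so the number of rows fetched is the product of the two sides. Under this model, 4 left-side hits against 91k rows would mean roughly 364k row fetches.

```python
def naive_join(left_paths, right_rows):
    """Toy nested-loop join (hypothetical): the right side is re-scanned
    in full for every left-side hit, and rows are filtered only afterwards."""
    fetched = 0
    matches = []
    for parent in left_paths:
        for path, key, message in right_rows:
            fetched += 1  # every row is read, regardless of language
            if path.startswith(parent + "/"):
                matches.append((key, message))
    return matches, fetched


# Made-up sample data: two 'en' language nodes, four messages in total.
left_paths = ["/libs/a/i18n/en", "/libs/b/i18n/en"]
right_rows = [
    ("/libs/a/i18n/en/hello", "hello", "Hello"),
    ("/libs/a/i18n/de/hello", "hello", "Hallo"),
    ("/libs/b/i18n/en/bye", "bye", "Goodbye"),
    ("/libs/b/i18n/fr/bye", "bye", "Au revoir"),
]

matches, fetched = naive_join(left_paths, right_rows)
print(matches)  # only the two 'en' messages survive the filter
print(fetched)  # 2 left hits x 4 right rows = 8 rows read
```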
[0] http://sling.apache.org/site/internationalization-support.html
Issue Links
- depends upon OAK-1150 NodeType index: don't index all primary and mixin types (Open)
Do you know how many messages there are with language = 'en'?
The full query is converted to the SQL-2 statement:
As far as I know, the left-hand side (selector a) uses an index, and the right-hand side (selector b) is evaluated by traversing all child nodes of the result of a and then checking whether sling:message is not null. The alternative I see would be to use an index on selector b (the index on sling:message), then traverse all those nodes and check for each one whether one of its parent nodes has jcr:language = 'en'. But I don't currently see an easy way to use both indexes at the same time.
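The two evaluation orders described above can be compared on a small in-memory tree. This is a minimal sketch under stated assumptions: `NODES`, `drive_from_left`, and `drive_from_right` are hypothetical names, a plain dictionary scan stands in for each property index, and nothing here reflects Oak's real API. It only shows that both orders return the same rows while doing different work.

```python
import posixpath

# Hypothetical content tree: path -> properties.
NODES = {
    "/libs/app/i18n/en": {"jcr:language": "en"},
    "/libs/app/i18n/en/hello": {"sling:key": "hello", "sling:message": "Hello"},
    "/libs/app/i18n/de": {"jcr:language": "de"},
    "/libs/app/i18n/de/hello": {"sling:key": "hello", "sling:message": "Hallo"},
}


def drive_from_left(nodes, lang):
    """Selector a first: find the language nodes, then traverse their descendants."""
    parents = [p for p, props in nodes.items() if props.get("jcr:language") == lang]
    out = []
    for parent in parents:
        for path, props in nodes.items():
            if path.startswith(parent + "/") and "sling:message" in props:
                out.append((props.get("sling:key", ""), props["sling:message"]))
    return sorted(out)


def drive_from_right(nodes, lang):
    """Selector b first: find the message nodes, then walk up through their ancestors."""
    out = []
    for path, props in nodes.items():
        if "sling:message" not in props:
            continue
        ancestor = posixpath.dirname(path)
        while ancestor != "/":
            if nodes.get(ancestor, {}).get("jcr:language") == lang:
                out.append((props.get("sling:key", ""), props["sling:message"]))
                break
            ancestor = posixpath.dirname(ancestor)
    return sorted(out)


print(drive_from_left(NODES, "en"))
print(drive_from_right(NODES, "en"))
```

In this sketch the first order does work proportional to the size of the subtrees below the matching language nodes, while the second does work proportional to the number of messages in all languages times their depth. Which one is cheaper depends on the content.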