[XALANJ-2295] XSLTC performance problems with xsl:key in Muenchian grouping - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7
Fix Version/s: 2.7.1
Component/s: XSLTC
Labels: None

Xalan info: PatchAvailable

Description

Performance scales poorly for Muenchian grouping in XSLTC. For an instruction like the following, only the first node in the node set returned by each reference to the key function needs to be processed in the predicate, which tests for node identity.

<xsl:for-each select="item[generate-id() = generate-id(key('k', value)[1])]">

However, in XSLTC the KeyIndex object that represents the result of the key function copies all nodes associated with each key value into a temporary IntegerArray. That result is then always put through a DupFilterIterator, which sorts the nodes returned by the key function even though the nodes produced by KeyIndex are always in document order. The DupFilterIterator seems to be used only as a mechanism for copying nodes out of the KeyIndex object that was used to evaluate the key function. As a result, every node in the result of the key function is always copied at least twice.

Using a heap mechanism like the one in UnionPathIterator to order the nodes would be more efficient, as it would process each node only once, and only when that node is needed. Each entry in the heap would be the set of nodes associated with a single key value for that reference to the key function; each such set is already in document order, because that is how KeyIndex is built. Pulling a single node would then require at most O(log2(num_kv)) comparisons, where num_kv is the number of key values.
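The heap idea above can be sketched as a lazy k-way merge. This is only an illustration, not the code in the attached patch: the class name KeyHeapMerge, the use of plain int arrays for the per-key-value node lists, and the assumption that node handles are non-negative integers ordered by document position are all hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch; not the XSLTC KeyIndex or UnionPathIterator code.
class KeyHeapMerge implements Iterator<Integer> {
    // Cursor over the node list of one key value (already in document order).
    private static final class Cursor {
        final int[] nodes;
        int pos;
        Cursor(int[] nodes) { this.nodes = nodes; }
        int current() { return nodes[pos]; }
    }

    // Min-heap ordered by the node each cursor currently points at.
    private final PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));

    // Last node returned; assumes node handles are never negative.
    private int last = -1;

    KeyHeapMerge(int[][] nodeListsPerKeyValue) {
        for (int[] list : nodeListsPerKeyValue) {
            if (list.length > 0) heap.add(new Cursor(list));
        }
    }

    @Override
    public boolean hasNext() {
        // A node may appear under several key values; skip repeats.
        while (!heap.isEmpty() && heap.peek().current() == last) advance();
        return !heap.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        last = heap.peek().current();
        advance();
        return last;
    }

    // Move the smallest cursor forward; drop it once its list is exhausted.
    private void advance() {
        Cursor c = heap.poll();
        if (++c.pos < c.nodes.length) heap.add(c);
    }
}
```

Nothing is copied or sorted up front: each call to next() costs one heap removal and at most one heap insertion, so a consumer that only wants the first node pays for one node.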

On top of that, a reference to key that has a positional predicate does not take advantage of the NthIterator class to retrieve the node directly. This means that in an expression like

key('k',keyvalues)[1]

the position of every node produced by the key function is tested for equality with the value 1. Using NthIterator would mean that as few nodes as possible are retrieved: in this case, just one.
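The saving from direct positional retrieval can be illustrated with a small sketch. It is not the XSLTC NthIterator class: NthNode, CountingIterator, and the convention of returning -1 when there is no n-th node are hypothetical names and choices made for this example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the XSLTC NthIterator class.
class NthNode {
    // Returns the n-th node (1-based) from source, or -1 if there are fewer
    // than n nodes. Stops pulling from source as soon as the node is found.
    static int nth(Iterator<Integer> source, int n) {
        if (n < 1) return -1;
        for (int i = 1; source.hasNext(); i++) {
            int node = source.next();
            if (i == n) return node; // remaining nodes are never retrieved
        }
        return -1;
    }
}

// Test helper that counts how many nodes a consumer actually pulled.
class CountingIterator implements Iterator<Integer> {
    private final int[] nodes;
    private int pos;
    int pulled;

    CountingIterator(int[] nodes) { this.nodes = nodes; }

    @Override
    public boolean hasNext() { return pos < nodes.length; }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        pulled++;
        return nodes[pos++];
    }
}
```

With a lazily evaluated source, a call such as NthNode.nth(source, 1) retrieves exactly one node, whereas testing position() = 1 against every node walks the whole result.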

Attachments

- gendata2295.xml, 25/Apr/06 10:19, 0.3 kB, Henry Zongaro
- gendata2295.xsl, 25/Apr/06 10:19, 0.8 kB, Henry Zongaro
- j2295.xsl, 25/Apr/06 10:19, 0.5 kB, Henry Zongaro
- patch.j2295.txt, 10/May/06 01:55, 75 kB, Henry Zongaro
- patch.j2295.v2.txt, 07/Jun/06 03:24, 77 kB, Henry Zongaro

People

Assignee: Henry Zongaro

Reporter: Henry Zongaro

Reviewer: Christine Li

Dates

Created: 25/Apr/06 10:02

Updated: 11/Dec/07 16:57

Resolved: 08/Jun/06 03:29