
|
If you were logged in you would be able to see more operations.
|
|
|
| Xalan info: |
PatchAvailable
|
| Reviewer: |
Christine Li
|
| Resolution Date: |
08/Jun/06 03:29 AM
|
|
Performance scales poorly for Muenchian grouping in XSLTC. For an instruction like the following, the user would like only the first node in the node set returned by each reference to the key function to be processed in the predicate, which tests for node identity.
<xsl:for-each select="item[generate-id() = generate-id(key('k', value)[1])]">
However, in XSLTC all nodes associated with each key value are copied into a temporary IntegerArray by the KeyIndex object used to represent the result of the key function, and then that result is always put through a DupFilterIterator, which sorts the nodes returned by the key function, even though the nodes produced by KeyIndex are always in document order; the DupFilterIterator seems to be used just as a mechanism for copying nodes from the KeyIndex object that was used to evaluate the result of the key function. So every node in the result of the key function is always copied at least twice.
Using a heap mechanism like that used in UnionPathIterator to sort the nodes would be more efficient, as it would only process each node once, and only as each node was needed. Each node in the heap would be the set of nodes (which are already in document order because that's how KeyIndex is built to begin with) associated with a single key value for that reference to the key function, so pulling a single node would require at most O(log2(num_kv)) comparisons, where num_kv is the number of key values.
On top of that, a reference to key that has a positional predicate does not take advantage of the NthIterator class to directly retrieve the node. This means that in an expression like
key('k',keyvalues)[1]
the position of every node produced by the key is tested for equality to the value one. Using NthIterator would mean as few nodes as possible would be retrieved - in this case, just one.
|
|
Description
|
Performance scales poorly for Muenchian grouping in XSLTC. For an instruction like the following, the user would like only the first node in the node set returned by each reference to the key function to be processed in the predicate, which tests for node identity.
<xsl:for-each select="item[generate-id() = generate-id(key('k', value)[1])]">
However, in XSLTC all nodes associated with each key value are copied into a temporary IntegerArray by the KeyIndex object used to represent the result of the key function, and then that result is always put through a DupFilterIterator, which sorts the nodes returned by the key function, even though the nodes produced by KeyIndex are always in document order; the DupFilterIterator seems to be used just as a mechanism for copying nodes from the KeyIndex object that was used to evaluate the result of the key function. So every node in the result of the key function is always copied at least twice.
Using a heap mechanism like that used in UnionPathIterator to sort the nodes would be more efficient, as it would only process each node once, and only as each node was needed. Each node in the heap would be the set of nodes (which are already in document order because that's how KeyIndex is built to begin with) associated with a single key value for that reference to the key function, so pulling a single node would require at most O(log2(num_kv)) comparisons, where num_kv is the number of key values.
On top of that, a reference to key that has a positional predicate does not take advantage of the NthIterator class to directly retrieve the node. This means that in an expression like
key('k',keyvalues)[1]
the position of every node produced by the key is tested for equality to the value one. Using NthIterator would mean as few nodes as possible would be retrieved - in this case, just one. |
Show » |
| Repository |
Revision |
Date |
User |
Message |
| ASF |
#412501 |
Wed Jun 07 19:54:03 UTC 2006 |
zongaro |
Part of fix for XALANJ-2295. Changed DupFilterIterator.setStartNode to take
advantage of the fact that if the source iterator is a KeyIndex, it will
return its nodes in document order, so no additional IntegerArray.sort operation
is necessary.
Reviewed by Christine Li (jycli () ca ! ibm ! com)
|
| Files Changed |
MODIFY
/xalan/java/trunk/src/org/apache/xalan/xsltc/dom/DupFilterIterator.java
|
| Repository |
Revision |
Date |
User |
Message |
| ASF |
#412504 |
Wed Jun 07 19:54:24 UTC 2006 |
zongaro |
Part of fix for XALANJ-2294 and XALANJ-2295.
Each KeyIndex for a particular key or for the id function was set up with a
Hashtable mapping Strings to nodes. However, the set of nodes returned are
only supposed to be those in the same input document as the context node. The
code was accepting nodes as node IDs and putting them all into the same table,
so node IDs from different documents were being mixed together. Fixed this by
adding another Hashtable from the root node of a document to the Hashtable that
maps Strings to node handles in that document. This affects insertion of nodes
into the KeyIndex (in add) and look-up of nodes for patterns (in containsID,
containsKey, getDOMNodeById).
Generated byte code previously looked up nodes to be retrieved by a reference
to the key or id function by cloning an IntegerArray containing the first set
of nodes and merging in subsequent nodes retrieved. The generated code
contained any required looping code to loop over nodes in a node set that
appeared in a call to key or id. The effect of all this was that every node in
the resulting node set was processed at least once, regardless of whether all
the node returned were actually used - they might not need to be if a positional
predicate is used, for instance.
The old KeyIndex.lookupId and KeyIndex.lookupKey are now deprecated, but
preserved for any previously compiled translets. Instead, new code will use
the KeyIndex.getKeyIndexIterator methods to get an iterator that will return the
nodes for a particular reference to the key or id function.
The iterator returned by getKeyIndexIterator is an instance of an inner class -
KeyIndex.KeyIndexIterator - which extends the new MultiValuedNodeHeapIterator.
Each node in the heap refers to an IntegerArray that contains the nodes for
each key value or id value that was looked up. It's sensitive to the context
node (or more importantly, the root of the context node) and retrieves node
handles for the function reference lazily to avoid unnecessarily greedy and
potentially duplicate processing of the nodes.
Also, fix for XALANJ-2292. The byte code generation assumed that if the
second argument to a reference to the key function was not a node set or a
string, that it had to be converted to a string. However, if the argument is
a parameter whose value is a node set, all the nodes in the node set should
play a role in computing the result of the function, not just the first.
The KeyIndex.KeyIndexIterator is responsible for processing the argument to the
key or id function, iterating over nodes in a node set if required, rather than
leaving that responsibility to generated byte code, because we don't generally
know whether the argument will be an iterator.
Reviewed by Christine Li (jycli () ca ! ibm ! com)
|
| Files Changed |
MODIFY
/xalan/java/trunk/src/org/apache/xalan/xsltc/dom/KeyIndex.java
|
| Repository |
Revision |
Date |
User |
Message |
| ASF |
#412507 |
Wed Jun 07 19:54:45 UTC 2006 |
zongaro |
Part of fix for XALANJ-2294 and XALANJ-2295.
Generated byte code previously looked up nodes to be retrieved by a reference
to the key or id function by cloning an IntegerArray containing the first set
of nodes and merging in subsequent nodes retrieved. The generated code
contained any required looping code to loop over nodes in a node set that
appeared in a call to key or id. The effect of all this was that every node in
the resulting node set was processed at least once, regardless of whether all
the node returned were actually used - they might not need to be if a positional
predicate is used, for instance.
The old KeyIndex.lookupId and KeyIndex.lookupKey are now deprecated, but
preserved for any previously compiled translets. Instead, new generated code
uses the KeyIndex.getKeyIndexIterator methods to get an iterator that will
return the nodes for a particular reference to the key or id function. The
nodes returned by KeyIndexIterator will be in document order with duplicates
filtered, so the DupFilterIterator that was previously created is no longer
necessary.
Also, fix for XALANJ-2292. The byte code generation assumed that if the
second argument to a reference to the key function was not a node set or a
string, that it had to be converted to a string. However, if the argument is
a parameter whose value is a node set, all the nodes in the node set should
play a role in computing the result of the function, not just the first.
The KeyIndex.KeyIndexIterator is responsible for processing the argument to the
key or id function, iterating over nodes in a node set if required, rather than
leaving that responsibility to generated byte code, because we don't generally
know whether the argument will be an iterator.
Reviewed by Christine Li (jycli () ca ! ibm ! com)
|
| Files Changed |
MODIFY
/xalan/java/trunk/src/org/apache/xalan/xsltc/compiler/KeyCall.java
|
|