Uploaded image for project: 'XalanJ2'
  1. XalanJ2
  2. XALANJ-2295

XSLTC performance problems with xsl:key in Muenchian grouping

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.7
    • 2.7.1
    • XSLTC
    • None
    • PatchAvailable

    Description

      Performance scales poorly for Muenchian grouping in XSLTC. For an instruction like the following, the user would like only the first node in the node set returned by each reference to the key function to be processed in the predicate, which tests for node identity.

      <xsl:for-each select="item[generate-id() = generate-id(key('k', value)[1])]">

      However, in XSLTC all nodes associated with each key value are copied into a temporary IntegerArray by the KeyIndex object used to represent the result of the key function, and then that result is always put through a DupFilterIterator, which sorts the nodes returned by the key function, even though the nodes produced by KeyIndex are always in document order; the DupFilterIterator seems to be used just as a mechanism for copying nodes from the KeyIndex object that was used to evaluate the result of the key function. So every node in the result of the key function is always copied at least twice.

      Using a heap mechanism like that used in UnionPathIterator to sort the nodes would be more efficient, as it would only process each node once, and only as each node was needed. Each node in the heap would be the set of nodes (which are already in document order because that's how KeyIndex is built to begin with) associated with a single key value for that reference to the key function, so pulling a single node would require at most O(log2(num_kv)) comparisons, where num_kv is the number of key values.

      On top of that, a reference to key that has a positional predicate does not take advantage of the NthIterator class to directly retrieve the node. This means that in an expression like

      key('k',keyvalues)[1]

      the position of every node produced by the key is tested for equality to the value one. Using NthIterator would mean as few nodes as possible would be retrieved - in this case, just one.

      Attachments

        1. gendata2295.xml
          0.3 kB
          Henry Zongaro
        2. gendata2295.xsl
          0.8 kB
          Henry Zongaro
        3. j2295.xsl
          0.5 kB
          Henry Zongaro
        4. patch.j2295.txt
          75 kB
          Henry Zongaro
        5. patch.j2295.v2.txt
          77 kB
          Henry Zongaro

        Activity

          People

            zongaro@ca.ibm.com Henry Zongaro
            zongaro@ca.ibm.com Henry Zongaro
            Christine Li Christine Li
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: