Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Solr configuration throws an UnsupportedConfigurationException for Node.getTextContent() if it is started in server environments that uses older implementations of the DOM API. This can be improved by checking wich node type is actually handled an calling the appropriate methods.

      A patch that fixes this problem is attached.

        Activity

        Hide
        alexxx Alexander Saar added a comment -

        Patch that fixes the problem described by this issue.

        Show
        alexxx Alexander Saar added a comment - Patch that fixes the problem described by this issue.
        Hide
        hossman Hoss Man added a comment -

        FWIW: I don't know a lot about DOM2 vs DOM3, but loking over the documentation on the meaning of textContent I see...

        http://www.w3.org/TR/DOM-Level-3-Core/core.html
        ...
        [forsome node types]
        concatenation of the textContent attribute value of every child node, excluding COMMENT_NODE and
        PROCESSING_INSTRUCTION_NODE nodes. This is the empty string if the node has no children.
        [for other node types including comments and processing instructions]
        nodeValue

        If I'm reading that right, then if we replace usages of getTextContent with our own method for walking the child nodes and pulling out the nodeValue, we need to explicitly watch out for processing instructions and comments or we will start slurping them in my accident right?

        either way: DOMUtil.getText(Node) should have some javadoc clarifying it's purpsoe (support DOM 2) so it doesn't accidently get optimized away by someone down the road who says "hey, we can just use getTextContent()!"

        Show
        hossman Hoss Man added a comment - FWIW: I don't know a lot about DOM2 vs DOM3, but loking over the documentation on the meaning of textContent I see... http://www.w3.org/TR/DOM-Level-3-Core/core.html ... [forsome node types] concatenation of the textContent attribute value of every child node, excluding COMMENT_NODE and PROCESSING_INSTRUCTION_NODE nodes. This is the empty string if the node has no children. [for other node types including comments and processing instructions] nodeValue If I'm reading that right, then if we replace usages of getTextContent with our own method for walking the child nodes and pulling out the nodeValue, we need to explicitly watch out for processing instructions and comments or we will start slurping them in my accident right? either way: DOMUtil.getText(Node) should have some javadoc clarifying it's purpsoe (support DOM 2) so it doesn't accidently get optimized away by someone down the road who says "hey, we can just use getTextContent()!"
        Hide
        hossman Hoss Man added a comment -

        I tried to produce an example of the type of thing i'm worried about with the body of a comment getting slurped up ... but i could not – as far as i can tell that's actually a defect of the DOM parser, the spec says it should be there.

        Either way it should be safe and easy to check the Node type and only include nodes that aren't comments or processing instructions, i'll try to do that tomorow.

        Show
        hossman Hoss Man added a comment - I tried to produce an example of the type of thing i'm worried about with the body of a comment getting slurped up ... but i could not – as far as i can tell that's actually a defect of the DOM parser, the spec says it should be there. Either way it should be safe and easy to check the Node type and only include nodes that aren't comments or processing instructions, i'll try to do that tomorow.
        Hide
        hossman Hoss Man added a comment -

        revised version of patch that attempts to impliment the DOM Spec for "textContent" as precicesly as possible....

        http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

        It seems to work properly, but I'd appreciate some eyeballs on this before I commit.

        Show
        hossman Hoss Man added a comment - revised version of patch that attempts to impliment the DOM Spec for "textContent" as precicesly as possible.... http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent It seems to work properly, but I'd appreciate some eyeballs on this before I commit.
        Hide
        yseeley@gmail.com Yonik Seeley added a comment -

        I'm no DOM expert, but it looks OK.
        If it works on all our current tests/configs then it's a low risk change since that's all it's used for.

        Hmmm, but how do we know it's reading all the values OK, instead of getting a default or something.
        Perhaps run a startup test with the debugging level pumped all the way up, one before the change, and one after, then diff the logs (minus timestamps/linenumbers)?

        Oh, and StringBuffer could be StringBuilder, but it's private so I don't really care.

        Show
        yseeley@gmail.com Yonik Seeley added a comment - I'm no DOM expert, but it looks OK. If it works on all our current tests/configs then it's a low risk change since that's all it's used for. Hmmm, but how do we know it's reading all the values OK, instead of getting a default or something. Perhaps run a startup test with the debugging level pumped all the way up, one before the change, and one after, then diff the logs (minus timestamps/linenumbers)? Oh, and StringBuffer could be StringBuilder, but it's private so I don't really care.
        Hide
        hossman Hoss Man added a comment -

        Changed StringBuffer to StringBuilder (i keep forgetting about that class) and then tried out Yonik's suggestion (using the full test suite we have since some config access is done on an as needed basis) and the only diffs were in default object toStrings, and the names of temp directories created for indexes (AbstractSolrTestcase puts the time and a random number in the name when making the directory)

        commited.

        (Thanks for the initial patch Alexander, this is the kind of thing that never would have occured to me, but shoudl really help reduce frustrations new users might have)

        Show
        hossman Hoss Man added a comment - Changed StringBuffer to StringBuilder (i keep forgetting about that class) and then tried out Yonik's suggestion (using the full test suite we have since some config access is done on an as needed basis) and the only diffs were in default object toStrings, and the names of temp directories created for indexes (AbstractSolrTestcase puts the time and a random number in the name when making the directory) commited. (Thanks for the initial patch Alexander, this is the kind of thing that never would have occured to me, but shoudl really help reduce frustrations new users might have)
        Hide
        hossman Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked ("Resolved" or "Closed") and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.1

        The Fix Version for all 38 issues found was set to 1.1, email notification
        was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this
        (hopefully) unique string: 20080415hossman3

        Show
        hossman Hoss Man added a comment - This bug was modified as part of a bulk update using the criteria... Marked ("Resolved" or "Closed") and "Fixed" Had no "Fix Version" versions Was listed in the CHANGES.txt for 1.1 The Fix Version for all 38 issues found was set to 1.1, email notification was suppressed to prevent excessive email. For a list of all the issues modified, search jira comments for this (hopefully) unique string: 20080415hossman3

          People

          • Assignee:
            hossman Hoss Man
            Reporter:
            alexxx Alexander Saar
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development