Derby
  1. Derby
  2. DERBY-4903

Plan exporter tool produces broken output if query contains less-than operator

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.7.1.1
    • Fix Version/s: 10.7.1.1, 10.8.1.2
    • Component/s: Tools
    • Labels:
      None

      Description

      The plan exporter tool fails to produce well-formed output for the following query:

      ij> select * from sysibm.sysdummy1 where ibmreqd < 'Z';

      The generation of XML doesn't fail, but when I open the file in Firefox, I see this message:

      XML Parsing Error: not well-formed
      Location: file:///tmp/plan.xml
      Line Number 9, Column 11:

      Operator: <
      ----------^

      HTML generation prints the following error, and produces an empty file:

      ERROR: 'The value of attribute "scan_qualifiers" associated with an element type "null" must not contain the '<' character.'

      1. derby-4903-a.diff
        8 kB
        Nirmal Fernando
      2. test.diff
        2 kB
        Knut Anders Hatlen
      3. derby-4903-b.diff
        2 kB
        Nirmal Fernando
      4. derby-4903-c.diff
        5 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          I guess I handled this inside "AccessDatabase.Statement()" method. But not sure what went wrong here.

          Show
          Nirmal Fernando added a comment - Hi Knut, I guess I handled this inside "AccessDatabase.Statement()" method. But not sure what went wrong here.
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          I didn't notice the latter part of the description (i.e. ERROR: 'The value of attribute "scan_qualifiers" associated with an element type "null" must not contain the '<' character.'), earlier. Sorry about that.

          So, it seems like there's a bug here.

          I'll look into this if you're not going to take it.

          Thanks for pointing this out!

          Show
          Nirmal Fernando added a comment - Hi Knut, I didn't notice the latter part of the description (i.e. ERROR: 'The value of attribute "scan_qualifiers" associated with an element type "null" must not contain the '<' character.'), earlier. Sorry about that. So, it seems like there's a bug here. I'll look into this if you're not going to take it. Thanks for pointing this out!
          Hide
          Knut Anders Hatlen added a comment -

          Hi Nirmal,
          Thanks for taking a look at this bug. And for volunteering to fix it.

          Show
          Knut Anders Hatlen added a comment - Hi Nirmal, Thanks for taking a look at this bug. And for volunteering to fix it.
          Hide
          Knut Anders Hatlen added a comment -

          Note that the problem is not limited to the character '<'. You'll see similar problems if the queries run against a table with '"' in its name, for example with these queries:

          ij> create table "A ""quoted"" table name" (x int);
          0 rows inserted/updated/deleted
          ij> select * from "A ""quoted"" table name";
          X
          -----------

          0 rows selected

          Show
          Knut Anders Hatlen added a comment - Note that the problem is not limited to the character '<'. You'll see similar problems if the queries run against a table with '"' in its name, for example with these queries: ij> create table "A ""quoted"" table name" (x int); 0 rows inserted/updated/deleted ij> select * from "A ""quoted"" table name"; X ----------- 0 rows selected
          Hide
          Nirmal Fernando added a comment -

          Hi,

          I'm attaching a patch which addresses the occurrences of special XML characters such as >(>), <(<), '(') and "("), in both <statement> nodes and attribute nodes.

          I've attached few test cases as well.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi, I'm attaching a patch which addresses the occurrences of special XML characters such as >(>), <(<), '(') and "("), in both <statement> nodes and attribute nodes. I've attached few test cases as well. Thanks.
          Hide
          Bryan Pendleton added a comment -

          This seems like a pretty reasonable approach, and the test cases look appropriate, too.

          It seems like it would be nice to perform the special character encoding of the
          XML attribute strings prior to composing them into quoted substrings in the XML tag string,
          since then replaceInAttribute() wouldn't have to go searching for the start and end of
          the attribute string to find it.

          But this diff is small and focused on the problem at hand, and that's always good.

          Show
          Bryan Pendleton added a comment - This seems like a pretty reasonable approach, and the test cases look appropriate, too. It seems like it would be nice to perform the special character encoding of the XML attribute strings prior to composing them into quoted substrings in the XML tag string, since then replaceInAttribute() wouldn't have to go searching for the start and end of the attribute string to find it. But this diff is small and focused on the problem at hand, and that's always good.
          Hide
          Knut Anders Hatlen added a comment -

          It may also be worthwhile to factor out the code that encodes the special characters, and put it in a helper method instead of repeating the code.

          I think we also need to escape & -> & in these strings. The query below results in XML that's not well-formed, and the HTML generation fails with a NullPointerException:

          select * from sysibm.sysdummy1 where ibmreqd = 'a, b & c'

          Show
          Knut Anders Hatlen added a comment - It may also be worthwhile to factor out the code that encodes the special characters, and put it in a helper method instead of repeating the code. I think we also need to escape & -> & in these strings. The query below results in XML that's not well-formed, and the HTML generation fails with a NullPointerException: select * from sysibm.sysdummy1 where ibmreqd = 'a, b & c'
          Hide
          Knut Anders Hatlen added a comment -

          By the way, it looks like the HTML generation uses javax.xml.* classes in order to transform XML to HTML. Is there a reason why we couldn't use those classes to generate the XML in the first place? I suppose they would escape the strings automatically (and correctly).

          Show
          Knut Anders Hatlen added a comment - By the way, it looks like the HTML generation uses javax.xml.* classes in order to transform XML to HTML. Is there a reason why we couldn't use those classes to generate the XML in the first place? I suppose they would escape the strings automatically (and correctly).
          Hide
          Nirmal Fernando added a comment -

          Kunt: As I remember we had few issues on doing this;

          1) A way to add a pointer to the XSL style sheet
          2) Difficulty of finding all classes needed, in all JREs.

          Are these making sense? If you think otherwise please let us know.

          Bryan: Hi Bryan! Since we're composing XML attribute strings into quoted substrings, from the query it self, I can't see away, where we can avoid doing this. Please forgive me if I misunderstood your question.

          Show
          Nirmal Fernando added a comment - Kunt: As I remember we had few issues on doing this; 1) A way to add a pointer to the XSL style sheet 2) Difficulty of finding all classes needed, in all JREs. Are these making sense? If you think otherwise please let us know. Bryan: Hi Bryan! Since we're composing XML attribute strings into quoted substrings, from the query it self, I can't see away, where we can avoid doing this. Please forgive me if I misunderstood your question.
          Hide
          Knut Anders Hatlen added a comment -

          > 1) A way to add a pointer to the XSL style sheet

          I think the org.w3c.dom.Document.createProcessingInstruction() method
          could do that. The following code fragment seems to create a pointer
          to the style sheet for me, and it automatically escapes the '<', '>'
          and '&' characters in the query text:

          Document doc =
          DocumentBuilderFactory.newInstance().
          newDocumentBuilder().newDocument();

          doc.appendChild(
          doc.createProcessingInstruction(
          "xml-stylesheet",
          "type=\"text/xsl\" href=\"...\""));

          Element plan = doc.createElement("plan");
          doc.appendChild(plan);

          Element statement = doc.createElement("statement");
          statement.setTextContent("SELECT * FROM T WHERE X <> 'a & b'");
          plan.appendChild(statement);

          Element time = doc.createElement("time");
          time.setTextContent("2010-11-24 18:27:14.466");
          plan.appendChild(time);

          Source source = new DOMSource(doc);
          Result result = new StreamResult(System.out);

          Transformer trans = TransformerFactory.newInstance().newTransformer();
          trans.transform(source, result);

          > 2) Difficulty of finding all classes needed, in all JREs.

          That's what I thought too, and that's why I was surprised to see that
          the generation of HTML files used those classes. But since the tool
          now appears to depend on these classes being available, wouldn't it
          make sense to use them for generating the XML too?

          Show
          Knut Anders Hatlen added a comment - > 1) A way to add a pointer to the XSL style sheet I think the org.w3c.dom.Document.createProcessingInstruction() method could do that. The following code fragment seems to create a pointer to the style sheet for me, and it automatically escapes the '<', '>' and '&' characters in the query text: Document doc = DocumentBuilderFactory.newInstance(). newDocumentBuilder().newDocument(); doc.appendChild( doc.createProcessingInstruction( "xml-stylesheet", "type=\"text/xsl\" href=\"...\"")); Element plan = doc.createElement("plan"); doc.appendChild(plan); Element statement = doc.createElement("statement"); statement.setTextContent("SELECT * FROM T WHERE X <> 'a & b'"); plan.appendChild(statement); Element time = doc.createElement("time"); time.setTextContent("2010-11-24 18:27:14.466"); plan.appendChild(time); Source source = new DOMSource(doc); Result result = new StreamResult(System.out); Transformer trans = TransformerFactory.newInstance().newTransformer(); trans.transform(source, result); > 2) Difficulty of finding all classes needed, in all JREs. That's what I thought too, and that's why I was surprised to see that the generation of HTML files used those classes. But since the tool now appears to depend on these classes being available, wouldn't it make sense to use them for generating the XML too?
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          Thanks for the ideas!

          Can you please consider following.

          Few negative/tricky points on doing this:
          1) This way XML document becomes hard to read: It was a requirement, so that the user can go through a generated XML file and build up his own Style sheet.

          eg:
          <?xml version="1.0" encoding="UTF-8" standalone="no"?><?xml-stylesheet type="text/xsl" href="..."?>
          <plan><statement>SELECT * FROM T WHERE X <> 'a & b'</statement><time>2010-11-24 18:27:14.466</time></plan>

          2) This need some re-factoring of the design approach:

          • Now we are getting the name of the attribute from the query, we might need to change it.
          • Now I am building the whole query plan tree and then only write the whole string to the file, we need to alter this approach.
            etc.

          3) Given the time constraint at that time, I was unable to find any other approach of creating the HTML file. Since creating the HTML file involves parsing the style sheet as well. I think we need some extra classes when creating the XML document. So, I am not quite sure about the presence of those extra classes in all JREs.

          4) Element.setTextContent() method is not there in J2SE 1.4.2, and it gave rubbish when I used setNodeValue().

          After considering all these facts I think it's better to keep XML generation in this way and just handle those special character occurrences.

          I'm not reluctant to change the code if you have other ideas, I just wanted to say what I feel .

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Knut, Thanks for the ideas! Can you please consider following. Few negative/tricky points on doing this: 1) This way XML document becomes hard to read: It was a requirement, so that the user can go through a generated XML file and build up his own Style sheet. eg: <?xml version="1.0" encoding="UTF-8" standalone="no"?><?xml-stylesheet type="text/xsl" href="..."?> <plan><statement>SELECT * FROM T WHERE X <> 'a & b'</statement><time>2010-11-24 18:27:14.466</time></plan> 2) This need some re-factoring of the design approach: Now we are getting the name of the attribute from the query, we might need to change it. Now I am building the whole query plan tree and then only write the whole string to the file, we need to alter this approach. etc. 3) Given the time constraint at that time, I was unable to find any other approach of creating the HTML file. Since creating the HTML file involves parsing the style sheet as well. I think we need some extra classes when creating the XML document. So, I am not quite sure about the presence of those extra classes in all JREs. 4) Element.setTextContent() method is not there in J2SE 1.4.2, and it gave rubbish when I used setNodeValue(). After considering all these facts I think it's better to keep XML generation in this way and just handle those special character occurrences. I'm not reluctant to change the code if you have other ideas, I just wanted to say what I feel . Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Hi Nirmal,

          Thanks for writing up the challenges with such a rewrite. Some more
          comments below...

          > Few negative/tricky points on doing this:
          > 1) This way XML document becomes hard to read: It was a requirement,
          > so that the user can go through a generated XML file and build up
          > his own Style sheet.

          I think the Transformer can be tuned to pretty print the document. For
          example, this code makes it output XML that looks more or less
          identical to the manually formatted XML generated by the exporter:

          trans.setOutputProperty(OutputKeys.INDENT, "yes");
          trans.setOutputProperty("

          {http://xml.apache.org/xslt}

          indent-amount", "4");

          (I'm not sure exactly how portable the above code is, but I'm sure we
          can work out something that'll do the trick.)

          I've also noticed that, no matter how badly formatted the XML document
          is, it looks nice and tidy when I open it in Firefox. I assume other
          XML tools will behave similarly.

          > 2) This need some re-factoring of the design approach:

          I agree. It will be a chunk of work that someone must volunteer to do.

          > 3) Given the time constraint at that time, I was unable to find any
          > other approach of creating the HTML file. Since creating the HTML
          > file involves parsing the style sheet as well. I think we need some
          > extra classes when creating the XML document. So, I am not quite
          > sure about the presence of those extra classes in all JREs.

          I'm assuming the JREs in question are the Java ME ones. Those JREs
          don't even have the classes used in the generation of HTML files, so
          they'd need the Xalan jars on the classpath to be able to run the plan
          exporter in any case. I believe the Xalan jars contain all the classes
          and interfaces needed, also to generate the XML file, but I may be
          mistaken.

          > 4) Element.setTextContent() method is not there in J2SE 1.4.2, and
          > it gave rubbish when I used setNodeValue().

          Right, to make it work in 1.4.2, I think we need to replace
          setTextContent(text) with appendChild(doc.createTextNode(text)).

          > After considering all these facts I think it's better to keep XML
          > generation in this way and just handle those special character
          > occurrences.

          Well, I lean towards the opposite conclusion...

          If we had used the XML libraries to generate the XML, we wouldn't have
          had this bug or DERBY-4902, and perhaps other bugs that we haven't
          discovered yet.

          But I think the current patch would be a good incremental
          improvement. I'll check it in and back-port it to the 10.7 branch so
          that it makes it into the next release candidate. If we want to change
          how the XML is generated, we can file a new JIRA issue to track that
          work.

          Show
          Knut Anders Hatlen added a comment - Hi Nirmal, Thanks for writing up the challenges with such a rewrite. Some more comments below... > Few negative/tricky points on doing this: > 1) This way XML document becomes hard to read: It was a requirement, > so that the user can go through a generated XML file and build up > his own Style sheet. I think the Transformer can be tuned to pretty print the document. For example, this code makes it output XML that looks more or less identical to the manually formatted XML generated by the exporter: trans.setOutputProperty(OutputKeys.INDENT, "yes"); trans.setOutputProperty(" {http://xml.apache.org/xslt} indent-amount", "4"); (I'm not sure exactly how portable the above code is, but I'm sure we can work out something that'll do the trick.) I've also noticed that, no matter how badly formatted the XML document is, it looks nice and tidy when I open it in Firefox. I assume other XML tools will behave similarly. > 2) This need some re-factoring of the design approach: I agree. It will be a chunk of work that someone must volunteer to do. > 3) Given the time constraint at that time, I was unable to find any > other approach of creating the HTML file. Since creating the HTML > file involves parsing the style sheet as well. I think we need some > extra classes when creating the XML document. So, I am not quite > sure about the presence of those extra classes in all JREs. I'm assuming the JREs in question are the Java ME ones. Those JREs don't even have the classes used in the generation of HTML files, so they'd need the Xalan jars on the classpath to be able to run the plan exporter in any case. I believe the Xalan jars contain all the classes and interfaces needed, also to generate the XML file, but I may be mistaken. > 4) Element.setTextContent() method is not there in J2SE 1.4.2, and > it gave rubbish when I used setNodeValue(). Right, to make it work in 1.4.2, I think we need to replace setTextContent(text) with appendChild(doc.createTextNode(text)). > After considering all these facts I think it's better to keep XML > generation in this way and just handle those special character > occurrences. Well, I lean towards the opposite conclusion... If we had used the XML libraries to generate the XML, we wouldn't have had this bug or DERBY-4902 , and perhaps other bugs that we haven't discovered yet. But I think the current patch would be a good incremental improvement. I'll check it in and back-port it to the 10.7 branch so that it makes it into the next release candidate. If we want to change how the XML is generated, we can file a new JIRA issue to track that work.
          Hide
          Bryan Pendleton added a comment -

          +1 to not worrying too much about the human-readability of the generated XML.
          I think there are plenty of other reformatting tools that an XML-savvy user will
          have access to.

          Regarding the selection of which classes to use, I'd like to see us using
          the built-in system classes for XML handling wherever possible. As Nirmal
          notes, we were struggling with the availability of these classes, see for
          example the discussion of xml-apis.jar in DERBY-4781.

          Over time, the core platform Java environments become more sophisticated and
          provide more complete XML support, and as that occurs, if we can shift
          work away from our own custom code and into system-provided code that
          seems like a clear win, so I think it would be great if Knut could preserve
          these suggestions for how to do that as part of a JIRA issue somewhere.

          Show
          Bryan Pendleton added a comment - +1 to not worrying too much about the human-readability of the generated XML. I think there are plenty of other reformatting tools that an XML-savvy user will have access to. Regarding the selection of which classes to use, I'd like to see us using the built-in system classes for XML handling wherever possible. As Nirmal notes, we were struggling with the availability of these classes, see for example the discussion of xml-apis.jar in DERBY-4781 . Over time, the core platform Java environments become more sophisticated and provide more complete XML support, and as that occurs, if we can shift work away from our own custom code and into system-provided code that seems like a clear win, so I think it would be great if Knut could preserve these suggestions for how to do that as part of a JIRA issue somewhere.
          Hide
          Knut Anders Hatlen added a comment -

          Committed derby-4903-a to trunk with revision 1039084.

          Show
          Knut Anders Hatlen added a comment - Committed derby-4903-a to trunk with revision 1039084.
          Hide
          Nirmal Fernando added a comment -

          Hi Knut & Bryan,

          Thanks for the responds.

          I really like to try this out, but due to academic constraints that I have till the 3rd week of January 2011,
          I am not quite sure that I will be able to look at re-factoring the code before that. If there's no rush I'll definitely
          take it after January 2011.

          For now I'll attach a patch soon, on comments received for patch (a).

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Knut & Bryan, Thanks for the responds. I really like to try this out, but due to academic constraints that I have till the 3rd week of January 2011, I am not quite sure that I will be able to look at re-factoring the code before that. If there's no rush I'll definitely take it after January 2011. For now I'll attach a patch soon, on comments received for patch (a). Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks, Bryan. I'll file a new JIRA issue for making use of the XML libraries.

          Show
          Knut Anders Hatlen added a comment - Thanks, Bryan. I'll file a new JIRA issue for making use of the XML libraries.
          Hide
          Knut Anders Hatlen added a comment -

          Here's a patch that extends the test case with testing more special characters, and by adding a less-than operator as mentioned in the original bug description.

          The test fails because the special character '&' wasn't handled by the patch. I tried to add a case for '&' in AccessDatabase, but the replace methods go into an infinite loop when trying to replace '&' with "&" because the replacement text includes the character to be replaced. I'll try to come up with a different solution.

          Show
          Knut Anders Hatlen added a comment - Here's a patch that extends the test case with testing more special characters, and by adding a less-than operator as mentioned in the original bug description. The test fails because the special character '&' wasn't handled by the patch. I tried to add a case for '&' in AccessDatabase, but the replace methods go into an infinite loop when trying to replace '&' with "&" because the replacement text includes the character to be replaced. I'll try to come up with a different solution.
          Hide
          Nirmal Fernando added a comment -

          Hi Bryan,

          >It seems like it would be nice to perform the special character encoding of the
          >XML attribute strings prior to composing them into quoted substrings in the XML tag string,
          >since then replaceInAttribute() wouldn't have to go searching for the start and end of
          >the attribute string to find it.

          Since we're getting the attributes as quoted substrings from the database query it self,
          I'm not quite clear whether this is possible.

          Show
          Nirmal Fernando added a comment - Hi Bryan, >It seems like it would be nice to perform the special character encoding of the >XML attribute strings prior to composing them into quoted substrings in the XML tag string, >since then replaceInAttribute() wouldn't have to go searching for the start and end of >the attribute string to find it. Since we're getting the attributes as quoted substrings from the database query it self, I'm not quite clear whether this is possible.
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          What about something like this for replacing '&' ?

          Show
          Nirmal Fernando added a comment - Hi Knut, What about something like this for replacing '&' ?
          Hide
          Knut Anders Hatlen added a comment -

          Thanks Nirmal, but that was the code I tried which went into an infinite loop. The reason is that the replace() method will continue until the resulting string contains no '&' characters, and even if it replaces a '&' with "&", it will still find an '&' in the result string.

          The attached patch (derby-4903-c.diff) uses a different approach, which also works for '&'.

          Show
          Knut Anders Hatlen added a comment - Thanks Nirmal, but that was the code I tried which went into an infinite loop. The reason is that the replace() method will continue until the resulting string contains no '&' characters, and even if it replaces a '&' with "&", it will still find an '&' in the result string. The attached patch (derby-4903-c.diff) uses a different approach, which also works for '&'.
          Hide
          Nirmal Fernando added a comment -

          Oh, yeah! Your approach looks fine to me!

          Thanks.

          Show
          Nirmal Fernando added a comment - Oh, yeah! Your approach looks fine to me! Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks, Nirmal. Committed revision 1039268.

          Show
          Knut Anders Hatlen added a comment - Thanks, Nirmal. Committed revision 1039268.
          Hide
          Knut Anders Hatlen added a comment -

          Merged to 10.7 and committed 1039272. Closing the issue.

          Show
          Knut Anders Hatlen added a comment - Merged to 10.7 and committed 1039272. Closing the issue.

            People

            • Assignee:
              Nirmal Fernando
              Reporter:
              Knut Anders Hatlen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development