|
Tried the program for this example
java ASTParser "select * from table(foo(4))t" but it seems the binding is done early here so it crashed: Exception in thread "main" java.sql.SQLSyntaxErrorException: 'APP'.FOO' does not identify a table function. Does this mean only a subset of the syntax may be used? If so, what parts? Dag, this is because the initialization of NewInvocationNode (invoked by the parser -- see vtiTableConstruct() in sqlgrammar.jj) looks up the class name associated with the table function in the data dictionary. That sounds like the wrong place to do it, it should probably be moved from init() to bindExpression(). The actual verification that the class exists is (correctly) done in bindExpression(). I wouldn't be surprised if there's more misplaced code, since until now it hasn't really made any difference if some of the binding has been performed in the parser.
I agree with Knut's analysis. The parse-time constructor for NewInvocationNode contains a bind-time call to the data dictionary. I regard this as a bug. The bug could probably be fixed by faulting in the metadata only when it is needed rather than pro-actively reading the catalogs when the AST is created.
I also agree with Knut that the devil is in the details. Since a standalone parser hasn't been tested widely, there are likely to be several cases where bind-time logic has leaked into the parsing phase. Again, as they come up, I think we should clean them up. It might be worth figuring out how we could record all of the SQL that is issued by our regression tests and then pump that SQL through the ASTParser logic. That could give us some confidence that we have found the central bulge of problems--though it would not be a systematic way to catch all of the outliers. Tests ran cleanly for me on derby-3946-01-aa-standaloneParser.diff. Committed at subversion revision 713721.
Additional commentary on this issue can be found in this email thread: http://www.nabble.com/Using-derby-to-parse-an-SQL-statement-td20461127.html#a20461127
Rick,
your patch has helped us a lot so far ... since I cannot post any messages to the list for some reason, I want to give my feedback here. getting hold of the QueryTreeNode is a big step in the right direction ... however for actually analyzing the tree it would be necessary to access the tree's elements. I.e. traverse the tree sequentially, getting information how many childnodes exist etc. The problem that I still see is that some of the classes I'd need for this are not visible so that I can use the from the outside. The QueryTreeNode in your example code seems to be an instance of CursorNode ... how can I go on from here? The things we'd like to know is: which tables are involved in the statement (also in inner selects) which fields from which tables are accessed, which aliases are given for the tables that are accessed etc. Could you give me a further hint? Thanks Christian Hi Christian,
I have posted a reply to a similar question on your original email thread. Hopefully you will be able to see that soon. 1) It's useful to build the engine javadoc (ant -quiet javadoc) and browse the javadoc for the package org.apache.derby.impl.sql.compile. In particular, you will see that the AST nodes are the classes indented under QueryTreeNode in the tree view. 2) The nodes themselves implement Visitable so you can write you own Visitor to explore the AST graph. Visitable has one method, accept(), and by looking at the implementations of that method, you will understand how the nodes snap together into a graph. I've attached a simple Visitor (TreeWalker), which shows you some classes in the graph. That may help explain the AST a bit more. When I run java TreeWalker "select a from t, s where t.a = s.a" I get the following graph: org.apache.derby.impl.sql.compile.CursorNode org.apache.derby.impl.sql.compile.SelectNode org.apache.derby.impl.sql.compile.ResultColumnList org.apache.derby.impl.sql.compile.ResultColumn org.apache.derby.impl.sql.compile.ColumnReference org.apache.derby.impl.sql.compile.FromBaseTable org.apache.derby.impl.sql.compile.FromBaseTable org.apache.derby.impl.sql.compile.BinaryRelationalOperatorNode org.apache.derby.impl.sql.compile.ColumnReference org.apache.derby.impl.sql.compile.ColumnReference Let me try to explain this tree a bit: The SelectNode has the following children: i) The columns in the SELECT list. The whole SELECT list is represented by a ResultColumnList and there is only one ResultColumn (representing "a") in that list. ii) The tables in the FROM list. There are two of these, each represented by its own FromBaseTable. iii) The WHERE clause. This is a BinaryRelationalOperatorNode (representing the "=" operator). This operator node has a left child and a right child, "t.a" and "s.a" respectively. So to answer your specific questions: Q) What tables are in the query? A) Look for FromBaseTables in the graph. Q) Which fields in the tables are accessed? A) The column references can appear in the SELECT list (the ResultColumnList) or in the WHERE clause (under the BinaryRelationalOperatorNode). Note, however, that the columns are not matched up to tables yet. This isn't done by the parser. That kind of name resolution happens during Derby's bind() phase and it requires metadata so that Derby knows the structure of the tables. Unless you create the tables in Derby, there will be no way to move forward to the bind() phase. If you don't provide this information to Derby, then you will have to write your own name-resolution phase. Q) What are the table aliases, if any? A) FromBaseTable.getExposedName() will give you the name of the table (or the alias name if you specified an alias) Hope this helps, -Rick Hi Rick,
thanks a lot for your examples ... that makes things a lot clearer. I think I got the idea now. Now, I have to try and write my own code around it to make it "easier" to use for us. This was really great support! Regards Christian Linking this issue to DERBY-791. The XmlTreeWalker attached to that issue can be used to print out a query tree in xml.
Can this be closed? If so, should I put it as a 'new feature' in the Release notes for 10.5?
Hi Myrna,
I think there's more work to be done on this issue--I just haven't found a rainy day for that yet. If we say something about this issue in the release notes, then we'll need to warn users that the technique relies on non-public apis which can change between releases. Thanks. Attaching a version of the ASTParser which has been modified to use a transient in-memory database rather than creating an unnecessary on-disk database.
Good idea to use an in-memory database. The storeless engine (DERBY-2164) is probably an even better fit, as this is exactly the kind of use it was intended for (it skips database creation entirely). The down-side is of course that the storeless classes have not been added to the production jars yet, so you would need to add the classes.storeless directory to the classpath for it to work.
Attaching a third rev of the ASTParser. This rev hard-codes the context id for LanguageConnectionContext. This lets you compile ASTParser against the debug jars--the ContextId class itself does not appear in the Derby jar files because the Derby build eliminates classes which merely contain constants.
I have successfully compiled and run this ASTParser against the debug derby.jar that is part of the 10.5.3.0 distribution. Attaching a new version of TreeWalker which implements the visitChildrenFirst() method which was recently added to the Visitor interface.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A) Apply the patch to the Derby trunk and build Derby.
B) Compile ASTParser
C) Run ASTParser. The program takes one argument, which is a query string which you would like to parse. The program then retrieves the parsed query tree and prints it out. For example:
java ASTParser "select a from t, s where t.a = s.a"
Note that this technique for using the parser standalone involves invoking non-public APIs which are subject to change from release to release. Note, however, that these APIs have remained stable since Derby was open-sourced.
The patch touches the following files. I will run tests. I am inclined to check in these changes if the tests pass.
M java/engine/org/apache/derby/impl/sql/conn/GenericLanguageConnectionContext.java
M java/engine/org/apache/derby/iapi/sql/conn/LanguageConnectionContext.java
Adds some accessors to the compiler's state variable so that query trees can be stored and retrieved.
M java/engine/org/apache/derby/impl/sql/GenericStatement.java
Pokes the query tree into the state variable when the user sets the StopAfterParsing tracepoint.