Re-posting this mail from Hiranya with answers and comments here to
have all info in one place (common Derby practice).
Mail to derby-dev on Thu, 18 Jun 2009 11:22:33 +0530.
Thanks a lot for reviewing the patch and for the detailed analysis provided.
I decided to reply to the notification mail rather than to the JIRA entry
since I can better correlate my answers with your questions.
Please see my comments in-line.
> I have been through the code in a first pass. Thanks for this great
> work, Hiranya, it is definitely taking us a long step in the right
> direction.
Thanks. This is really encouraging.
> I'll post most of my findings now (non-trivial and trivial),
> I am sure you know about some of them
> already, but anyway... There is a lot of new code, so please forgive me if
> I have overlooked stuff. Please don't get discouraged by all the
> comments, this is really good progress!!
> - Nice code for that general graph handling. I didn't quite see where
> the fact that the objects are comparable (orderable) comes into
> play, cf the comment in DBPersistentObject:
> "Objects of higher levels of the dependency hierarchy should get
> lesser type values".
The object comparison logic is in the DBObjectComparator class, which
implements the standard Comparator interface. It uses the type values of the
DBPersistentObject instances to prioritize them accordingly during DDL
generation. (Currently I have used some generics there. Will get rid of
them if needed.)
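To illustrate the idea being discussed, here is a minimal sketch of ordering persistent objects by a numeric type value, so that providers (e.g. schemas) sort before dependents (e.g. tables and keys). The class names, constants, and method shapes below are illustrative assumptions, not the patch's actual DBObjectComparator API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TypeOrderSketch {
    // Higher levels of the dependency hierarchy get lower type values.
    static final int TYPE_SCHEMA = 0;
    static final int TYPE_TABLE  = 1;
    static final int TYPE_KEY    = 2;

    static class PersistentObject {
        final String name;
        final int type;
        PersistentObject(String name, int type) {
            this.name = name;
            this.type = type;
        }
    }

    // Comparator used during DDL generation: lower type value sorts first,
    // so a schema's CREATE statement is emitted before its tables' ones.
    static final Comparator<PersistentObject> BY_TYPE =
        Comparator.comparingInt(o -> o.type);

    public static List<String> sortedNames(List<PersistentObject> objs) {
        List<PersistentObject> copy = new ArrayList<>(objs);
        copy.sort(BY_TYPE);
        List<String> names = new ArrayList<>();
        for (PersistentObject o : copy) {
            names.add(o.name);
        }
        return names;
    }
}
```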
> There is also the explicit dependencies encoded in the edges, so my
> question is, how is this ordering used? Is it supposed to be used when
> walking the graph somehow?
The edge direction is used to identify the provider and the dependent in a
dependency relationship. This directionality information is used during the
graph walk, as you mentioned, to locate objects which don't have any
providers associated with them. See the DiGraph#getRoot method.
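The root-finding idea can be sketched as follows: a "root" is a vertex with no incoming edge, i.e. an object with no provider it depends on. The representation below (adjacency map from provider to dependents) is an assumption for illustration and does not match the patch's DiGraph internals:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RootSketch {
    // edges: provider -> list of dependents (edge direction encodes dependency)
    public static Set<String> roots(Map<String, List<String>> edges,
                                    Set<String> vertices) {
        Set<String> hasProvider = new HashSet<>();
        for (List<String> dependents : edges.values()) {
            hasProvider.addAll(dependents);
        }
        // Roots are the vertices that never appear as a dependent.
        Set<String> roots = new TreeSet<>(vertices);
        roots.removeAll(hasProvider);
        return roots;
    }
}
```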
> - There is no encoding of the dependency a persistent object might
> have on a role, and similarly no code for setting the role before
> attempting its creation yet, right? I guess you just didn't get that
> far yet
Yes, we need to implement this. In fact I was going to start a discussion on
this on the mailing list soon.
> - Did you catch the dependency a CHECK constraint may have on
> functions? I could not find that..
I haven't implemented DDL generation for CHECK constraints yet. It's only
implemented for primary key, foreign key and unique constraints so far. Will
look into this.
> - Similarly, the GENERATED ALWAYS AS (expression) may cause dependency
> on a function and I could not see that being captured.
> - Views, triggers not handled yet. Both may have dependencies on other
> objects as well as role.
Yes this is for the future.
> - In DBPersistentObject.getDDL you use a switch to handle the object
> subtypes. Why not use subclassing here?
This is also an option. Will look into this. The only issue I see in this
approach is that we have to write a class for each persistent object type.
Currently we handle all objects with one class.
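For comparison, the subclassing alternative being suggested could look roughly like this: each object type overrides a single method instead of being a case in a type switch, and the method returns its DDL string rather than printing it. All names and the SQL shapes here are illustrative, not the actual patch code:

```java
public class SubclassSketch {
    static abstract class DbObject {
        final String name;
        DbObject(String name) {
            this.name = name;
        }
        // Polymorphic dispatch replaces "switch (type)" in a single getDDL.
        abstract String ddl();
    }

    static class Schema extends DbObject {
        Schema(String name) {
            super(name);
        }
        String ddl() {
            return "CREATE SCHEMA \"" + name + "\";";
        }
    }

    static class Table extends DbObject {
        Table(String name) {
            super(name);
        }
        String ddl() {
            return "CREATE TABLE \"" + name + "\" (...);";
        }
    }
}
```

The trade-off mentioned above is visible here: one small class per object type, versus one class with a switch over all types.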
> The method name "getDDL" is a bit misleading to me, since it does
> not return the DDL; it has a side effect: it writes the DDL to Log.
> Maybe this is the pattern used in dblook; I tend to prefer such
> methods to format their output strings and return them, and let the
> upper layers handle the actual printing. But it would be nice to
> make the name more descriptive, maybe emitDDL or some such.
OK I'll perform the necessary refactoring on the code.
> For "case DB_OBJECT_TYPE_FUNCTION", the "dumpPerms = true" is missing
> I think.
Bug! Will fix this up for the next patch.
> - In DBPersistentObject.getPermissions, is GrantedPermissions.size()
> ever 0? If not, I'd prefer an invariant guard instead of leniency.
True. Since we don't allow removing permissions from the objects the size
can never be '0'. The list is either null or populated with at least one
permission. Thanks for pointing this out. I'll remove the redundant check.
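The invariant-guard idea can be sketched as follows: since the list is either null or holds at least one permission, an empty list signals a bug and should fail loudly instead of being silently tolerated. The method and its shape are a hypothetical illustration, not the patch's getPermissions code:

```java
import java.util.List;

public class GuardSketch {
    public static int permCount(List<String> perms) {
        if (perms == null) {
            return 0;  // no permissions ever granted on this object
        }
        if (perms.isEmpty()) {
            // Invariant: permissions are never removed, so a non-null list
            // must contain at least one entry. Empty means a bug upstream.
            throw new IllegalStateException(
                "invariant violated: empty permission list");
        }
        return perms.size();
    }
}
```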
> - In DiGraph.remove, does the case "e.end() == data" ever arise? I
> thought edges would only ever be removed when the e.getStart !=
> null.. (In a generic graph package the code is correct and necessary
> of course)
Well, because of the way we use the graph, this condition will never arise.
But in a more general sense, as you have also pointed out, I think this is
the correct behavior. Otherwise there is a chance that an edge can hang
around without an ending vertex. So I prefer to leave this logic as it is;
otherwise the graph implementation doesn't really comply with the
mathematical definition of a 'graph'.
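The point about dangling edges can be made concrete with a small sketch: removing a vertex must drop every edge that touches it, whether as start or as end, so no edge survives with a missing endpoint. The Edge class and list-based representation are illustrative assumptions, not DiGraph's actual internals:

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveSketch {
    static final class Edge {
        final String start, end;
        Edge(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns the edges that survive removing vertex v: both the
    // "e.start == v" and the "e.end == v" cases must be handled,
    // or a dangling edge could remain.
    public static List<Edge> removeVertex(List<Edge> edges, String v) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.start.equals(v) && !e.end.equals(v)) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```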
> - in DBPermission.getDDL, the parent object is an argument. Would it
> perhaps be cleaner to pass that in the constructor along with the
> grantor and grantee?
+1. We should be able to do that.
> In the DB_COLUMN_PERMISSION case, you use just the first value
> returned from getGrantedPermissions. Not sure I understood why. In
> any case an explanation would be good here.
Looks like a bug. I'll fix this to iterate over all granted permissions.
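The intended fix can be sketched as emitting one GRANT statement per granted permission instead of using only the first list entry. The method name, the permission-string format, and the exact GRANT shape below are illustrative assumptions, not DBPermission's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class GrantSketch {
    public static List<String> columnGrants(String table, String grantee,
                                            List<String> perms) {
        List<String> ddl = new ArrayList<>();
        // Iterate over every granted permission rather than perms.get(0).
        for (String p : perms) {
            ddl.add("GRANT " + p + " ON " + table + " TO " + grantee + ";");
        }
        return ddl;
    }
}
```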
> - In DB_Schema.doSchemas, it would be good to explain why you skip the
> schema called "APP"? One could envision a user called "APP", so is
> it always safe to skip it?
I think the old doSchemas method does the same. I just took the logic from
there. I noticed that when the "APP" schema is not skipped a bunch of
unnecessary schemas get included in the graph. So I guess the correct
behavior is to skip it?
> - In doTables, there is an "if (graph.vertex(currentTable) == null)".
> When would it ever be != null? Perhaps when a dependency on that
> table was already registered? Please add a comment..
That's the idea. But because we tackle this task level-by-level (i.e. we
first handle schemas, then functions/procedures, then tables and so on),
this check is actually redundant. We should be able to get rid of it safely.
> - in DB_key.addConstraint, you do a call to getProviderType. Great
> that you figured out how to handle this
> But do you handle the case where the returned type is
No, we don't have to. By the time we handle constraints, tables are already
handled and they are available in the graph (because we handle one level at
a time). So provider table objects will always be found in the graph and
we just need to create an edge. Just to clarify things further, I'm
following the dependency ordering pointed out by Rick in a previous thread
during graph creation. As Rick pointed out, by then we don't have to worry
about inter-level dependencies, we just need to handle intra-level
dependencies.
> Some minor nits which I'm sure you'll handle when the time comes
> (added mainly here so you can see I have read the code).
> - We try to keep lines within 80 columns and not use TABs or blanks at
> end of lines in new code.
> - Some magic numbers should be made into constants, e.g. positions in
> result sets from queries with "*" force the reader to look up the
> table definition to see what's being read, as in a getString(3).
My plan is to get rid of '*' queries at the end of the project. Currently
I'm trying out a lot of things with the code, so it's easier to have '*'
queries for now.
> - missing Javadoc for classes and public members.
Absolutely, +1. I'll work on that.
> - typo for "vertex" in Javadoc for DiGraph.remove.
You really have gone through every single line of code! Thanks for
pointing this out.
> - In DB_Schema.doSchemas, in the if statement "if (tablesOnly | ..)"
> you do a "continue". Wouldn't a "break" be better for the tablesOnly
> case?
> - in getTableOwner, I'd replace the "while" with an "if .. else" for
> better legibility. Same for getKeyOwner and the "while
> (authorResultSet.next()" loop in getVertices.
That makes sense, since there is only one record in the result set, right?
> - in doKeys, the variable "getReferenceCols" is unused?
Correct. Let's get rid of that.
> Thanks, and keep up the good work!
Indeed. And thanks again for the great analysis. Expect another patch by