|
1. I think the hint name should be more like "openjpa.hint.OptimizeResultCount". It should be "hint" and not "hints" to match the existent "openjpa.hint.OracleSelectHint". And I see no reason to tie it to SQL.
2. I don't think we need a FetchConfiguration instance variable. For find() calls and loading hollow objects we always know we're only loading one instance because we use the StoreManager.load() call internally. There are similar internal APIs used only when traversing a to-one relation. So the JDBCStoreManager or relation field strategies can set the proper expected result count into the Select directly. Much better than having to set a FetchConfiguration value and then make sure it disappears for subsequent selects. The only problem is Query.getSingleResult, because right now we execute the query as a multi-result query and then extract the one result at the JPA layer. We don't have to do it this way, though. The underlying kernel Query already has a "Unique" property you can set to indicate a single result. The property doesn't have quite the semantics we want, because it allows a query that returns 0 results whereas that's an error in JPA. We can't just change the Unique semantics because of JDO, but we could certainly make it configurable on the Query instance whether a Unique query can legally return 0 results. So if we start using the Unique property rather than extracting the single result at the JPA layer, we'll know when the user is using getSingleResult when we construct the Select, and we can again set the expected number of results directly into the Select. 3. The SelectImpl already knows when it has to-many eager joins; no need for additional state. See SelectImpl.hasEagerJoin(boolean toMany). 4. "getOptimizeClause" seems too generic. I'm also not clear on what use it has in the base DBDictionary class if you state that individual dictionaries will still have to override toSelect themselves to insert the optimization SQL. Abe is there a way from selectImpl to know if this is the top select or not.For example if EagerFetchMode set to parallel if we are trying to retrieve Department and department is eagerly fetching Emps.Then a single user query (select d from Department d) will produce multiple push down sqls.Now if the user set the hint to optimize on the top level query ie select d from Department d then it should appear only on the top level query and not the subsequent queries.But to do so we need the ability to be able to distinguish if this is the top select or not.Is there currently any mechanism to do this
ritika I don't recall any way to know whether a given Select it is a top-level Select or a parallel Select. Note that if you take my advice and implement this as an ExpectedResultCount property of SelectExecutor (which can also replace the isSingleResult propertyof Union), then you don't need to know. The query/storemanager/whatever that constructs the top-level select will set the expected count based on hints / knowledge, and the dictionary will use it as needed. It won't get passed to parallel eager selects accidentally.
Abe I do not understand how the SelectImpl.hasEagerJoin method is implemented.I have EmpBean and DeptBean entities as follows
public class DeptBean2 implements Serializable { @Id private Integer no; @Column(name="cdname",length=80) private String name; @OneToMany(fetch=FetchType.EAGER , mappedBy="dept") private List<EmpBean2> emps; @OneToOne private EmpBean2 mgr; public class EmpBean2 implements Serializable { @Id Integer empid; @Column(name="cesalary",columnDefinition="_double") Double salary; @Column(name="cename",columnDefinition="_varchar") String name; @Column(name="cehireDate") Date hireDate; @Column(name="ceismanager" , columnDefinition="_boolean") Boolean isManager; @Column(name="ceemp_ts") Timestamp emp_ts; @ManyToOne() //@ForeignKey(name="deptno") DeptBean2 dept; Now I set the EagerFetchMode to parallel and run the following query OpenJPAQuery qryE = OpenJPAPersistence.cast(em.createQuery("select d from DeptBean2 d")); qryE.setHint("openjpa.hint.OptimizeResultCount", new Integer(1)); List rsE = qryE.getResultList(); These are the selects which are generated 42902 dwtest TRACE [main] openjpa.jdbc.SQL - <t 1094730048, conn 685648094> executing prepstmnt 808464432 SELECT t0.no, t1.empid, t1.dept_no, t1.ceemp_ts, t1.cehireDate, t1.ceismanager, t1.cename, t1.cesalary, t0.cdname FROM deptab2 t0 LEFT OUTER JOIN emptab2 t1 ON t0.mgr_empid = t1.empid optimize for 1 row 42902 dwtest TRACE [main] openjpa.jdbc.SQL - <t 1094730048, conn 685648094> [0 ms] spent 65474 dwtest TRACE [main] openjpa.jdbc.SQL - <t 1094730048, conn 685648094> executing prepstmnt 1509579258 SELECT t0.no, t1.empid, t2.no, t2.mgr_empid, t2.cdname, t1.ceemp_ts, t1.cehireDate, t1.ceismanager, t1.cename, t1.cesalary FROM deptab2 t0 INNER JOIN emptab2 t1 ON t0.no = t1.dept_no LEFT OUTER JOIN deptab2 t2 ON t1.dept_no = t2.no ORDER BY t0.no ASC 65474 dwtest TRACE [main] openjpa.jdbc.SQL - <t 1094730048, conn 685648094> [0 ms] spent 105963 dwtest TRACE [main] openjpa.jdbc.SQL - <t 1094730048, conn 685648094> executing prepstmnt 1725589210 SELECT t0.empid, t1.no, t1.cdname, t0.ceemp_ts, t0.cehireDate, t0.ceismanager, t0.cename, t0.cesalary FROM emptab2 t0 LEFT OUTER JOIN deptab2 t1 ON t0.dept_no = t1.no WHERE t0.dept_no = ? [params=(int) 2] 106003 dwtest TRACE [main] openjpa.jdbc.SQL - <t 1094730048, conn 685648094> [40 ms] spent when I am executing the 1st select I see 3 eagerKeys in the eagerKey map in the debugger (dept,mgr,emps) but the selectImpl.hasEagerJoin(toMany) method returns false.So my question is that inspite of seeing emps in the eagerKey map how come this method returns false.The same holds true for the second select.So when is the hasEagerJoin method going to return true here is a patch based on the discussions regarding this feature
Ritika Maheshwari made changes - 16/Mar/07 04:44 PM
> when I am executing the 1st select I see 3 eagerKeys in the eagerKey map in the debugger (dept,mgr,emps) but the selectImpl.hasEagerJoin(toMany) method returns false.So my question is that inspite of seeing emps in the eagerKey map how come this method returns false.
Because emps is being selected via a parallel select, not a to-many join in the primary select. Comments on the proposed patch:
- I don't like the whole scheme of setting the expected result count to -1 for anything "artificial". It's confusing and unnecessary. Just set it to the number of expected "primary" results, and the DBDictionary can invoke sel.hasEagerJoin(true) to figure out if the expected count can be used. Or just have the getter for the expected count always return 0 if there is an eager to-many join (or better yet, turn -1 into a value meaning "unknown" and have it return -1, which would then also be the default when no expected count is set). - I still think there should be a way to get rid of Union.is/setSingleResult by moving the expected result property to SelectExecutor -- which both Select and Union extend -- and taking advantage of the new expected result ("1" obviously indicates a single result). - If you're going to validate the value of the user-supplied hint in the JPA QueryImpl, you might as well transform it into a Number at that point before setting it into the FetchConfiguration. Also, I'd accept any Number, not just an Integer (technically we should accept any whole number, but that's a pain to implement). Then in the JDBC layer, you can just cast the hint value directly to a Number and forgo validating it and checking for String values a second time. - DB2 really cares whether you use "optimize for 1 row" vs. "optimize for 1 rows"? That's ugly. - We should probably generalize the configuration of row optimization to the base DBDictionary with an override mechanism. - If you're going to invoke setUnique(true) on the underlying query from the JPA QueryImpl's getSingleResult, you need to do three things: 1. Unset it in a finally clause, because the very next call might be to getResultList, and in general getSingleResult shouldn't have stateful side effects. 2. Change the kernel's QueryImpl to throw an exception when unique is set but the query doesn't return any results. Right now it allows 0 results and will return null, which is indistinguishable from a projection on a null field that returned 1 result. 3. Get rid of the code immediately following in getSingleResult that extracts the value if a List is returned, because after setting the unique flag on the underlying query, it will never return a List. - The hint key should be a constant in the kernel's Query interface or somewhere like that. Abe your comment :
- I don't like the whole scheme of setting the expected result count to -1 for anything "artificial". It's confusing and unnecessary. Just set it to the number of expected "primary" results, and the DBDictionary can invoke sel.hasEagerJoin(true) to figure out if the expected count can be used. Or just have the getter for the expected count always return 0 if there is an eager to-many join (or better yet, turn -1 into a value meaning "unknown" and have it return -1, which would then also be the default when no expected count is set). The reason for using -1 is that we want to differentiate between cases where the user added a hint to optimize for 1 row versus where we internally generated the value (getSingleResult,singleRelationshiptraversal etc).If the user has specified the optimize for 1 row through hint than we do not want to check for eagerJoins(true).We basically do not want to do anychecking and just add the optimize for 1 row clause. But if the expectedResultCount is computed internally to be a value of 1 then we want to check the eagerJoins(true) and want to make sure that we are in fact returning a single row.So the -1 differnetiates the internally generated 1 from the user's specified 1 Your Comment We should probably generalize the configuration of row optimization to the base DBDictionary with an override mechanism. is contradicting with your earlier comment 4. "getOptimizeClause" seems too generic. I'm also not clear on what use it has in the base DBDictionary class if you state that individual dictionaries will still have to override toSelect themselves to insert the optimization SQL. The fact of the matter is that the optimize clause generation for various dictionaries is different in terms of syntax and where the clause appears in the select string.So I am not sure if u really want to generalize the configuration of row optimization to the base DBDictionary.Although that was our original plan > The reason for using -1 is that we want to differentiate between cases where the user added a hint to optimize for 1 row versus where we internally generated the value (getSingleResult,singleRelationshiptraversal etc).
That makes sense, though the use case seems vanishingly small (when would the user explicitly optimize for one row but also include a to-many eager join?). How about instead of using -1 to indicate "an artificial single result", we change the setExpectedResults API on Select (or SelectExecutor if we go that route) to take an additional "boolean force" parameter that is set to true when it's a user-specified count, and false when it's an artificial expectation. Internally we can keep track of whether the expected count is forced with another bit flag among those already used in SelectImpl. When the count is not forced, we ignore it in the presence of a to-many eager join. This API seems more clear-cut to me, in addition to being more general (even if we don't necessarily need the additional generality at this point). > The fact of the matter is that the optimize clause generation for various dictionaries is different in terms of syntax and where the clause appears in the select string.So I am not sure if u really want to generalize the configuration of row optimization to the base DBDictionary Fair enough -- I had forgotten that it was so different for different databases. based on your latest comments here is another patch
Ritika Maheshwari made changes - 21/Mar/07 09:43 PM
In the singleResult method of the kernel QueryImpl when null was returned I threw an error using localizer property "not-unique".But this error message does not correctly define the error so in the localizer.properties I added another property as follows
not-unique: The query on candidate type "{0}" with filter "{1}" was \ configured to have a unique result, but more than one instance matched \ the query. is-null: The query on candidate type "{0}" with filter "{1}" was \ configured to have a unique result, but no instance matched \ the query. so the is-null error would be thrown for the null conditions.So I will apply this on top of patch2 > 2. I don't think we need a FetchConfiguration instance variable.
My understanding is that you can often provide hints about how many results might be returned, as well as whether one or many results will be loaded. So it seems like we might want to expose some way for the user to specify for an arbitrary query (or relation load) that they expect n records to be returned. > 4. "getOptimizeClause" seems too generic. Agreed. The optimizations tend to get tucked into different parts of the SQL statement; this probably will end up needing to be significantly different for different db back-ends. Something is wrong with my email system so instead of replying to the SVN commit message, I'm posting my comments on the commit of this patch here.
I didn't go through the entire commit, but what I saw looked really good. There are, however, a few problems I'd like to see fixed: 1. The formatting is messed up (in that it is unlike all our other formatting) in many places. Indentation is off, spaces are missing before opening braces or added before line-ending semicolons, etc. Can you change the formatting to be consistent with the rest of the codebase? 2. The point of making the optimize hint string a static constant is not only to avoid typo errors when we use it internally, but also to make it available to users. In this way it is exactly like the constants in org.apache.openjpa.kernel.QueryFlushModes, for example. I'd like to see the constant removed from AbstractStoreQuery. Instead, I'd like to see an org.apache.openjpa.kernel.QueryHints interface with a HINT_RESULT_COUNT defined, and I'd like to see org.apache.openjpa.persistence.OpenJPAQuery implement this interface, just as it does for QueryFlushModes. This will allow both our internal code -- through QueryHints -- and user code -- through OpenJPAQuery -- to use the constant. 3. This is much more of a matter of personal opinion, but I think the added code is over-commented. The new SelectExecutor.setExpectedResultCount method should be thoroughly documented, but IMO we don't need comments explaining what we're doing every time we invoke the method. The code speaks for itself, and over-commenting always runs the danger of the comments falling out of synch with the code. Abe,
I hope you realize that the expectedResultCount on selectImpl is not reflecting the actual reseultCount of that SelectImpl but actually specifying the number of rows to be optimized for that selectImpl.So user might very well say optimize for 1 row where as the actual number of results returned by selectImpl is more than 1.To me the expectedResultCount is misleading and should be optimizeResultCount. Has this new feature been documented?
Patrick Linskey made changes - 07/Aug/07 05:36 PM
Attached documentation patch for query hints
Catalina Wei made changes - 10/Aug/07 12:15 AM
David Wisneski made changes - 10/Aug/07 12:27 AM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Derby does not support.
DB2 supports with syntax OPTIMIZE FOR n ROW(S) which appears toward the end of the sql select statement.
Oracle supports with syntax /*+ FIRST ROWS(N) */ or /*+ ALL ROWS */ which occurs after the select keyword.
SQLServer supports but the syntax is different from DB2, Oracle.
Since the concept applies to multiple (but not all) databases and the syntax is vendor specific, we propose a hint with the name
openjpa.hints.SQLOptimizeRows
which can be assigned a positive integer value.
This value will be verified on the setHint call and if it is not a java.lang.Integer and a positive value, an ArgumentException will be thrown.
For native queries, the hint, if specified, will be ignored.
The hint value will be stored on the FetchConfiguration instance.
For internally generated selects, such as from EntityManager.find or relationship navigation, we also want to indicate if one row is the expected size of the result set. Also in the case of Query.getSingleResult we want to indicate an expected size of one.
To do this we will have a instance variable n FetchConfiguration to track the fetch operation: getResultList, getSingleResult, find, etc.
So FetchConfiguration will contain the user supplied hint value OR a indication of the operation that generated the select.
We modify SelectImpl to add instance variables for expectedResultCount and boolean hasSingleValuesJoins. When multi valued relationships are added to a select through the EagerMap the expectedResultCount will be appropriately modified to indicate it is not a single value select. even if the original internal operation (i.e. find ) would normally be a single row resut.
Finally the Dictionary class will have new method getOptimizeClause. Database Dictionaries that support some form of OPTIMIZE FOR will have to override the two toSelect methods so that the expectedResultCout from selectImpl is emitted into the sql.