Issue Details (XML | Word | Printable)

Key: JDO-246
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Erik Bengtson
Reporter: Michael Watzek
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
JDO

JPOX eliminates duplicates in the query result although DISTINCT is not specified.

Created: 09/Dec/05 11:38 PM   Updated: 21/Mar/06 07:36 AM
Return to search
Component/s: tck2
Affects Version/s: JDO 2 beta
Fix Version/s: JDO 2 final

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works VariableInResult.patch 2006-02-12 05:00 AM Craig Russell 4 kB

Resolution Date: 21/Mar/06 07:36 AM


 Description  « Hide
Test case NPEInResultExpr fails because the result of the query below is expected to contain duplicates. JPOX eliminates the duplicates.

14:22:55,046 (main) DEBUG [org.apache.jdo.tck] - Executing API query: SELECT employee.manager.lastname FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(employee) VARIABLES Employee employee
14:22:55,078 (main) DEBUG [org.apache.jdo.tck] - Query result: [emp2Last, null]
14:22:55,078 (main) DEBUG [org.apache.jdo.tck] - Wrong query result:
expected: [emp2Last, null, emp2Last, emp2Last, emp2Last]
got: [emp2Last, null]
14:22:55,078 (main) INFO [org.apache.jdo.tck] - Exception during setUp or runtest:
junit.framework.AssertionFailedError: Assertion A14.6.9-4 (NPEInResultExpr) failed:
Wrong query result:
expected: [emp2Last, null, emp2Last, emp2Last, emp2Last]
got: [emp2Last, null]
at junit.framework.Assert.fail(Assert.java:47)
at org.apache.jdo.tck.JDO_Test.fail(JDO_Test.java:546)
at org.apache.jdo.tck.query.QueryTest.queryFailed(QueryTest.java:500)
at org.apache.jdo.tck.query.QueryTest.checkQueryResultWithoutOrder(QueryTest.java:485)
at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1189)
at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1029)
at org.apache.jdo.tck.query.QueryTest.executeAPIQuery(QueryTest.java:966)
at org.apache.jdo.tck.query.QueryTest.executeAPIQuery(QueryTest.java:946)
at org.apache.jdo.tck.query.result.NPEInResultExpr.testPositive(NPEInResultExpr.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at junit.framework.TestCase.runTest(TestCase.java:154)
at org.apache.jdo.tck.JDO_Test.runBare(JDO_Test.java:204)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at junit.textui.TestRunner.doRun(TestRunner.java:116)
at junit.textui.TestRunner.doRun(TestRunner.java:109)
at org.apache.jdo.tck.util.BatchTestRunner.start(BatchTestRunner.java:120)
at org.apache.jdo.tck.util.BatchTestRunner.main(BatchTestRunner.java:95)


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Erik Bengtson added a comment - 22/Jan/06 10:16 PM
There is just two departments persisted, so should return only two instances.

Michael Bouschen added a comment - 01/Feb/06 01:07 AM
Here is the JDOQL:
  SELECT e.manager.lastname FROM org.apache.jdo.tck.pc.company.Department
  WHERE employees.contains(e) VARIABLES Employee e
The query selects the variable that iterates the collection of employees of the two Departments. Unless distinct is specified, the query should return as many values as there are instances in the employees collections.

Erik Bengtson added a comment - 01/Feb/06 03:11 AM
I dont understand. You are querying Departments and iterating each element of it. The elements that will be in the result are the ones where the filter evaluates to true. Here the filter evaluates to true the two elements. Now the only thing remaining is for each element we navigate to return the result, so two instances only

Michael Bouschen added a comment - 01/Feb/06 03:53 AM
Section "14.6.9 Specifying the Result of a Query (Projections, Aggregates)" explicitly defines a different behavior in case a variable is included in the result:

If a variable or a field of a variable is included in the result, either directly or via navigation through the variable, then the semantics of the "contains" clause that include the variable change. In this case, all values of the variable that satisfy the filter are included in the result.

I read the above query to return the manager's lastname of all the employees included in the employees collection of each department.

Craig Russell added a comment - 01/Feb/06 04:45 AM
Michael interprets the specification as I had intended when I wrote it. I'm willing to clarify the specification if needed.

Erik Bengtson added a comment - 01/Feb/06 05:46 AM
To me this is intersection. See the below example

Employee1
Employee2
Employee3

Dept1.employess=Employee1
Dept1.employess=Employee3

Dept2.employess=Employee2

Intersection

Employee1 x Dept1.employess=Employee1
Employee3 x Dept1.employess=Employee3
Employee2 x Dept2.employess=Employee2

I don't like this sintax since it changes an obvious behaviour, IMO. I would prefer to write it as

SELECT intersect(employee,this.employees).manager.lastname FROM org.apache.jdo.tck.pc.company.Department VARIABLES Employee employee

Now it's up to up to clearify

Craig Russell added a comment - 01/Feb/06 06:28 AM
It might be easier to think about this if we add another projection to the SELECT:

SELECT employee.id, employee.manager.lastname FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(employee) VARIABLES Employee employee

Here, the results should be:

{1, "emp2Last"}
{2, null}
{3, "emp2Last"}
{4, "emp2Last"}
{5, "emp2Last"}

Simply removing employee.id should not change the number of rows of the result. It should only change the number of columns returned by the query.

Erik Bengtson added a comment - 01/Feb/06 07:15 AM
I will do a small change in your example before expanding it.

------------------------

Employee = {"Employee1","Employee2","Employee3"}
Department = {"dept1","dept2","dept3"}
Department.employees = {{"Employee1","Employee3"},{"Employee2"},{"Employee1","Employee2"}}

SELECT employee.name, employee.manager.lastname FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(employee) VARIABLES Employee employee

Here, the results should be:

{"Employee1", "emp2Last"}
{"Employee3", null}
{"Employee2", "emp2Last"}
{"Employee1", "emp2Last"}
{"Employee2", "emp2Last"}

Simply removing employee.id should not change the number of rows of the result. It should only change the number of columns returned by the query.

------------------------

Now we add:

Resource = {"Resource1","Resource2","Resource3"}

Department.resources = {{"Resource2"},{"Resource2","Resource3","Resource1"},{"Resource1","Resource2"}}

SELECT employee.name, employee.manager.lastname, resource.name FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(employee) && resources.contains(resource) VARIABLES Employee employee, Resource resource

Here the results can vary depending how you expect employee and resource sets be joined

{"Employee1", "emp2Last", "Resource2"}
{"Employee3", null, null}
{"Employee2", "emp2Last","Resource2"}
{null,null,"Resource3"}
{null,null,"Resource1"}
{"Employee1", "emp2Last","Resource1"}
{"Employee2", "emp2Last","Resource2"}

What does && or || operators would mean here?

Any other thoughts?

Michael Bouschen added a comment - 02/Feb/06 12:10 AM
I'm a little confused about the term intersection. Just for my understanding: do you mean Cartesian product?

Here is my understanding of the semantics of your query including the resource variable. Please note, this does not necessarily mean it needs to be implemented that way :-):
The query first calculates the Cartesian product for Department, Employee and Resource: Department x Employee x Resources. This results in a set of triples. It then retains only those triples where the employee *and* the resource have a relationship to the department in the first column of the triple. The Select clause then defines values that are really returned by the query: employee.name, employee.manager.lastname, resource.name. So the departments are not included in the result.

Then the query result would be:
{"Employee1", "emp2Last", "Resource2"}
{"Employee3", null, "Resource2"}
{"Employee2", "emp2Last","Resource2"}
{"Employee2", "emp2Last","Resource3"}
{"Employee2", "emp2Last","Resource1"}
{"Employee1", "emp2Last","Resource1"}
{"Employee1", "emp2Last","Resource2"}
{"Employee2", "emp2Last","Resource1"}
{"Employee2", "emp2Last","Resource2"}
Rows 1 and 2 represent values for department dept1, rows 3-5 for dept2 and rows 6-9 for dept3.

About the difference between uisng && or ||:
I think with && only those tripels are retained where both the employee and the resource have a relationship to the department of the triple. Using || the set of retained tripels before the projection is bigger, because it also includes tripels where only one of the two (employee, resource) is related to the department.

Does this make sense?

Erik Bengtson added a comment - 02/Feb/06 12:52 AM
> I'm a little confused about the term intersection. Just for my understanding: do you mean Cartesian product?

We are using intersection between Department.employees vs Employee, since we only retain
the employees that are on both sets.

Employee = {"Employee1","Employee2","Employee3"}
Department = {"dept1","dept2","dept3"}
Department.employees = {{"Employee1","Employee3"},{"Employee2"},{"Employee1","Employee2"}}

{"Employee1", "emp2Last"}
{"Employee3", null}
{"Employee2", "emp2Last"}
{"Employee1", "emp2Last"}
{"Employee2", "emp2Last"}

A cartesian product would result in return all employees from Employee set for each Department.

> Here is my understanding of the semantics of your query including the resource variable. Please note, this does not necessarily mean it needs to be implemented that way :-):

You are the JDOQL expert here ;).

Now the second example, The way I exemplified I was thinking in a kind of outer join from Employee vs Resource. Meaning returning a row if one of both has an element in the set.


{"Employee1", "emp2Last", "Resource2"}
{"Employee3", null, null}
{"Employee2", "emp2Last","Resource2"}
{null,null,"Resource3"}
{null,null,"Resource1"}
{"Employee1", "emp2Last","Resource1"}
{"Employee2", "emp2Last","Resource2"}


The way you did it is different, you take Employee vs Resource and calculates the Cartesian product.

Then the query result would be:
{"Employee1", "emp2Last", "Resource2"}
{"Employee3", null, "Resource2"}
{"Employee2", "emp2Last","Resource2"}
{"Employee2", "emp2Last","Resource3"}
{"Employee2", "emp2Last","Resource1"}
{"Employee1", "emp2Last","Resource1"}
{"Employee1", "emp2Last","Resource2"}
{"Employee2", "emp2Last","Resource1"}
{"Employee2", "emp2Last","Resource2"}


>About the difference between uisng && or ||:
>I think with && only those tripels are retained where both the employee and the resource have a relationship to the department of the triple. Using || the set of retained tripels before the projection is bigger, because it also includes tripels where only one of the two (employee, resource) is related to the department.

You mean an inner join if &&, and outer join ||

> Does this make sense?

I don't know. :(

We have these steps to calculate

1. Set e =Department.employees (Intersect) Employee
2. Set r =Departments.resources (Intersect) Resource
3. Set result = d (Cartesian Product) (e (Cartesian Product) r) (your example)

or

3. Set result = d (Cartesian Product) (e (outer join) r) (my example)

Erik Bengtson added a comment - 02/Feb/06 01:12 AM
small correction (forgive my non scientific syntax):

We have these steps to calculate

1. Set e =Department.employees (Intersect) Employee
2. Set r =Departments.resources (Intersect) Resource
3. Set result = d.instance (Cartesian Product) (e (Cartesian Product) r) (your example)

or

3. Set result = d.instance (Cartesian Product) (e (outer join) r) (my example)

Erik Bengtson added a comment - 05/Feb/06 10:32 PM
I navigate through the jdo expert group mail archives and noticed hot discussions on the topic

See messages around these dates

2004 01 06
2004 01 08
2004 01 10
2004 01 20
2004 06 20
2004 07 21



Craig Russell added a comment - 07/Feb/06 05:32 AM
The relevant parts of the specification are these:

14.6.5 A variable that is not constrained with an explicit contains clause is constrained by the extent of the persistence capable class (including subclasses).
...
The semantics of contains is "exists", where the contains clause is used to filter instances. The meaning of the expression "emps.contains(e) && e.salary < param" is "there exists an e in the emps collection such that e.salary is less than param". This is the natural meaning of contains in the Java language, except where the expression is negated. If the variable is used in the result, then it need not be constrained.
...
A portable query will constrain all variables with a contains clause in each side of an "OR" expression of the filter where the variable is used. Further, each variable must either be used in the query result or its contains clause must be the left expression of an "AND" expression where the variable is used in the right expression. That is, for each occurrence of an expression in the filter using the variable, there is a contains clause "ANDed" with the expression that constrains the possible values by the elements of a collection.

14.6.9 If a variable or a field of a variable is included in the result, either directly or via navigation through the variable, then the semantics of the "contains" clause that include the variable change. In this case, all values of the variable that satisfy the filter are included in the result.
...
If any result is a navigational expression, and a non-terminal field or variable has a null value for a particular set of conditions (the result calculation would throw NullPointerException), then the result is null for that result expression.

Using the sample query data provided in the TCK in companyForQueryTests.xml and applying these to the cases at hand:

SELECT this.name from org.apache.jdo.tck.pc.company.Department WHERE name.matches(".*e.*")

The candidate tuples are {dept1, dept2}. Both satisfy the condition. The projection results in:
{"Development"}, {Human Resources"}

SELECT this.name, e.lastname from org.apache.jdo.tck.pc.company.Department WHERE name.matches(".*e.*") VARIABLES Employee e

This query is not portable because the variable employee is not constrained. This is not so useful because the relationship between department and employee is not constrained. Therefore the extent of department and the extent of employee are used.The candidate tuples are the cartesian product of {this, e}. There are two departments and five employees, so the cartesian product contains 10 tuples. The projection then is {{"Development", "emp1Last"}, {"Development", "emp2Last"}, {"Development", "emp3Last"}, {"Development", "emp4Last"}, {"Development", "emp5Last"}, {"Human Resouces", "emp1Last"}, {"Human Resouces", "emp2Last"}, {"Human Resouces", "emp3Last"}, {"Human Resouces", "emp4Last"}, {"Human Resouces", "emp5Last"}}.

SELECT this.name, e.lastname from org.apache.jdo.tck.pc.company.Department WHERE name.matches(".*e.*") && this.employees.contains(Employee e)

The candidate tuples are the cartesian product of {this, e}. There are two departments and five employees, so the cartesian product contains 10 tuples. Of these 10, only 5 satisfy the filter because of the emps.contains clause. These are {{dept1, emp1}, {dept1, emp2}, {dept1, emp3}, {dept2, emp4}, {dept2, emp5}}. The projection then is {{"Development", "emp1Last"}, {"Development", "emp2Last"}, {"Development", "emp3Last"}, {"Human Resouces", "emp4Last"}, {"Human Resouces", "emp5Last"}}

SELECT employee.id, employee.manager.lastname FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(employee) VARIABLES Employee employee

The candidate tuples are this.employee. These are {{emp1}, {emp2}, {emp3}, {emp4}, {emp5}}. The projection then is {{1", "emp2Last"}, {2, "emp2Last"}, {3, "emp2Last"}, {4, null}, {5, null}}

SELECT employee.manager.lastname FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(employee) VARIABLES Employee employee

The candidate tuples are this.employee. These are {{emp1}, {emp2}, {emp3}, {emp4}, {emp5}}. The projection then is {{"emp2Last"}, {"emp2Last"}, {"emp2Last"}, {null}, {null}}

SELECT employee.lastname, project.name FROM org.apache.jdo.tck.pc.company.Department VARIABLES Employee employee; Project project

This query is not portable because the variable employee is not constrained. The candidate tuples are the cartesian product of department, employee, and project. Two departments, five employees, and three projects result in 30 tuples. The result will contain 30 projections with each combination of employee.lastname and project.name repeated twice (one for each department).

SELECT employee.lastname, project.name FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(Employee employee)VARIABLES Employee employee; Project project;

This query is not portable because the variable project is not constrained. The candidate tuples are the cartesian product of department, employee, and project. Two departments, five employees, and three projects result in 30 tuples. The filter reduces the results to 15 (the cartesian product of the {department, employee-who-works-in-the-department} and {project}. The result will contain 15 projections with each combination of employee.lastname and project.name.

SELECT employee.lastname, project.name FROM org.apache.jdo.tck.pc.company.Department WHERE employees.contains(Employee employee) && employees.projects.contains(project) VARIABLES Employee employee; Project project

This query is portable because all variables are constrained. The candidate tuples are the cartesian product of department, employee, and project. Two departments, five employees, and three projects result in 30 tuples. The filter reduces the result tuples to 7 {{dept1, emp1, proj3}, {dept1, emp2, proj1}, {dept1, emp2, proj2}, {dept1, emp3, proj1}, {dept1, emp3, proj2}, {dept2, emp4, proj3}, {dept2, emp5, proj3}, }. The result will contain 7 projections {{"emp1Last", "green"}, {"emp2Last", "orange"}, {"emp2Last", "blue"}, {"emp3Last", "orange"}, {"emp3Last", "blue"}, {"emp4Last", "green"}, {"emp5Last", "green"}}

Erik Bengtson added a comment - 09/Feb/06 05:52 AM
Thanks for the explanation.

Craig Russell added a comment - 12/Feb/06 05:00 AM
Please review this patch. It adds two new test queries to the "variables in result" test case. One of the queries projects the variable p and the other projects p.id and p.name. The two extra test queries specify "distinct" for the results.

Michael Bouschen added a comment - 12/Feb/06 05:58 AM
The patch VariableInResult.patch looks good!

Craig Russell added a comment - 12/Feb/06 06:02 AM
In case you don't want to bother applying the patch, here are the results with the JPOX jar file as of this morning PST:

doRuntck.jdori:
Overriding previous definition of reference to this.jdori.classpath
    [java] RUN VariableInResult.testDistinctNoNavigation
    [java] RUN VariableInResult.testNoNavigation FAILURE
    [java] RUN VariableInResult.testDistinctNavigation
    [java] RUN VariableInResult.testNavigation FAILURE
    [java] Description: Special Test
    [java] Time: 032
    [java] There were 2 failures:
    [java] 1) testNoNavigation(org.apache.jdo.tck.query.result.VariableInResult)junit.framework.AssertionFailedError: Assertion A14.6.9-3 (VariableInResult) failed:
    [java] Wrong query result:
    [java] expected: [Project(1, name orange, budget 2500000.99), Project(1, name orange, budget 2500000.99), Project(1, name orange, budget 2500000.99)]
    [java] got: [Project(1, name orange, budget 2500000.99)]
    [java] at org.apache.jdo.tck.JDO_Test.fail(JDO_Test.java:546)
    [java] at org.apache.jdo.tck.query.QueryTest.queryFailed(QueryTest.java:500)
    [java] at org.apache.jdo.tck.query.QueryTest.checkQueryResultWithoutOrder(QueryTest.java:485)
    [java] at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1191)
    [java] at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1031)
    [java] at org.apache.jdo.tck.query.QueryTest.executeAPIQuery(QueryTest.java:968)
    [java] at org.apache.jdo.tck.query.QueryTest.executeAPIQuery(QueryTest.java:948)
    [java] at org.apache.jdo.tck.query.result.VariableInResult.testNoNavigation(VariableInResult.java:147)
    [java] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [java] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [java] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [java] at org.apache.jdo.tck.JDO_Test.runBare(JDO_Test.java:204)
    [java] at org.apache.jdo.tck.util.BatchTestRunner.start(BatchTestRunner.java:120)
    [java] at org.apache.jdo.tck.util.BatchTestRunner.main(BatchTestRunner.java:95)
    [java] 2) testNavigation(org.apache.jdo.tck.query.result.VariableInResult)junit.framework.AssertionFailedError: Assertion A14.6.9-3 (VariableInResult) failed:
    [java] Wrong query result:
    [java] expected: [[1, orange], [1, orange], [1, orange]]
    [java] got: [[1, orange]]
    [java] at org.apache.jdo.tck.JDO_Test.fail(JDO_Test.java:546)
    [java] at org.apache.jdo.tck.query.QueryTest.queryFailed(QueryTest.java:500)
    [java] at org.apache.jdo.tck.query.QueryTest.checkQueryResultWithoutOrder(QueryTest.java:485)
    [java] at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1191)
    [java] at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1031)
    [java] at org.apache.jdo.tck.query.QueryTest.executeAPIQuery(QueryTest.java:968)
    [java] at org.apache.jdo.tck.query.QueryTest.executeAPIQuery(QueryTest.java:948)
    [java] at org.apache.jdo.tck.query.result.VariableInResult.testNavigation(VariableInResult.java:165)
    [java] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [java] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [java] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [java] at org.apache.jdo.tck.JDO_Test.runBare(JDO_Test.java:204)
    [java] at org.apache.jdo.tck.util.BatchTestRunner.start(BatchTestRunner.java:120)
    [java] at org.apache.jdo.tck.util.BatchTestRunner.main(BatchTestRunner.java:95)

Erik Bengtson added a comment - 15/Mar/06 08:48 AM
should be fixed in JPOX CVS

Craig Russell added a comment - 21/Mar/06 07:36 AM
This issue is fixed in the latest JPOX SNAPSHOT.