Cayenne
  1. Cayenne
  2. CAY-1681

Third prefetch kind - DISJOINT_BY_ID

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1B1
    • Component/s: Core Library
    • Labels:
      None

      Description

      (here is a mailing list thread discussing the issue: http://markmail.org/message/zzyd26ucfwhnacfe )

      I keep encountering a common scenario where neither JOINT or DISJOINT prefetch strategies are adequate - queries with fetch limit. It is very common in the application to display X most recent entries from a table with millions of rows, and then drill down to the object details. E.g. assume 2 entities - "Order" and "LineItem", with orders having multiple line items. We want 10 most recent orders, with line items prefetched, so you'd so something like this:

      SelectQuery q = new SelectQuery(Order.class);
      q.addPrefetch("lineItems");
      q.setFetchLimit(10);

      "Disjoint" prefetch in this situation would fetch 10 orders and ALL LineItems in DB.

      "Joint" prefetch will fetch anywhere between 1 and 10 orders, depending on how many line items the first 10 orders have, i.e. fetch limit is applied to to-many join, not to the query root. And this is certainly not what we want.

      Now Cayenne already has something that can solve the problem:

      q.setPageSize(10); // same as fetch limit

      Paginated query is the most optimal way to prefetch here. Whenever a result list is accessed, Cayenne would execute 2 IN () queries - one for the Orders, another one - for the LineItems. Both queries are matching on a set of Order PKs and are pretty efficient, and only return the objects that we care about.

      The problem with this solution is that it is counterintuitive to the user (why should I set "pageSize" to make my prefetches work) and adds one extra query (the IN query resolving the root object list). Would be cool to turn it into a separate type of prefetch. Something like "disjoint by id"?

      1. CAY-1681-v2.patch
        52 kB
        Andrei Veprev
      2. CAY-1681-v3.patch
        13 kB
        Andrei Veprev
      3. CAY-1681-v4.patch
        10 kB
        Andrei Veprev
      4. CAY-1681-v5.patch
        12 kB
        Andrei Veprev
      5. CAY-1681-varchar-length.patch
        0.7 kB
        Andrei Veprev

        Activity

        Hide
        Andrei Veprev added a comment - - edited

        Attaching a patch with DISJOIN_BY_ID prefetch implementation.
        After some discussion with Andrus, the decision was to do all prefetch queries for that type of prefetch on the object conversion stage of query execution (conversion from data rows to objects). So here it is, most of the staff are done inside HierarchicalObjectResolver.

        Flatten relationships are not implemented yet, as well as support for phantoms in a middle of prefetch tree hierarchy (marked as TODO in the patch).

        Show
        Andrei Veprev added a comment - - edited Attaching a patch with DISJOIN_BY_ID prefetch implementation. After some discussion with Andrus, the decision was to do all prefetch queries for that type of prefetch on the object conversion stage of query execution (conversion from data rows to objects). So here it is, most of the staff are done inside HierarchicalObjectResolver. Flatten relationships are not implemented yet, as well as support for phantoms in a middle of prefetch tree hierarchy (marked as TODO in the patch).
        Hide
        Andrus Adamchik added a comment - - edited

        Andrei, thanks. Good start. I am attaching my version of your patch with some minor styling changes (explicit imports without *, deprecation comments containing "@deprecated since 3.1).. The logic is unchanged.

        A few notes on the patch contents:

        1. We need to cover a few more cases via unit tests. Here are a few that come to mind: prefetching on to-one relationships; prefetching on multi-column relationships (I don't think this one will work, see below)

        2. The existing unit tests should attempt to access prefetched object properties inside 'queryInterceptor.runWithQueriesBlocked'. This is the ultimate test that the objects are in a fully resolved state and will not trigger a query when read by the app code

        3. HierarchicalObjectResolver allJoinsQualifier.andExp(joinQualifier) is a noop, as 'andExp' returns a new expression instance instead of appending to the existing one. I know this is inconsistent with SelectQuery.andExp and is one of the Cayenne gotchas. I guess we can start by writing a unit test that shows this as a problem.

        4. Check this code:
        + Object targetValue = ((DataRow) dataRow).get(join.getTargetName());
        + Expression joinQualifier = ExpressionFactory.matchDbExp(join.getSourceName(), targetValue);

        I think this is backwards. The join source and target names are used incorrectly. The root of prefetch query is "target", not "source" of the join. This only works with your tests because PK and FK have the same name (ARTIST_ID). Yeat another reason to expand the test coverage.

        Show
        Andrus Adamchik added a comment - - edited Andrei, thanks. Good start. I am attaching my version of your patch with some minor styling changes (explicit imports without *, deprecation comments containing "@deprecated since 3.1).. The logic is unchanged. A few notes on the patch contents: 1. We need to cover a few more cases via unit tests. Here are a few that come to mind: prefetching on to-one relationships; prefetching on multi-column relationships (I don't think this one will work, see below) 2. The existing unit tests should attempt to access prefetched object properties inside 'queryInterceptor.runWithQueriesBlocked'. This is the ultimate test that the objects are in a fully resolved state and will not trigger a query when read by the app code 3. HierarchicalObjectResolver allJoinsQualifier.andExp(joinQualifier) is a noop, as 'andExp' returns a new expression instance instead of appending to the existing one. I know this is inconsistent with SelectQuery.andExp and is one of the Cayenne gotchas. I guess we can start by writing a unit test that shows this as a problem. 4. Check this code: + Object targetValue = ((DataRow) dataRow).get(join.getTargetName()); + Expression joinQualifier = ExpressionFactory.matchDbExp(join.getSourceName(), targetValue); I think this is backwards. The join source and target names are used incorrectly. The root of prefetch query is "target", not "source" of the join. This only works with your tests because PK and FK have the same name (ARTIST_ID). Yeat another reason to expand the test coverage.
        Hide
        Andrei Veprev added a comment - - edited

        Ok. New patch. Now with flattened relationships implementation. Seems it works, on basic tests at least.
        Also fixed bugs and added more tests (hope no more bugs were added ). Tests were moved to separate class (test case).

        Looking forward to your review.

        Show
        Andrei Veprev added a comment - - edited Ok. New patch. Now with flattened relationships implementation. Seems it works, on basic tests at least. Also fixed bugs and added more tests (hope no more bugs were added ). Tests were moved to separate class (test case). Looking forward to your review.
        Hide
        Andrus Adamchik added a comment -

        Andrei, I applied your March 28 patch. It works well. I added a few more tests to make sure we cover all common cases. Also did some performance tweaking (see r1308081) and fixed a few problems that are not visible from the unit tests but were inferred from the code (stack overflow per r1308082, and missing dot in the path per r1308081). I think phantom prefetches is probably the last case we need to handle here. Also maybe we can create unit tests for the stack overflow and missing dot problems.

        Show
        Andrus Adamchik added a comment - Andrei, I applied your March 28 patch. It works well. I added a few more tests to make sure we cover all common cases. Also did some performance tweaking (see r1308081) and fixed a few problems that are not visible from the unit tests but were inferred from the code (stack overflow per r1308082, and missing dot in the path per r1308081). I think phantom prefetches is probably the last case we need to handle here. Also maybe we can create unit tests for the stack overflow and missing dot problems.
        Hide
        Andrus Adamchik added a comment -

        Another interesting thing that I noticed. From 'testManyToOne':

        SELECT t0.ID, t0.NAME FROM BAG t0 WHERE (t0.ID = ?) OR (t0.ID = ?) [bind: 1->ID:1, 2->ID:1]

        As you see there are two conditions checking the same thing. Wonder if it is worthwhile to collapse repeating conditions in the where clause (i.e. whether overhead we add in Java would pay off by creating a more efficient DB query).. Just something to ponder. Not a must-have certainly.

        Show
        Andrus Adamchik added a comment - Another interesting thing that I noticed. From 'testManyToOne': SELECT t0.ID, t0.NAME FROM BAG t0 WHERE (t0.ID = ?) OR (t0.ID = ?) [bind: 1->ID:1, 2->ID:1] As you see there are two conditions checking the same thing. Wonder if it is worthwhile to collapse repeating conditions in the where clause (i.e. whether overhead we add in Java would pay off by creating a more efficient DB query).. Just something to ponder. Not a must-have certainly.
        Hide
        Andrus Adamchik added a comment -

        And one more thing we will probably have to implement - breaking down OR query if it gets too long. This is a real problem which has been repeatedly mentioned in the context of the paginated queries, and in fact solved in IncrementalFaultList. see IncrementalFaultList.resolveInterval - it checks the number of clauses in the qualifier against 'maxFetchSize'. We may need to make "maxFetchSize" a container property used by IncrementalFaultList as well as our prefetch strategy, and take it into account in the later.

        Show
        Andrus Adamchik added a comment - And one more thing we will probably have to implement - breaking down OR query if it gets too long. This is a real problem which has been repeatedly mentioned in the context of the paginated queries, and in fact solved in IncrementalFaultList. see IncrementalFaultList.resolveInterval - it checks the number of clauses in the qualifier against 'maxFetchSize'. We may need to make "maxFetchSize" a container property used by IncrementalFaultList as well as our prefetch strategy, and take it into account in the later.
        Hide
        Andrei Veprev added a comment -

        A little patch that fixes test scheme generation on mysql.

        Show
        Andrei Veprev added a comment - A little patch that fixes test scheme generation on mysql.
        Hide
        Andrei Veprev added a comment -

        Another update.

        • fixed NPE when parent node is joint
        • test for long flattened relationships (testing inclusion of dot in join path)
        • support for joint prefetches in children - still more work is needed here
        Show
        Andrei Veprev added a comment - Another update. fixed NPE when parent node is joint test for long flattened relationships (testing inclusion of dot in join path) support for joint prefetches in children - still more work is needed here
        Hide
        Andrei Veprev added a comment -

        Found out a bug - CAY-1695 (is it?). Seems everything else is working in my last patch. Need some feedback now.

        Show
        Andrei Veprev added a comment - Found out a bug - CAY-1695 (is it?). Seems everything else is working in my last patch. Need some feedback now.
        Hide
        Andrus Adamchik added a comment -

        I applied the latest set of patches. Looks great, and we are almost there I think. Re:

        + // TODO: need to pass the remaining tree to make joint prefetches work
        + // but not sure is it a good idea to do it in that way
        + query.setPrefetchTree(node);

        We need to test how this affects downstream disjoint prefetches (make sure they are not executed twice - once from root query and once from "by id" query). Also probably a good idea to clone the prefetch node subtree to avoid conflicts here... And while we are at it we can filter only the prefetches we care about into the clone

        Show
        Andrus Adamchik added a comment - I applied the latest set of patches. Looks great, and we are almost there I think. Re: + // TODO: need to pass the remaining tree to make joint prefetches work + // but not sure is it a good idea to do it in that way + query.setPrefetchTree(node); We need to test how this affects downstream disjoint prefetches (make sure they are not executed twice - once from root query and once from "by id" query). Also probably a good idea to clone the prefetch node subtree to avoid conflicts here... And while we are at it we can filter only the prefetches we care about into the clone
        Hide
        Andrei Veprev added a comment -

        Yet another patch. Includes:

        • passing cloned subtree with joint children instead of full tree
        • testing prefetched objects state
        Show
        Andrei Veprev added a comment - Yet another patch. Includes: passing cloned subtree with joint children instead of full tree testing prefetched objects state
        Hide
        Andrus Adamchik added a comment -

        Below are my comments on the parts of the submitted patch:

        diff --git a/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java b/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java
        index c81a508..ee731c7 100644
        — a/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java
        +++ b/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java
        @@ -33,7 +33,7 @@ import org.apache.cayenne.util.XMLSerializable;

        /**

        • Defines a node in a prefetch tree.
        • *
          + *
        • @since 1.2
          */
          public class PrefetchTreeNode implements Serializable, XMLSerializable {
          @@ -164,6 +164,31 @@ public class PrefetchTreeNode implements Serializable, XMLSerializable {
          }

        /**
        + * Returns a clone of subtree that includes all joint children
        + * starting from this node itself and till the first occurrence of non-joint node
        + *
        + * @since 3.1
        + */
        + public PrefetchTreeNode cloneJointSubtree()

        { // [andrus] this results in the cloned root obtaining unneeded semantics // also it results in a non-null root prefetch even if there are no joint children + return cloneJointSubtree(null); + }

        +
        + private PrefetchTreeNode cloneJointSubtree(PrefetchTreeNode parent) {
        + PrefetchTreeNode cloned = new PrefetchTreeNode(parent, getName());
        + cloned.setSemantics(getSemantics());
        + cloned.setPhantom(isPhantom());

        // [andrus] AdjacentJoinsOperation is already recursive AFAIK, so here we have double recursion,
        // a simple loop over parent's child nodes would probably suffice

        +
        + Collection<PrefetchTreeNode> jointChildren = new ArrayList<PrefetchTreeNode>();
        + traverse(new AdjacentJoinsOperation(jointChildren));
        +
        + for (PrefetchTreeNode jointChild : jointChildren)

        { + cloned.addChild(jointChild.cloneJointSubtree(cloned)); + }

        +
        + return cloned;
        + }
        +
        + /**
        diff --git a/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java b/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java
        index d5e8ed4..8a230ef 100644
        — a/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java
        +++ b/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java
        @@ -227,10 +228,14 @@ public class DataContextDisjointByIdPrefetchTest extends ServerCase {
        public void execute() {
        assertFalse(result.isEmpty());
        Bag b1 = result.get(0);

        • List<?> toMany = (List<?>) b1.readPropertyDirectly(Bag.BALLS_PROPERTY);
        • assertNotNull(toMany);
        • assertFalse(((ValueHolder) toMany).isFault());
        • assertEquals(6, toMany.size());
          + List<Ball> balls = (List<Ball>) b1.readPropertyDirectly(Bag.BALLS_PROPERTY);
          + assertNotNull(balls);
          + assertFalse(((ValueHolder) balls).isFault());
          + assertEquals(6, balls.size());
          +
          + for (Ball b : balls) { + assertEquals(PersistenceState.COMMITTED, b.getPersistenceState()); + }

        // [andrus] in addition to checking for PersistenceState, would be great to check the value of one of the properties, just to make sure
        // that prefetched objects are completely valid.

        Show
        Andrus Adamchik added a comment - Below are my comments on the parts of the submitted patch: diff --git a/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java b/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java index c81a508..ee731c7 100644 — a/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java +++ b/framework/cayenne-jdk1.5-unpublished/src/main/java/org/apache/cayenne/query/PrefetchTreeNode.java @@ -33,7 +33,7 @@ import org.apache.cayenne.util.XMLSerializable; /** Defines a node in a prefetch tree. * + * @since 1.2 */ public class PrefetchTreeNode implements Serializable, XMLSerializable { @@ -164,6 +164,31 @@ public class PrefetchTreeNode implements Serializable, XMLSerializable { } /** + * Returns a clone of subtree that includes all joint children + * starting from this node itself and till the first occurrence of non-joint node + * + * @since 3.1 + */ + public PrefetchTreeNode cloneJointSubtree() { // [andrus] this results in the cloned root obtaining unneeded semantics // also it results in a non-null root prefetch even if there are no joint children + return cloneJointSubtree(null); + } + + private PrefetchTreeNode cloneJointSubtree(PrefetchTreeNode parent) { + PrefetchTreeNode cloned = new PrefetchTreeNode(parent, getName()); + cloned.setSemantics(getSemantics()); + cloned.setPhantom(isPhantom()); // [andrus] AdjacentJoinsOperation is already recursive AFAIK, so here we have double recursion, // a simple loop over parent's child nodes would probably suffice + + Collection<PrefetchTreeNode> jointChildren = new ArrayList<PrefetchTreeNode>(); + traverse(new AdjacentJoinsOperation(jointChildren)); + + for (PrefetchTreeNode jointChild : jointChildren) { + cloned.addChild(jointChild.cloneJointSubtree(cloned)); + } + + return cloned; + } + + /** diff --git a/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java b/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java index d5e8ed4..8a230ef 100644 — a/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java +++ b/framework/cayenne-jdk1.5-unpublished/src/test/java/org/apache/cayenne/access/DataContextDisjointByIdPrefetchTest.java @@ -227,10 +228,14 @@ public class DataContextDisjointByIdPrefetchTest extends ServerCase { public void execute() { assertFalse(result.isEmpty()); Bag b1 = result.get(0); List<?> toMany = (List<?>) b1.readPropertyDirectly(Bag.BALLS_PROPERTY); assertNotNull(toMany); assertFalse(((ValueHolder) toMany).isFault()); assertEquals(6, toMany.size()); + List<Ball> balls = (List<Ball>) b1.readPropertyDirectly(Bag.BALLS_PROPERTY); + assertNotNull(balls); + assertFalse(((ValueHolder) balls).isFault()); + assertEquals(6, balls.size()); + + for (Ball b : balls) { + assertEquals(PersistenceState.COMMITTED, b.getPersistenceState()); + } // [andrus] in addition to checking for PersistenceState, would be great to check the value of one of the properties, just to make sure // that prefetched objects are completely valid.
        Hide
        Andrei Veprev added a comment -

        Took in account all your comments in that new patch.

        Show
        Andrei Veprev added a comment - Took in account all your comments in that new patch.
        Hide
        Andrus Adamchik added a comment -

        Thanks! The last patch finishes it IMO

        Show
        Andrus Adamchik added a comment - Thanks! The last patch finishes it IMO

          People

          • Assignee:
            Andrus Adamchik
            Reporter:
            Andrus Adamchik
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development