Jackrabbit Content Repository
JCR-3261

Problems with BundleDbPersistenceManager getAllNodeIds

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4.1, 2.5
    • Component/s: None
    • Labels: None

      Description

      When using MySQL:
      The problem arises when the method parameter maxCount is less than the total number of records in the bundle table.

      First of all, I found out that MySQL orders the NodeId objects differently than Jackrabbit does. The following test demonstrates this:

          public void testMySQLOrderByNodeId() throws Exception {
              NodeId nodeId1 = new NodeId("7ff9e87c-f87f-4d35-9d61-2e298e56ac37");
              NodeId nodeId2 = new NodeId("9fd0d452-b5d0-426b-8a0f-bef830ba0495");

              PreparedStatement stmt = connection.prepareStatement("SELECT NODE_ID FROM DEFAULT_BUNDLE WHERE NODE_ID = ? OR NODE_ID = ? ORDER BY NODE_ID");

              Object[] params = new Object[] { nodeId1.getRawBytes(), nodeId2.getRawBytes() };
              stmt.setObject(1, params[0]);
              stmt.setObject(2, params[1]);

              ArrayList<NodeId> nodeIds = new ArrayList<NodeId>();
              ResultSet resultSet = stmt.executeQuery();
              while (resultSet.next()) {
                  NodeId nodeId = new NodeId(resultSet.getBytes(1));
                  System.out.println(nodeId);
                  nodeIds.add(nodeId);
              }
              Collections.sort(nodeIds);
              for (NodeId nodeId : nodeIds) {
                  System.out.println(nodeId);
              }
          }

      This results in the following output:

      7ff9e87c-f87f-4d35-9d61-2e298e56ac37
      9fd0d452-b5d0-426b-8a0f-bef830ba0495
      9fd0d452-b5d0-426b-8a0f-bef830ba0495
      7ff9e87c-f87f-4d35-9d61-2e298e56ac37
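
      For illustration, here is a minimal, self-contained sketch of a likely cause of the two orderings. It assumes that NodeId.compareTo compares the UUID's two 64-bit halves as signed Java longs, while MySQL's ORDER BY on a (VAR)BINARY column compares the raw bytes as unsigned values; the class below only illustrates that assumption, it is not Jackrabbit code.

          import java.util.UUID;

          public class OrderingSketch {

              public static void main(String[] args) {
                  UUID a = UUID.fromString("7ff9e87c-f87f-4d35-9d61-2e298e56ac37");
                  UUID b = UUID.fromString("9fd0d452-b5d0-426b-8a0f-bef830ba0495");

                  // Signed comparison of the most significant 64 bits, as a
                  // Comparable built on Java longs would do: 0x9fd0... has the
                  // sign bit set and is negative, so b sorts before a.
                  int signed = Long.compare(a.getMostSignificantBits(), b.getMostSignificantBits());
                  System.out.println("signed (Java) order:  " + (signed < 0 ? "7ff9... first" : "9fd0... first"));

                  // Unsigned byte-by-byte comparison, as an ORDER BY on a binary
                  // column typically does: 0x7f < 0x9f, so a sorts before b.
                  int unsigned = compareUnsigned(rawBytes(a), rawBytes(b));
                  System.out.println("unsigned (SQL) order: " + (unsigned < 0 ? "7ff9... first" : "9fd0... first"));
              }

              // Unsigned lexicographic comparison of two equal-length byte arrays.
              static int compareUnsigned(byte[] x, byte[] y) {
                  for (int i = 0; i < x.length; i++) {
                      int c = (x[i] & 0xff) - (y[i] & 0xff);
                      if (c != 0) {
                          return c;
                      }
                  }
                  return 0;
              }

              // The 16 raw UUID bytes in big-endian order (what getRawBytes()
              // presumably returns).
              static byte[] rawBytes(UUID u) {
                  byte[] out = new byte[16];
                  long msb = u.getMostSignificantBits();
                  long lsb = u.getLeastSignificantBits();
                  for (int i = 0; i < 8; i++) {
                      out[i] = (byte) (msb >>> (8 * (7 - i)));
                      out[8 + i] = (byte) (lsb >>> (8 * (7 - i)));
                  }
                  return out;
              }
          }

      It prints "9fd0... first" for the signed order and "7ff9... first" for the unsigned order, matching the two halves of the output above.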

      Now the problem with the getAllNodeIds method is that it fetches an extra 10 records on top of maxCount (to avoid a problem where the first key returned is not the one that is wanted). Afterwards it skips a number of records again, this time using NodeId.compareTo. Under MySQL this comparison unexpectedly indicates that records should be skipped, because the code does not expect MySQL's ordering.

      I had a situation with about 17,000 records in the bundle table, but fetching the ids a thousand at a time, page after page, returned only about 8,000 ids in total.
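
      To make the failure mode concrete, the following toy simulation pages keys out of a "table" ordered by unsigned values, the way MySQL orders a binary column, while the skip check uses signed ordering, the way NodeId.compareTo is assumed to work. It uses single long keys instead of 16-byte node ids and a simplified paging loop, so it only approximates getAllNodeIds, but it shows how keys are silently dropped.

          import java.util.ArrayList;
          import java.util.List;
          import java.util.Random;
          import java.util.TreeSet;

          public class LostIdsSimulation {

              public static void main(String[] args) {
                  // A stand-in "bundle table": random 64-bit keys, ordered the way a
                  // database orders a binary column (unsigned).
                  TreeSet<Long> table = new TreeSet<Long>(Long::compareUnsigned);
                  Random random = new Random(42);
                  while (table.size() < 17) {
                      table.add(random.nextLong());
                  }

                  int maxCount = 5;      // page size
                  Long lowId = null;     // last key of the previous page
                  List<Long> collected = new ArrayList<Long>();
                  while (true) {
                      // Fetch the next page in unsigned (SQL) order: keys > lowId.
                      List<Long> page = new ArrayList<Long>();
                      for (Long id : (lowId == null ? table : table.tailSet(lowId, false))) {
                          if (page.size() == maxCount) {
                              break;
                          }
                          page.add(id);
                      }
                      if (page.isEmpty()) {
                          break;
                      }
                      for (Long current : page) {
                          // The skip check uses signed (Java) ordering, like compareTo.
                          // A key that is new in SQL order can still compare <= lowId
                          // in signed order; it is dropped here and never returned.
                          if (lowId != null && current.compareTo(lowId) <= 0) {
                              continue;
                          }
                          collected.add(current);
                      }
                      lowId = page.get(page.size() - 1);
                  }
                  // Prints fewer keys than the table holds.
                  System.out.println("table: " + table.size() + ", returned: " + collected.size());
              }
          }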

        Activity

        Unico Hommes created issue -
        Unico Hommes added a comment -

        Patch against 2.4.x branch.

        The special handling of maxCount is only necessary in the case of the longlong storage model. With a binary storage model I can easily understand that the ordering is not well defined and can vary from database to database. Therefore the special handling is now done only for the longlong storage model.

        Unico Hommes made changes -
        Field Original Value New Value
        Attachment bdbpm_allids.patch [ 12518675 ]
        Unico Hommes added a comment -

        Updated title and description because another problem with the same method was found for Derby DB. Attaching an updated patch.

        Unico Hommes made changes -
        Summary BundleDbPersistenceManager getAllIds gives wrong amount of results for MySQL -> Problems with BundleDbPersistenceManager getAllNodeIds
        Description updated: added the "When using MySQL:" lead-in and, at the end, the following new paragraph about Derby DB (the rest of the text is unchanged from the issue description above):

        With Derby DB there is an infinite loop when the method is used as it is, for instance, by ConsistencyCheckerImpl. This is caused by the bundleSelectAllIdsFromSQL query that is used in the SM_LONGLONG_KEYS case, which applies to Derby. Compare that statement with the corresponding statement for SM_BINARY_KEYS: the former uses >= while the latter uses only >. The latter is correct; we want only records that are greater than the passed-in parameter, not greater or equal.
        Unico Hommes made changes -
        Attachment bdpm_allids2.patch [ 12518805 ]
        Unico Hommes made changes -
        Attachment bdpm_allids2.patch [ 12518805 ]
        Unico Hommes added a comment -

        Problem was at my end apparently. No problem with Derby DB after all.

        Unico Hommes made changes -
        Description updated: removed the Derby DB paragraph again; the rest of the text is unchanged.
        Thomas Mueller added a comment -

        What is your use case, that is, why do you need getAllNodeIds? I'm just wondering if we really need this method... If we could remove it then we wouldn't have to deal with such problems.

        Julian Reschke added a comment -

        > What is your use case, that is, why do you need getAllNodeIds? I'm just wondering if we really need this method... If we could remove it then we wouldn't have to deal with such problems.

        It's needed by the datastore garbage collector and the consistency checker.

        Unico Hommes added a comment -

        I was using the Jackrabbit consistency checker. See http://svn.apache.org/repos/asf/jackrabbit/tags/2.4.0/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/persistence/bundle/ConsistencyCheckerImpl.java
        Thomas Mueller added a comment -

        OK, I see. For data store garbage collection you could live without the method, as the persistence manager scan is optional there. But the consistency checker needs it, and changing the consistency checker to do a regular node traversal would probably be quite a lot of work.

        Not sure how to solve it; possibly use "order by desc"? I guess more tests would be required (it seems we use varbinary(16) in MySQL, which might handle trailing zeroes differently than binary(16)).

        Unico Hommes added a comment -

        I think the supplied patch is probably the right solution. Ordering by descending won't work, because MySQL's ordering is not simply the exact reverse of Jackrabbit's.

        Bart van der Schans added a comment -

        The problem seems to be that the following block should only be applied for LONGLONG keys and not for BINARY keys.

        if (lowId != null) {
            // skip the keys that are smaller or equal (see above, maxCount += 10)
            // only required for SM_LONGLONG_KEYS
            if (current.compareTo(lowId) <= 0) {
                continue;
            }
        }
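
        A compilable outline of that guarded form is sketched below. The constant names and the storage-model field only mimic the ones mentioned in this issue, and the loop context is simplified; the actual change is the slightly adjusted patch committed to trunk (see the following comments).

            import java.util.ArrayList;
            import java.util.List;

            // Outline only: apply the "already seen" skip solely for the long-long
            // storage model, where the SQL ordering and the Comparable ordering agree.
            class GuardedSkipSketch {

                static final int SM_BINARY_KEYS = 1;
                static final int SM_LONGLONG_KEYS = 2;

                final int storageModel;

                GuardedSkipSketch(int storageModel) {
                    this.storageModel = storageModel;
                }

                // Collects the keys of one fetched page, applying the compareTo-based
                // skip only when the long-long storage model is in use.
                <K extends Comparable<K>> List<K> collectPage(List<K> page, K lowId) {
                    List<K> result = new ArrayList<K>();
                    for (K current : page) {
                        if (storageModel == SM_LONGLONG_KEYS && lowId != null
                                && current.compareTo(lowId) <= 0) {
                            // skip the keys that are smaller or equal (see maxCount += 10)
                            continue;
                        }
                        result.add(current);
                    }
                    return result;
                }
            }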

        Bart van der Schans added a comment -

        Patch committed in r1302401 in slightly adjusted form to trunk.

        Bart van der Schans made changes -
        Assignee Bart van der Schans [ schans ]
        Bart van der Schans made changes -
        Fix Version/s 2.6 [ 12319480 ]
        Julian Reschke added a comment -

        > OK I see. For data store garbage collection, you could live without the method, as persistence manager scan is optional there. But the consistency checker needs it, and changing the consistency checker to do a regular node traversal would be probably quite a lot of work.

        It would also defeat part of the purpose of the consistency checker (finding orphaned nodes).

        Thomas Mueller added a comment -

        > It would also defeat part of the purpose of the consistency checker (finding orphaned nodes).

        Yes, absolutely true.

        Bart van der Schans added a comment -

        Merged in 2.4 in r1302430.

        Bart van der Schans made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Alex Parvulescu made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Jukka Zitting made changes -
        Fix Version/s 2.5 [ 12319280 ]
        Fix Version/s 2.6 [ 12319480 ]

          People

          • Assignee: Bart van der Schans
          • Reporter: Unico Hommes
          • Votes: 0
          • Watchers: 0
