Jackrabbit Content Repository
  1. Jackrabbit Content Repository
  2. JCR-3261

Problems with BundleDbPersistenceManager getAllNodeIds

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4.1, 2.5
    • Component/s: None
    • Labels:
      None

      Description

      When using MySQL:
      The problem arises when the method parameter maxcount is less than the total amount of records in the bundle table.

      First of all I found out that mysql orders the nodeid objects different than jackrabbit does. The following test describes this idea:

      public void testMySQLOrderByNodeId() throws Exception {
      NodeId nodeId1 = new NodeId("7ff9e87c-f87f-4d35-9d61-2e298e56ac37");
      NodeId nodeId2 = new NodeId("9fd0d452-b5d0-426b-8a0f-bef830ba0495");

      PreparedStatement stmt = connection.prepareStatement("SELECT NODE_ID FROM DEFAULT_BUNDLE WHERE NODE_ID = ? OR NODE_ID = ? ORDER BY NODE_ID");

      Object[] params = new Object[]

      { nodeId1.getRawBytes(), nodeId2.getRawBytes() }

      ;
      stmt.setObject(1, params[0]);
      stmt.setObject(2, params[1]);

      ArrayList<NodeId> nodeIds = new ArrayList<NodeId>();
      ResultSet resultSet = stmt.executeQuery();
      while(resultSet.next())

      { NodeId nodeId = new NodeId(resultSet.getBytes(1)); System.out.println(nodeId); nodeIds.add(nodeId); }

      Collections.sort(nodeIds);
      for (NodeId nodeId : nodeIds)

      { System.out.println(nodeId); }

      }

      Which results in the following output:

      7ff9e87c-f87f-4d35-9d61-2e298e56ac37
      9fd0d452-b5d0-426b-8a0f-bef830ba0495
      9fd0d452-b5d0-426b-8a0f-bef830ba0495
      7ff9e87c-f87f-4d35-9d61-2e298e56ac37

      Now the problem with the getAllNodeIds method is that it fetches an extra 10 records on top of maxcount (to avoid a problem where the first key is not the one you that is wanted). Afterwards it skips a number of records again, this time using nodeid.compareto. This compareto statement returns true unexpectedly for mysql because the code doesn't expect the mysql ordering.

      I had the situation where I had about 17000 records in the bundle table but consecutively getting the ids a thousand records at a time returned only about 8000 records in all.

        Activity

        Hide
        Bart van der Schans added a comment -

        Merged in 2.4 in r1302430.

        Show
        Bart van der Schans added a comment - Merged in 2.4 in r1302430.
        Hide
        Thomas Mueller added a comment -

        > It would also defeat part of the purpose of the consistency checker (finding orphaned nodes).

        Yes, absolutely true.

        Show
        Thomas Mueller added a comment - > It would also defeat part of the purpose of the consistency checker (finding orphaned nodes). Yes, absolutely true.
        Hide
        Julian Reschke added a comment -

        > OK I see. For data store garbage collection, you could live without the method, as persistence manager scan is optional there. But the consistency checker needs it, and changing the consistency checker to do a regular node traversal would be probably quite a lot of work.

        It would also defeat part of the purpose of the consistency checker (finding orphaned nodes).

        Show
        Julian Reschke added a comment - > OK I see. For data store garbage collection, you could live without the method, as persistence manager scan is optional there. But the consistency checker needs it, and changing the consistency checker to do a regular node traversal would be probably quite a lot of work. It would also defeat part of the purpose of the consistency checker (finding orphaned nodes).
        Hide
        Bart van der Schans added a comment -

        Patch committed in r1302401 in slightly adjusted form to trunk.

        Show
        Bart van der Schans added a comment - Patch committed in r1302401 in slightly adjusted form to trunk.
        Hide
        Bart van der Schans added a comment -

        The problem seems to be that the following block should only be applied for LONGLONG keys and not for BINARY keys.

        if (lowId != null) {
        // skip the keys that are smaller or equal (see above, maxCount += 10)
        // only required for SM_LONGLONG_KEYS
        if (current.compareTo(lowId) <= 0)

        { continue; }

        }

        Show
        Bart van der Schans added a comment - The problem seems to be that the following block should only be applied for LONGLONG keys and not for BINARY keys. if (lowId != null) { // skip the keys that are smaller or equal (see above, maxCount += 10) // only required for SM_LONGLONG_KEYS if (current.compareTo(lowId) <= 0) { continue; } }
        Hide
        Unico Hommes added a comment -

        I think the supplied patch is probably the right solution. Order by descending won't work because it's not exactly the other way around that MySQL orders the nodes.

        Show
        Unico Hommes added a comment - I think the supplied patch is probably the right solution. Order by descending won't work because it's not exactly the other way around that MySQL orders the nodes.
        Hide
        Thomas Mueller added a comment -

        OK I see. For data store garbage collection, you could live without the method, as persistence manager scan is optional there. But the consistency checker needs it, and changing the consistency checker to do a regular node traversal would be probably quite a lot of work.

        Not sure how to solve it, possibly use "order by desc"? I guess more tests would be required (it seems we use varbinary(16) in MySQL, which might deal with trailing zeroes differently and binary(16))

        Show
        Thomas Mueller added a comment - OK I see. For data store garbage collection, you could live without the method, as persistence manager scan is optional there. But the consistency checker needs it, and changing the consistency checker to do a regular node traversal would be probably quite a lot of work. Not sure how to solve it, possibly use "order by desc"? I guess more tests would be required (it seems we use varbinary(16) in MySQL, which might deal with trailing zeroes differently and binary(16))
        Show
        Unico Hommes added a comment - I was using the Jackrabbit consistency checker. See http://svn.apache.org/repos/asf/jackrabbit/tags/2.4.0/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/persistence/bundle/ConsistencyCheckerImpl.java
        Hide
        Julian Reschke added a comment -

        > What is your use case, that is, why do you need getAllNodeIds? I'm just wondering if we really need this method... If we could remove it then we wouldn't have to deal with such problems.

        It's needed by the datastore garbage collector and the consistency checker.

        Show
        Julian Reschke added a comment - > What is your use case, that is, why do you need getAllNodeIds? I'm just wondering if we really need this method... If we could remove it then we wouldn't have to deal with such problems. It's needed by the datastore garbage collector and the consistency checker.
        Hide
        Thomas Mueller added a comment -

        What is your use case, that is, why do you need getAllNodeIds? I'm just wondering if we really need this method... If we could remove it then we wouldn't have to deal with such problems.

        Show
        Thomas Mueller added a comment - What is your use case, that is, why do you need getAllNodeIds? I'm just wondering if we really need this method... If we could remove it then we wouldn't have to deal with such problems.
        Hide
        Unico Hommes added a comment -

        Problem was at my end apparently. No problem with Derby DB after all.

        Show
        Unico Hommes added a comment - Problem was at my end apparently. No problem with Derby DB after all.
        Hide
        Unico Hommes added a comment -

        Updated title and description because another problem with the same method was found for Derby DB. Attaching an updated patch.

        Show
        Unico Hommes added a comment - Updated title and description because another problem with the same method was found for Derby DB. Attaching an updated patch.
        Hide
        Unico Hommes added a comment -

        Patch against 2.4.x branch.

        The special handling of maxcount is only necessary in the case of longlong storage model. In the case of a binary storage model I can easily understand that ordering is not well defined and can vary from database to database. Therefore only do the special handling for longlong storage model.

        Show
        Unico Hommes added a comment - Patch against 2.4.x branch. The special handling of maxcount is only necessary in the case of longlong storage model. In the case of a binary storage model I can easily understand that ordering is not well defined and can vary from database to database. Therefore only do the special handling for longlong storage model.

          People

          • Assignee:
            Bart van der Schans
            Reporter:
            Unico Hommes
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development