ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-44

DataTree does not use natural sort for getChildren

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      DataTree.getChildren() performs Collection.sort() on the list of children before returning it, but Java's default comparator for Strings will sort 'lock-20' before 'lock-3' for instance.

      1. seqfile2.diff
        5 kB
        Jakob Homan
      2. seqfile.diff
        5 kB
        Jakob Homan

        Activity

        Hide
        Patrick Hunt added a comment -

        3.0.0 has been released, closing issues.

        Show
        Patrick Hunt added a comment - 3.0.0 has been released, closing issues.
        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #42 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/42/)

        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #42 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/42/ )
        Hide
        Mahadev konar added a comment -

        2 billion creates seem to be a lot fo creates. also since we are working on an upgrade feature on zookeeper — we can upgrade the version to be long sometime later ... – which would make the wrap around 4* 10^18.

        Show
        Mahadev konar added a comment - 2 billion creates seem to be a lot fo creates. also since we are working on an upgrade feature on zookeeper — we can upgrade the version to be long sometime later ... – which would make the wrap around 4* 10^18.
        Hide
        Stu Hood added a comment -

        This is probably a separate issue, but is anyone worried by the fact that after ~2 billion creates, the sequence number will wrap around and break various algorithms?

        Show
        Stu Hood added a comment - This is probably a separate issue, but is anyone worried by the fact that after ~2 billion creates, the sequence number will wrap around and break various algorithms?
        Hide
        Mahadev konar added a comment -

        I just committed this. THanks Jakob.

        Show
        Mahadev konar added a comment - I just committed this. THanks Jakob.
        Hide
        Jakob Homan added a comment -

        Unnecessary placeholder removed.

        Show
        Jakob Homan added a comment - Unnecessary placeholder removed.
        Hide
        Mahadev konar added a comment -

        +1 .. thanks jakob ....

        nit:

        is %1$ redundant? i had to look up the formatter in java for this ... i think %010d should be fine?

        Show
        Mahadev konar added a comment - +1 .. thanks jakob .... nit: is %1$ redundant? i had to look up the formatter in java for this ... i think %010d should be fine?
        Hide
        Benjamin Reed added a comment -

        +1 Fantastic job Jakob! This looks great. Test cases and everything.

        Show
        Benjamin Reed added a comment - +1 Fantastic job Jakob! This looks great. Test cases and everything.
        Hide
        Jakob Homan added a comment -

        Implements the changes discussed in the Jira.

        Show
        Jakob Homan added a comment - Implements the changes discussed in the Jira.
        Hide
        Jakob Homan added a comment -

        I've gone ahead and implemented the patch as discussed, which does the following:

        • Implements the padding for sequential file names, padding ten zeroes to the name (enough to handle as many digits as can be generated by a 32-bit int). This allows the names to be sorted by natural order.
        • Removed the server sorting the list of children, leaving it up to the client to do so if it wishes. This could give a very slight performance bump, particularly when dealing with directories with a large number of files.
        • Updated the Javadoc to reflect this change in both versions of the getChildren method.
        • Adds a unit test that checks that the sequential file names are being created correctly.

        I thought about adding a GetChildrenSorted or such method, but since it's really easy for the client to sort the list once returned (java.Collections.List.Sort(children)), it seems like it would be more work to add and support these methods.

        Hope this is useful.

        Show
        Jakob Homan added a comment - I've gone ahead and implemented the patch as discussed, which does the following: Implements the padding for sequential file names, padding ten zeroes to the name (enough to handle as many digits as can be generated by a 32-bit int). This allows the names to be sorted by natural order. Removed the server sorting the list of children, leaving it up to the client to do so if it wishes. This could give a very slight performance bump, particularly when dealing with directories with a large number of files. Updated the Javadoc to reflect this change in both versions of the getChildren method. Adds a unit test that checks that the sequential file names are being created correctly. I thought about adding a GetChildrenSorted or such method, but since it's really easy for the client to sort the list once returned (java.Collections.List.Sort(children)), it seems like it would be more work to add and support these methods. Hope this is useful.
        Hide
        Patrick Hunt added a comment -

        +1, sounds like a great addition to a FAQ.

        Assigning to Ben to remove the sort and document appropriately.

        Show
        Patrick Hunt added a comment - +1, sounds like a great addition to a FAQ. Assigning to Ben to remove the sort and document appropriately.
        Hide
        Benjamin Reed added a comment -

        +1 Excellent suggestion!

        Show
        Benjamin Reed added a comment - +1 Excellent suggestion!
        Hide
        Stu Hood added a comment -

        I'd be fine with the sort being removed from getChildren, but fixing the natural sorting problem is still important. It causes undue pain for the client to have to perform an integer-aware sort to get the proper ordering for locks.

        My workaround was to pad the sequence value before appending it to the path in create:

        if ((createRequest.getFlags() & CreateFlags.SEQUENCE) != 0) {
            path = path + String.format("%1$010d", parentCVersion);
        }
        
        Show
        Stu Hood added a comment - I'd be fine with the sort being removed from getChildren, but fixing the natural sorting problem is still important. It causes undue pain for the client to have to perform an integer-aware sort to get the proper ordering for locks. My workaround was to pad the sequence value before appending it to the path in create: if ((createRequest.getFlags() & CreateFlags.SEQUENCE) != 0) { path = path + String .format( "%1$010d" , parentCVersion); }
        Hide
        Patrick Hunt added a comment -

        My personal preference would be to leave it unsorted and let the client sort if needed. That way you only pay the perf penalty if you need the children sorted. Either way we should address this – remove the unnecessary sort and make sure to document that the results are unsorted.

        Show
        Patrick Hunt added a comment - My personal preference would be to leave it unsorted and let the client sort if needed. That way you only pay the perf penalty if you need the children sorted. Either way we should address this – remove the unnecessary sort and make sure to document that the results are unsorted.
        Hide
        Benjamin Reed added a comment -

        Actually we have been meaning to remove the sort. It causes an unnecessary burden on the server. If we really want things sorted, we should do it in the client library. The question is: do we really want things sorted? Especially in Java it is easy enough for the user to sort the list themself if needed.

        Show
        Benjamin Reed added a comment - Actually we have been meaning to remove the sort. It causes an unnecessary burden on the server. If we really want things sorted, we should do it in the client library. The question is: do we really want things sorted? Especially in Java it is easy enough for the user to sort the list themself if needed.

          People

          • Assignee:
            Jakob Homan
            Reporter:
            Stu Hood
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development