ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-44

DataTree does not use natural sort for getChildren

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      DataTree.getChildren() performs Collection.sort() on the list of children before returning it, but Java's default comparator for Strings will sort 'lock-20' before 'lock-3' for instance.

      1. seqfile2.diff
        5 kB
        Jakob Homan
      2. seqfile.diff
        5 kB
        Jakob Homan

        Activity

        Stu Hood created issue -
        Hide
        Benjamin Reed added a comment -

        Actually we have been meaning to remove the sort. It causes an unnecessary burden on the server. If we really want things sorted, we should do it in the client library. The question is: do we really want things sorted? Especially in Java it is easy enough for the user to sort the list themself if needed.

        Show
        Benjamin Reed added a comment - Actually we have been meaning to remove the sort. It causes an unnecessary burden on the server. If we really want things sorted, we should do it in the client library. The question is: do we really want things sorted? Especially in Java it is easy enough for the user to sort the list themself if needed.
        Hide
        Patrick Hunt added a comment -

        My personal preference would be to leave it unsorted and let the client sort if needed. That way you only pay the perf penalty if you need the children sorted. Either way we should address this – remove the unnecessary sort and make sure to document that the results are unsorted.

        Show
        Patrick Hunt added a comment - My personal preference would be to leave it unsorted and let the client sort if needed. That way you only pay the perf penalty if you need the children sorted. Either way we should address this – remove the unnecessary sort and make sure to document that the results are unsorted.
        Hide
        Stu Hood added a comment -

        I'd be fine with the sort being removed from getChildren, but fixing the natural sorting problem is still important. It causes undue pain for the client to have to perform an integer-aware sort to get the proper ordering for locks.

        My workaround was to pad the sequence value before appending it to the path in create:

        if ((createRequest.getFlags() & CreateFlags.SEQUENCE) != 0) {
            path = path + String.format("%1$010d", parentCVersion);
        }
        
        Show
        Stu Hood added a comment - I'd be fine with the sort being removed from getChildren, but fixing the natural sorting problem is still important. It causes undue pain for the client to have to perform an integer-aware sort to get the proper ordering for locks. My workaround was to pad the sequence value before appending it to the path in create: if ((createRequest.getFlags() & CreateFlags.SEQUENCE) != 0) { path = path + String .format( "%1$010d" , parentCVersion); }
        Hide
        Benjamin Reed added a comment -

        +1 Excellent suggestion!

        Show
        Benjamin Reed added a comment - +1 Excellent suggestion!
        Hide
        Patrick Hunt added a comment -

        +1, sounds like a great addition to a FAQ.

        Assigning to Ben to remove the sort and document appropriately.

        Show
        Patrick Hunt added a comment - +1, sounds like a great addition to a FAQ. Assigning to Ben to remove the sort and document appropriately.
        Patrick Hunt made changes -
        Field Original Value New Value
        Assignee Benjamin Reed [ breed ]
        Patrick Hunt made changes -
        Fix Version/s 3.0.0 [ 12313216 ]
        Hide
        Jakob Homan added a comment -

        I've gone ahead and implemented the patch as discussed, which does the following:

        • Implements the padding for sequential file names, padding ten zeroes to the name (enough to handle as many digits as can be generated by a 32-bit int). This allows the names to be sorted by natural order.
        • Removed the server sorting the list of children, leaving it up to the client to do so if it wishes. This could give a very slight performance bump, particularly when dealing with directories with a large number of files.
        • Updated the Javadoc to reflect this change in both versions of the getChildren method.
        • Adds a unit test that checks that the sequential file names are being created correctly.

        I thought about adding a GetChildrenSorted or such method, but since it's really easy for the client to sort the list once returned (java.Collections.List.Sort(children)), it seems like it would be more work to add and support these methods.

        Hope this is useful.

        Show
        Jakob Homan added a comment - I've gone ahead and implemented the patch as discussed, which does the following: Implements the padding for sequential file names, padding ten zeroes to the name (enough to handle as many digits as can be generated by a 32-bit int). This allows the names to be sorted by natural order. Removed the server sorting the list of children, leaving it up to the client to do so if it wishes. This could give a very slight performance bump, particularly when dealing with directories with a large number of files. Updated the Javadoc to reflect this change in both versions of the getChildren method. Adds a unit test that checks that the sequential file names are being created correctly. I thought about adding a GetChildrenSorted or such method, but since it's really easy for the client to sort the list once returned (java.Collections.List.Sort(children)), it seems like it would be more work to add and support these methods. Hope this is useful.
        Jakob Homan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Jakob Homan added a comment -

        Implements the changes discussed in the Jira.

        Show
        Jakob Homan added a comment - Implements the changes discussed in the Jira.
        Jakob Homan made changes -
        Attachment seqfile.diff [ 12387376 ]
        Mahadev konar made changes -
        Assignee Benjamin Reed [ breed ] Jakob Homan [ jghoman ]
        Hide
        Benjamin Reed added a comment -

        +1 Fantastic job Jakob! This looks great. Test cases and everything.

        Show
        Benjamin Reed added a comment - +1 Fantastic job Jakob! This looks great. Test cases and everything.
        Benjamin Reed made changes -
        Hadoop Flags [Reviewed]
        Hide
        Mahadev konar added a comment -

        +1 .. thanks jakob ....

        nit:

        is %1$ redundant? i had to look up the formatter in java for this ... i think %010d should be fine?

        Show
        Mahadev konar added a comment - +1 .. thanks jakob .... nit: is %1$ redundant? i had to look up the formatter in java for this ... i think %010d should be fine?
        Hide
        Jakob Homan added a comment -

        Unnecessary placeholder removed.

        Show
        Jakob Homan added a comment - Unnecessary placeholder removed.
        Jakob Homan made changes -
        Attachment seqfile2.diff [ 12387378 ]
        Hide
        Mahadev konar added a comment -

        I just committed this. THanks Jakob.

        Show
        Mahadev konar added a comment - I just committed this. THanks Jakob.
        Mahadev konar made changes -
        Resolution Fixed [ 1 ]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hide
        Stu Hood added a comment -

        This is probably a separate issue, but is anyone worried by the fact that after ~2 billion creates, the sequence number will wrap around and break various algorithms?

        Show
        Stu Hood added a comment - This is probably a separate issue, but is anyone worried by the fact that after ~2 billion creates, the sequence number will wrap around and break various algorithms?
        Hide
        Mahadev konar added a comment -

        2 billion creates seem to be a lot fo creates. also since we are working on an upgrade feature on zookeeper — we can upgrade the version to be long sometime later ... – which would make the wrap around 4* 10^18.

        Show
        Mahadev konar added a comment - 2 billion creates seem to be a lot fo creates. also since we are working on an upgrade feature on zookeeper — we can upgrade the version to be long sometime later ... – which would make the wrap around 4* 10^18.
        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #42 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/42/)

        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #42 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/42/ )
        Hide
        Patrick Hunt added a comment -

        3.0.0 has been released, closing issues.

        Show
        Patrick Hunt added a comment - 3.0.0 has been released, closing issues.
        Patrick Hunt made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Jakob Homan
            Reporter:
            Stu Hood
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development