Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7053

Remove deprecated BytesRef#getUTF8SortedAsUTF16Comparator(); remove natural comparator in favour of Java 8 one

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Followup from LUCENE-7052: This removes the legacy, deprecated getUTF8SortedAsUTF16Comparator() in the BytesRef class. I know originally we added the different comparators to be able to allow the index term dict to be sorted in different order. This never proved to be useful, as many Lucene queries rely on the default order. The only codec that used another byte order internally was the Lucene 3 one (but it used the unicode spaghetti algorithm to reorder its term enums at runtime).

      This patch also removes the BytesRef-Comparator completely and just implements compareTo. So all code can rely on natural ordering.

      This patch also cleans up other usages of natural order comparators, e.g. in ArrayUtil, because Java 8 natively provides a comparator.

      1. LUCENE-7053.patch
        43 kB
        Uwe Schindler
      2. LUCENE-7053.patch
        29 kB
        Uwe Schindler
      3. LUCENE-7053.patch
        17 kB
        Uwe Schindler
      4. LUCENE-7053.patch
        14 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          thetaphi Uwe Schindler added a comment -

          Patch (copy from LUCENE-7052).

          Show
          thetaphi Uwe Schindler added a comment - Patch (copy from LUCENE-7052 ).
          Hide
          thetaphi Uwe Schindler added a comment -

          There is a bit code duplication in both tests (sorting Strings in code point order), should we maybe move the new comparator to TestUtil?

          Show
          thetaphi Uwe Schindler added a comment - There is a bit code duplication in both tests (sorting Strings in code point order), should we maybe move the new comparator to TestUtil?
          Hide
          mikemccand Michael McCandless added a comment -

          +1 to the patch.

          You can also fix TestUnicodeUtil's custom String -> int[] code points logic maybe?

          There is a bit code duplication in both tests (sorting Strings in code point order), should we maybe move the new comparator to TestUtil?

          +1

          We can take this further, e.g. I grep'd for places calling BytesRef.getUTF8SortedAsUnicodeComparator and it turns up silliness in BlockTermsReader that should just be invoking BytesRef.compareTo directly instead, I think?

          Show
          mikemccand Michael McCandless added a comment - +1 to the patch. You can also fix TestUnicodeUtil 's custom String -> int[] code points logic maybe? There is a bit code duplication in both tests (sorting Strings in code point order), should we maybe move the new comparator to TestUtil? +1 We can take this further, e.g. I grep'd for places calling BytesRef.getUTF8SortedAsUnicodeComparator and it turns up silliness in BlockTermsReader that should just be invoking BytesRef.compareTo directly instead, I think?
          Hide
          thetaphi Uwe Schindler added a comment -

          We can take this further, e.g. I grep'd for places calling BytesRef.getUTF8SortedAsUnicodeComparator and it turns up silliness in BlockTermsReader that should just be invoking BytesRef.compareTo directly instead, I think?

          Yeah. As said, we may not remove the comparator completely, but we should only use it at places where we can't use Comparable<BytesRef> interface that BytesRef implements.

          You can also fix TestUnicodeUtil's custom String -> int[] code points logic maybe?

          Will check this, too. I am currently investigating it Java 8 already has some Comparator interface somewhere ready-to use. But does not look like that.

          Show
          thetaphi Uwe Schindler added a comment - We can take this further, e.g. I grep'd for places calling BytesRef.getUTF8SortedAsUnicodeComparator and it turns up silliness in BlockTermsReader that should just be invoking BytesRef.compareTo directly instead, I think? Yeah. As said, we may not remove the comparator completely, but we should only use it at places where we can't use Comparable<BytesRef> interface that BytesRef implements. You can also fix TestUnicodeUtil's custom String -> int[] code points logic maybe? Will check this, too. I am currently investigating it Java 8 already has some Comparator interface somewhere ready-to use. But does not look like that.
          Hide
          thetaphi Uwe Schindler added a comment -

          New patch with Mike's suggestions. I added a comparator static final constant to TestUtil with a warning that it is not as effective as it could be (Dawid Weiss comment on LUCENE-7052).

          I did not yet look into further cleaning up and removing comparator usage.

          Show
          thetaphi Uwe Schindler added a comment - New patch with Mike's suggestions. I added a comparator static final constant to TestUtil with a warning that it is not as effective as it could be ( Dawid Weiss comment on LUCENE-7052 ). I did not yet look into further cleaning up and removing comparator usage.
          Hide
          thetaphi Uwe Schindler added a comment - - edited

          New patch:

          • I removed some usage of comparator and replaced by compareTo at many places.
          • Moved the code of the comparator to compareTo.
          • The comparator isstelf is now a static final constant BytesRef.COMPARATOR (the name was horrible). It is documented to just order by bytes. In fact its just a method reference to the compareTo method!
          • I also removed the chicken-egg statement in FuzzyQuery.
          Show
          thetaphi Uwe Schindler added a comment - - edited New patch: I removed some usage of comparator and replaced by compareTo at many places. Moved the code of the comparator to compareTo . The comparator isstelf is now a static final constant BytesRef.COMPARATOR (the name was horrible). It is documented to just order by bytes. In fact its just a method reference to the compareTo method! I also removed the chicken-egg statement in FuzzyQuery.
          Hide
          thetaphi Uwe Schindler added a comment - - edited

          As we implemented compareTo we could remove the comparator completely. One could use Comparator.naturalOrder() (see https://docs.oracle.com/javase/8/docs/api/java/util/Comparator.html#naturalOrder--) instead (naturalOrder is defined to use compareTo. At places like Collections.sort() we could remove the comparator argument completely.

          Any comments on this?

          Show
          thetaphi Uwe Schindler added a comment - - edited As we implemented compareTo we could remove the comparator completely. One could use Comparator.naturalOrder() (see https://docs.oracle.com/javase/8/docs/api/java/util/Comparator.html#naturalOrder-- ) instead (naturalOrder is defined to use compareTo . At places like Collections.sort() we could remove the comparator argument completely. Any comments on this?
          Hide
          rcmuir Robert Muir added a comment -

          Yes, please, and remove BytesRef.COMPARATOR which just duplicates that: naturalOrder() already returns a singleton.

          Also in cases like TreeSet creation in the join tests, we should just make new TreeSet<>() and not pass any comparator in at all.

          Show
          rcmuir Robert Muir added a comment - Yes, please, and remove BytesRef.COMPARATOR which just duplicates that: naturalOrder() already returns a singleton. Also in cases like TreeSet creation in the join tests, we should just make new TreeSet<>() and not pass any comparator in at all.
          Hide
          thetaphi Uwe Schindler added a comment - - edited

          Here is the comparator cleanup. I removed usage of the BytesRef comparator completely (where possible). Otherwise I used Comparator.naturalOrder().

          I also removed the ArrayUtil.naturalComparator() method, because it is obsolete with Java 8.

          Show
          thetaphi Uwe Schindler added a comment - - edited Here is the comparator cleanup. I removed usage of the BytesRef comparator completely (where possible). Otherwise I used Comparator.naturalOrder() . I also removed the ArrayUtil.naturalComparator() method, because it is obsolete with Java 8.
          Hide
          thetaphi Uwe Schindler added a comment -

          All tests pass.

          Show
          thetaphi Uwe Schindler added a comment - All tests pass.
          Hide
          rcmuir Robert Muir added a comment -

          +1

          Show
          rcmuir Robert Muir added a comment - +1
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit f48d23cd1448f20fb1b97ec986ded76a04a7075c in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f48d23c ]

          LUCENE-7053: Remove custom comparators from BytesRef class and solely use natural byte[] comparator throughout codebase. It also replaces the natural comparator in ArrayUtil by Java 8's Comparator#naturalOrder().

          Show
          jira-bot ASF subversion and git services added a comment - Commit f48d23cd1448f20fb1b97ec986ded76a04a7075c in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f48d23c ] LUCENE-7053 : Remove custom comparators from BytesRef class and solely use natural byte[] comparator throughout codebase. It also replaces the natural comparator in ArrayUtil by Java 8's Comparator#naturalOrder().
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 8ffa436f00d24cb45af49160739f71b3654349ce in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ffa436 ]

          LUCENE-7053: Move comparator to better place in code; generalize to use CharSequence instead of String

          Show
          jira-bot ASF subversion and git services added a comment - Commit 8ffa436f00d24cb45af49160739f71b3654349ce in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ffa436 ] LUCENE-7053 : Move comparator to better place in code; generalize to use CharSequence instead of String
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Uwe Schindler!

          Show
          mikemccand Michael McCandless added a comment - Thanks Uwe Schindler !
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 3c27980c4ae7777716ba74b3a0e2c70b3dd1c1d4 in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3c27980 ]

          LUCENE-7053: Simplify code to work around Java 8u25 compiler bug

          Show
          jira-bot ASF subversion and git services added a comment - Commit 3c27980c4ae7777716ba74b3a0e2c70b3dd1c1d4 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3c27980 ] LUCENE-7053 : Simplify code to work around Java 8u25 compiler bug

            People

            • Assignee:
              thetaphi Uwe Schindler
              Reporter:
              thetaphi Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development