Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      CharArraySet#contains(...) always creates a HashCode of the String, Char[] or CharSequence even if the set is empty.
      contains should return false if set it empty

      1. LUCENE-2195.patch
        5 kB
        Robert Muir
      2. LUCENE-2195.patch
        5 kB
        Simon Willnauer
      3. LUCENE-2195.patch
        4 kB
        Simon Willnauer
      4. LUCENE-2195.patch
        4 kB
        Simon Willnauer

        Activity

        Hide
        Simon Willnauer added a comment -

        here is a patch

        Show
        Simon Willnauer added a comment - here is a patch
        Hide
        Simon Willnauer added a comment -

        Updated patch. This patch does not count==0 check in contains(Object) as the o.toString() could return null and the NPE would be silently swallowed if the set is empty.
        The null check and NPE are necessary to yield consistent behavior no matter if the set is empty or not.

        Show
        Simon Willnauer added a comment - Updated patch. This patch does not count==0 check in contains(Object) as the o.toString() could return null and the NPE would be silently swallowed if the set is empty. The null check and NPE are necessary to yield consistent behavior no matter if the set is empty or not.
        Hide
        Robert Muir added a comment -

        Simon I like it. this is useful for things like empty stopword sets, so we do not hash every input token unnecessarily.

        Show
        Robert Muir added a comment - Simon I like it. this is useful for things like empty stopword sets, so we do not hash every input token unnecessarily.
        Hide
        Yonik Seeley added a comment -

        Seems that this patch would just slow things down (slightly) for the common case, and for correctly written code.
        If one really wants to optimize the case of an empty set, shouldn't that be done at a higher level (i.e. don't put the stop filter in the chain, don't create a set at all, etc).

        So I think we should skip both the null checks and the ==0 checks and leave it as is.

        Show
        Yonik Seeley added a comment - Seems that this patch would just slow things down (slightly) for the common case, and for correctly written code. If one really wants to optimize the case of an empty set, shouldn't that be done at a higher level (i.e. don't put the stop filter in the chain, don't create a set at all, etc). So I think we should skip both the null checks and the ==0 checks and leave it as is.
        Hide
        Robert Muir added a comment -

        Yonik, I disagree. I would rather see contains() comparisons against empty sets than this kind of if (set != null && set.contains())... i see this code around a lot.

        it does not matter, it is the same.

        Show
        Robert Muir added a comment - Yonik, I disagree. I would rather see contains() comparisons against empty sets than this kind of if (set != null && set.contains())... i see this code around a lot. it does not matter, it is the same.
        Hide
        Yonik Seeley added a comment -

        It's not quite the same - no method call for set != null.
        But my point stands - this is optimization for the very uncommon case at the expense of the common case. This specific instance isn't too big of a deal (it's a predictable branch at least), but as a matter of principle we should avoid going down those roads.

        Show
        Yonik Seeley added a comment - It's not quite the same - no method call for set != null. But my point stands - this is optimization for the very uncommon case at the expense of the common case. This specific instance isn't too big of a deal (it's a predictable branch at least), but as a matter of principle we should avoid going down those roads.
        Hide
        Robert Muir added a comment -

        well then something must be decided: either we use null sets,remove use of EMPTY_SET all over the place, and have if set != null scattered all over the place, or we go this route.

        i do not care either way but we should not be using EMPTY_SET in various tokenstreams without this patch, else we unnecessarily hash every input token for no good reason.

        Show
        Robert Muir added a comment - well then something must be decided: either we use null sets,remove use of EMPTY_SET all over the place, and have if set != null scattered all over the place, or we go this route. i do not care either way but we should not be using EMPTY_SET in various tokenstreams without this patch, else we unnecessarily hash every input token for no good reason.
        Hide
        Simon Willnauer added a comment -

        I changed my patch to please Yonik who has performance concerns as well as robert who wants to use EMTPY_SET instead of set == null checks.
        I agree with robert that I would rather have an empty set instead of null asssigned if the set is omitted or if the default set is empty. Yet, I subclassed UnmodifiableCharArraySet and added a specailized implementation for EMPTY_SET that checks for null to throw the NPE and otherwise always returns false for all contains(...) methods.
        This class is final and the overhead for the method call should be very small.

        Show
        Simon Willnauer added a comment - I changed my patch to please Yonik who has performance concerns as well as robert who wants to use EMTPY_SET instead of set == null checks. I agree with robert that I would rather have an empty set instead of null asssigned if the set is omitted or if the default set is empty. Yet, I subclassed UnmodifiableCharArraySet and added a specailized implementation for EMPTY_SET that checks for null to throw the NPE and otherwise always returns false for all contains(...) methods. This class is final and the overhead for the method call should be very small.
        Hide
        Simon Willnauer added a comment -

        any comments on the latest patch?

        Show
        Simon Willnauer added a comment - any comments on the latest patch?
        Hide
        Robert Muir added a comment -

        Simon i made a few changes (i like the idea of simply a special EmptyCharArraySet so this does not slow down anything else and alleviates any concerns)

        • I do not think unmodifiableset should have a no-arg ctor, so instead i pushed this up to emptychararrayset
        • i do not think emptychararrayset need override and throw uoe for removeAll or retainAll, and i don't think the tests were correct in assuming it will throw uoe. it will not throw uoe for say, removeAll only because it is empty. it will just do nothing.

        now the emptychararrayset conforms with AbstractCollation.remove/retainAll:
        Note that this implementation will throw an UnsupportedOperationException if the iterator returned by the iterator method does not implement the remove method and this collection contains one or more elements in common with/not present with the specified collection.

        Show
        Robert Muir added a comment - Simon i made a few changes (i like the idea of simply a special EmptyCharArraySet so this does not slow down anything else and alleviates any concerns) I do not think unmodifiableset should have a no-arg ctor, so instead i pushed this up to emptychararrayset i do not think emptychararrayset need override and throw uoe for removeAll or retainAll, and i don't think the tests were correct in assuming it will throw uoe. it will not throw uoe for say, removeAll only because it is empty. it will just do nothing. now the emptychararrayset conforms with AbstractCollation.remove/retainAll: Note that this implementation will throw an UnsupportedOperationException if the iterator returned by the iterator method does not implement the remove method and this collection contains one or more elements in common with/not present with the specified collection.
        Hide
        Simon Willnauer added a comment -

        I do not think unmodifiableset should have a no-arg ctor, so instead i pushed this up to emptychararrayset

        ok I'm fine with that.

        i do not think emptychararrayset need override and throw uoe for removeAll or retainAll, and i don't think the tests were correct in assuming it will throw uoe. it will not throw uoe for say, removeAll only because it is empty. it will just do nothing.

        You are right, this should only throw this exception if the set contains it and the Iterator does not implement remove()

             * Note that this implementation throws an
             * <tt>UnsupportedOperationException</tt> if the iterator returned by this
             * collection's iterator method does not implement the <tt>remove</tt>
             * method and this collection contains the specified object.
        

        same is true for AbstractSet#removeAll() & retainAll()

        Thanks for updating it. I think this is good to go though!

        Show
        Simon Willnauer added a comment - I do not think unmodifiableset should have a no-arg ctor, so instead i pushed this up to emptychararrayset ok I'm fine with that. i do not think emptychararrayset need override and throw uoe for removeAll or retainAll, and i don't think the tests were correct in assuming it will throw uoe. it will not throw uoe for say, removeAll only because it is empty. it will just do nothing. You are right, this should only throw this exception if the set contains it and the Iterator does not implement remove() * Note that this implementation throws an * <tt>UnsupportedOperationException</tt> if the iterator returned by this * collection's iterator method does not implement the <tt>remove</tt> * method and this collection contains the specified object. same is true for AbstractSet#removeAll() & retainAll() Thanks for updating it. I think this is good to go though!
        Hide
        Robert Muir added a comment -

        Thanks for updating it. I think this is good to go though!

        me too, if no one objects I will commit in a few days.

        Show
        Robert Muir added a comment - Thanks for updating it. I think this is good to go though! me too, if no one objects I will commit in a few days.
        Hide
        Robert Muir added a comment -

        Committed revision 902723.

        Thanks Simon!

        Show
        Robert Muir added a comment - Committed revision 902723. Thanks Simon!

          People

          • Assignee:
            Robert Muir
            Reporter:
            Simon Willnauer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development