Lucene.Net / LUCENENET-616

Make Collections from Lucene.Net.Support into a 1st Class Feature

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Lucene.Net 4.8.0
    • Lucene.Net 4.8.0
    • Lucene.Net Core
    • None
    • Important

    Description

      The collection types in Lucene.Net.Support were originally sourced to support Lucene.Net itself. While they were made public, they were not considered to be features that would be used by anyone except for advanced users.

      However, user reports have made it clear that some parts of Lucene's design require specialized collections that don't exist in the .NET Framework in order to function properly (see LUCENENET-612 and LUCENENET-615). .NET users are generally not familiar with these specialized collection types and assume that when an API requires IDictionary<TKey, TValue>, Dictionary<TKey, TValue> is their best choice. We need to improve documentation and increase the visibility of the specialized collection types to help them along.

      Some of the existing collections are composed of other nested collection objects and should be replaced with a lower-level implementation, if possible. Since these are breaking API changes, they should be done before the official release of Lucene.Net 4.8.0.

      Additionally, many parts of Lucene.NET expect the collections to:

      • Be structurally equal
      • Format their contents in the ToString() method

      The safest and most thorough way to achieve this is to replace the usage of all built-in .NET collections with collections that we own that implement this functionality. .NET provides interfaces to help achieve this:

      • IStructuralEquatable
      • IFormattable

      But neither interface is implemented in any of the built-in collections in .NET.
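
      To make the gap concrete, the following minimal sketch uses only built-in .NET types to show the default behavior described above. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo class (not part of Lucene.Net or J2N).
public static class BuiltInCollectionBehavior
{
    // Two lists with the same elements are not equal by default, because
    // List<T> inherits reference equality from System.Object.
    public static bool DefaultEquals()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 1, 2, 3 };
        return a.Equals(b);
    }

    // List<T> does not override ToString(), so the result is the type name
    // rather than the list's contents.
    public static string DefaultToString()
    {
        return new List<int> { 1, 2, 3 }.ToString();
    }

    // Runtime checks for the two interfaces named in the text.
    public static bool ListIsStructurallyEquatable()
    {
        object list = new List<int>();
        return list is IStructuralEquatable;
    }

    public static bool ListIsFormattable()
    {
        object list = new List<int>();
        return list is IFormattable;
    }
}
```

      Calling these methods should show that equality is by reference, that ToString() yields only the type name, and that both interface checks fail for List<T>.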

      Requirements for Lucene.NET Collections

      We must have the following collections for internal use within Lucene.NET, and each should also be made available to end users.

      • J2N.Collections.Generic.List<T>
        • Subclass of System.Collections.Generic.List<T>
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.HashSet<T>
        • Subclass of System.Collections.Generic.HashSet<T>
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.SortedSet<T>
        • Subclass of System.Collections.Generic.SortedSet<T>
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.Dictionary<TKey, TValue>
        • Same interface as System.Collections.Generic.Dictionary<TKey, TValue>
        • Supports null keys
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.SortedDictionary<TKey, TValue>
        • Same interface as System.Collections.Generic.SortedDictionary<TKey, TValue>
        • Supports null keys
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.LinkedDictionary<TKey, TValue>
        • Same interface as System.Collections.Generic.Dictionary<TKey, TValue>
        • Preserves insertion order across adds/deletes
        • Supports null keys
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.PriorityQueue<T>
        • Migrate Lucene.Net.Support.PriorityQueue<T> implementation, but clean up the API for .NET (AddAll() > AddRange(), etc.)
      • J2N.Collections.Generic.IdentityDictionary<TKey, TValue> (See Lucene.Net.Support.IdentityHashMap<TKey, TValue>)
        • Same interface as System.Collections.Generic.Dictionary<TKey, TValue>
        • Supports null keys
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Collections.Generic.IdentityHashSet<T> (See Lucene.Net.Support.IdentityHashSet<T>)
        • Subclass of System.Collections.Generic.HashSet<T>
        • Implements IStructuralEquatable
        • Implements IFormattable
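
      For the identity-based collections in the list above, the defining behavior is that keys are compared by reference rather than by Equals(). A minimal sketch of that idea, assuming a hypothetical IdentityComparer<T> plugged into the built-in Dictionary<TKey, TValue> (this is not the J2N or Lucene.Net.Support implementation):

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical comparer for illustration: two keys are equal only when
// they are the same object instance.
public sealed class IdentityComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) => ReferenceEquals(x, y);

    // RuntimeHelpers.GetHashCode ignores any GetHashCode() override on T.
    public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
}

public static class IdentityDemo
{
    public static int CountEntriesByIdentity()
    {
        // Two distinct string instances with equal contents.
        string a = new string(new[] { 'k', 'e', 'y' });
        string b = new string(new[] { 'k', 'e', 'y' });

        var map = new Dictionary<string, int>(new IdentityComparer<string>());
        map[a] = 1;
        map[b] = 2; // stored as a separate entry: a and b are different objects
        return map.Count;
    }

    public static int CountEntriesByValue()
    {
        string a = new string(new[] { 'k', 'e', 'y' });
        string b = new string(new[] { 'k', 'e', 'y' });

        var map = new Dictionary<string, int>(); // default, value-based comparer
        map[a] = 1;
        map[b] = 2; // overwrites the first entry: the strings are equal
        return map.Count;
    }
}
```

      With the identity comparer the two equal-valued keys occupy separate entries, whereas the default comparer collapses them into one.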

      We should have the following collections for internal use within Lucene.NET.

      • J2N.Collections.Generic.LinkedHashSet<T>
        • Same interface as System.Collections.Generic.HashSet<T>
        • Preserves insertion order across adds/deletes
        • Implements IStructuralEquatable
        • Implements IFormattable
      • J2N.Runtime.CompilerServices.ConditionalWeakTable<TKey, TValue> (See LUCENENET-636)

      Special Cases

      There are a few uses where the above collections won't suffice because they use low-level methods that don't exist in .NET collections. However, these are closed systems and we don't have to worry about supporting end users in these cases.

      • Lucene.Net.TestFramework.Analysis.MockCharFilter > Use C5's TreeDictionary<TKey, TValue>
      • Lucene.Net.Grouping.AbstractGroupFacetCollector > Use C5's TreeSet<T>
      • Lucene.Net.Highlighter.PostingsHighlight.PostingsHighlighter > Use C5's TreeSet<T>

      Note that C5's TreeSet<T> does not currently support ISet<T>, and we will need to push our implementation of it back to them to support this change.

      Structural Equality

      J2N has J2N.Collections.Generic.ListEqualityComparer<T>, J2N.Collections.Generic.SetEqualityComparer<T>, J2N.Collections.Generic.DictionaryEqualityComparer<TKey, TValue>, and J2N.Collections.StructuralEqualityComparer<T> that implement the structural equality behavior for each type of collection. These types should be passed in through each collection's constructor, as each has a Default and an Aggressive implementation.

      Default is more efficient and should be used in all closed scenarios within Lucene.NET (where the collection is not passed in from outside of the class). Aggressive is designed to interoperate with built-in .NET collections.
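
      As a rough illustration of what a structural comparer for lists does, here is a minimal sketch. SimpleListEqualityComparer<T> is a hypothetical stand-in written for this example; it is not J2N's ListEqualityComparer<T> and does not model the Default/Aggressive distinction.

```csharp
using System.Collections.Generic;

// Hypothetical comparer for illustration only: two lists are equal when they
// have the same length and equal elements at each index.
public sealed class SimpleListEqualityComparer<T> : IEqualityComparer<IList<T>>
{
    private readonly IEqualityComparer<T> elementComparer;

    public SimpleListEqualityComparer(IEqualityComparer<T> elementComparer = null)
    {
        this.elementComparer = elementComparer ?? EqualityComparer<T>.Default;
    }

    public bool Equals(IList<T> x, IList<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Count != y.Count) return false;

        for (int i = 0; i < x.Count; i++)
        {
            if (!elementComparer.Equals(x[i], y[i])) return false;
        }
        return true;
    }

    // Combines element hash codes in order, so lists that are equal under
    // Equals() always produce the same hash code.
    public int GetHashCode(IList<T> list)
    {
        if (list == null) return 0;

        int hash = 17;
        foreach (T item in list)
        {
            int itemHash = item == null ? 0 : elementComparer.GetHashCode(item);
            hash = unchecked(hash * 31 + itemHash);
        }
        return hash;
    }
}

public static class StructuralEqualityDemo
{
    public static bool SameContentsAreEqual()
    {
        var comparer = new SimpleListEqualityComparer<int>();
        IList<int> a = new List<int> { 1, 2, 3 };
        IList<int> b = new List<int> { 1, 2, 3 };
        return comparer.Equals(a, b)
            && comparer.GetHashCode(a) == comparer.GetHashCode(b);
    }

    public static bool DifferentOrderIsEqual()
    {
        var comparer = new SimpleListEqualityComparer<int>();
        return comparer.Equals(new List<int> { 1, 2, 3 }, new List<int> { 3, 2, 1 });
    }
}
```

      A collection that accepts such a comparer through its constructor can delegate its own Equals() and GetHashCode() to it, which is the arrangement the paragraph above describes.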

      See these minimal examples of implementations.

      Collection Formatting

      J2N has a StringFormatter class that can be used to format collections, as follows.

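      // IFormattable
      // Formats this collection using the supplied provider.
      public string ToString(IFormatProvider provider)
      {
          // When the provider is a StringFormatter, it renders the collection's contents.
          return string.Format(provider, "{0}", this);
      }
      
      // Defaults to formatting the contents with the current culture.
      public override string ToString()
      {
          return ToString(StringFormatter.CurrentCulture);
      }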
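
      The snippet above relies on string.Format() consulting the provider for an ICustomFormatter. The following minimal sketch shows that mechanism with a hypothetical SimpleCollectionFormatter; J2N's StringFormatter may be implemented differently, and the bracketed output style here is an assumption made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical formatter for illustration: renders any IEnumerable as
// "[a, b, c]" and formats elements with the wrapped culture.
public sealed class SimpleCollectionFormatter : IFormatProvider, ICustomFormatter
{
    private readonly CultureInfo culture;

    public SimpleCollectionFormatter(CultureInfo culture)
    {
        this.culture = culture;
    }

    // string.Format() asks the provider for an ICustomFormatter first.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : culture.GetFormat(formatType);
    }

    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        var items = arg as IEnumerable;
        if (items == null || arg is string)
        {
            // Not a collection: fall back to culture-aware formatting.
            var formattable = arg as IFormattable;
            if (formattable != null) return formattable.ToString(format, culture);
            return arg == null ? "null" : arg.ToString();
        }

        var sb = new StringBuilder("[");
        bool first = true;
        foreach (object item in items)
        {
            if (!first) sb.Append(", ");
            sb.Append(Format(null, item, this)); // recurse for nested collections
            first = false;
        }
        return sb.Append(']').ToString();
    }
}

public static class FormattingDemo
{
    public static string FormatInvariant()
    {
        var list = new List<double> { 1.5, 2.5 };
        var provider = new SimpleCollectionFormatter(CultureInfo.InvariantCulture);
        return string.Format(provider, "{0}", list); // "[1.5, 2.5]"
    }
}
```

      Passing an invariant-culture provider, as in FormatInvariant(), keeps the output stable across machines, which is the same reason tests would prefer an invariant formatter over the current culture.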
      

      Nullable Keys

      Use the attached NullableKey<T> and NullableKeyDictionary<TKey, TValue> implementations. NullableKeyDictionary<TKey, TValue> is intended to be a base class for all nullable key dictionary implementations. The backing dictionary (the one we are trying to mimic) is then passed in through its constructor.
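
      The attached files are not reproduced here, but the general idea of wrapping a key so that null becomes a legal value can be sketched as follows. KeyBox<T> is a hypothetical name chosen for this example; the attached NullableKey<T> and NullableKeyDictionary<TKey, TValue> may differ in design and API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper for illustration: a struct is never null itself, so a
// backing dictionary accepts it as a key even when the wrapped value is null.
public readonly struct KeyBox<T> : IEquatable<KeyBox<T>> where T : class
{
    public KeyBox(T value)
    {
        Value = value;
    }

    public T Value { get; }

    public bool Equals(KeyBox<T> other)
    {
        return EqualityComparer<T>.Default.Equals(Value, other.Value);
    }

    public override bool Equals(object obj)
    {
        return obj is KeyBox<T> other && Equals(other);
    }

    public override int GetHashCode()
    {
        return Value == null ? 0 : EqualityComparer<T>.Default.GetHashCode(Value);
    }
}

public static class NullKeyDemo
{
    // The built-in dictionary throws ArgumentNullException for a null key.
    public static bool BuiltInRejectsNullKey()
    {
        var map = new Dictionary<string, int>();
        try
        {
            map.Add(null, 1);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    // Wrapping the key lets the same backing dictionary store a null key.
    public static int WrappedNullKeyLookup()
    {
        var map = new Dictionary<KeyBox<string>, int>();
        map[new KeyBox<string>(null)] = 42;
        return map[new KeyBox<string>(null)];
    }
}
```

      A nullable-key dictionary built this way can wrap and unwrap keys internally so that callers still see a plain IDictionary<TKey, TValue> surface.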

      Integration

      After the collections have been implemented, we can eliminate the usage of Lucene.Net.Support.Collections.Equals(), Lucene.Net.Support.Collections.GetHashCode(), and Lucene.Net.Support.Collections.ToString().

      Usages of Lucene.Net.Support.Collections.Equals() and Lucene.Net.Support.Collections.GetHashCode() in Lucene.Net can be changed to the equality comparer matching the collection type.

      For string formatting, the usage can simply be changed to collection.ToString() in production code, and we can use collection.ToString(J2N.Text.StringFormatter.InvariantCulture) to patch tests.

      For types where collections are passed from the outside that may contain nested collections, we should either use Aggressive structural comparison, or document that if nested collections are needed, the end user should use collections that implement IStructuralEquatable, such as those from J2N.

      Attachments

        1. NullableKeyDictionary.cs
          54 kB
          Shad Storhaug
        2. NullableKey.cs
          7 kB
          Shad Storhaug

            People

              nightowl888 Shad Storhaug
                Time Tracking

                  Original Estimate: 60h
                  Remaining Estimate: 60h
                  Time Spent: Not Specified