Solr
  1. Solr
  2. SOLR-912

org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList

    Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: search
    • Labels:
      None
    • Environment:

      Tomcat 6, JRE 6, Solr 1.3+ nightlies

      Description

      The implementation of NamedList - while being fast - is not necessarily type-safe. I have implemented an additional implementation of the same - ModernNamedList (a type-safe variation providing the same interface as NamedList) - while preserving the semantics in terms of ordering of elements and allowing null elements for key and values (keys are always Strings , while values correspond to generics ).

      1. SOLR-912.patch
        14 kB
        Karthik K
      2. SOLR-912.patch
        18 kB
        Karthik K
      3. NLProfile.java
        7 kB
        Karthik K

        Issue Links

          Activity

          Karthik K created issue -
          Hide
          Karthik K added a comment -

          1) New Type-safe implementation of NamedList , ModernNamedList .

          2) New interface INamedList<T> created

          3) New Test case - ModernNamedList added.

          4) Added more test cases to NamedListTest .

          Once the patch is approved - NamedList would be deprecated and the existing codebase in Solr would be replaced to use ModernNamedList<T> to be more type-safe.

          Show
          Karthik K added a comment - 1) New Type-safe implementation of NamedList , ModernNamedList . 2) New interface INamedList<T> created 3) New Test case - ModernNamedList added. 4) Added more test cases to NamedListTest . Once the patch is approved - NamedList would be deprecated and the existing codebase in Solr would be replaced to use ModernNamedList<T> to be more type-safe.
          Karthik K made changes -
          Field Original Value New Value
          Attachment SOLR-912.patch [ 12396020 ]
          Hide
          Noble Paul added a comment -

          Type safety is not an overriding concern when a NamedList is used (that is the beauty of it). It does not help in any way. Most of the usages of NamedList involves heterogeneous values .

          your implementation is not as efficient (memory usage) as the original one

          The idea of having an interface is an overkill

          -1

          Show
          Noble Paul added a comment - Type safety is not an overriding concern when a NamedList is used (that is the beauty of it). It does not help in any way. Most of the usages of NamedList involves heterogeneous values . your implementation is not as efficient (memory usage) as the original one The idea of having an interface is an overkill -1
          Hide
          Karthik K added a comment -

          The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).

          If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type. We could probably define it as NamedList , with T replaced to be an object internally. not making it a generic type.

          There seems to be no memory leaks as far as the container is concerned. Creating an iterator object for every call to an iterator seems to be quite a bit of data redundancy issues when ideally we can use the iterator of one of the underlying objects as well.

          Show
          Karthik K added a comment - The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ). If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type. We could probably define it as NamedList , with T replaced to be an object internally. not making it a generic type. There seems to be no memory leaks as far as the container is concerned. Creating an iterator object for every call to an iterator seems to be quite a bit of data redundancy issues when ideally we can use the iterator of one of the underlying objects as well.
          Hide
          Noble Paul added a comment -

          The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).

          This means we will need to use this interface wherever we use NamedList which is not desirable

          NamedList is designed to achieve a specific purpose

          Solr is not meant for java users only. It is also meant to be consumed over xml/json etc .There is no type safety in these. NamedList helps us to have a datastructure which can easily be converted back and forth from these.

          If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type

          It helps where it makes sense . But where it is not necessary I can totally omit that and javac does not complain. So , it is better to keep it generic than not.

          There seems to be no memory leaks as far as the container is concerned

          There are no leaks. I was referring to the internal implementation . instead of one big array you keep a list of objects (means more objects, one per entry) .

          Creating an iterator object for every call to an iterator

          iterating over NamedList does not need to use an iterator. That is a choice left to the consumer of the API

          Show
          Noble Paul added a comment - The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ). This means we will need to use this interface wherever we use NamedList which is not desirable NamedList is designed to achieve a specific purpose Solr is not meant for java users only. It is also meant to be consumed over xml/json etc .There is no type safety in these. NamedList helps us to have a datastructure which can easily be converted back and forth from these. If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type It helps where it makes sense . But where it is not necessary I can totally omit that and javac does not complain. So , it is better to keep it generic than not. There seems to be no memory leaks as far as the container is concerned There are no leaks. I was referring to the internal implementation . instead of one big array you keep a list of objects (means more objects, one per entry) . Creating an iterator object for every call to an iterator iterating over NamedList does not need to use an iterator. That is a choice left to the consumer of the API
          Hide
          Karthik K added a comment -

          The interface is mostly used as a proof of concept that the new class - ModernNamedList implements the same methods as that of NamedList. Eventually - the interface would be gotten rid of , once we are happy with the same.

          As far as NamedList , I believe if we want to have the flexibility of allowing any type in it - we might as well define it to an Object. If we do qualify it as a specific type - then we might as well implement type-safety in the class. javac does not complain today because the compiler switch to indicate type-safety errors has been turned off.

          Previous implementation used to be having a pre-jdk5 List , with members being String and a type depending on the index. The revised implementation has Map.Entry<String, T> interface - which is directly intuitive to what is required ( a Map with order being preserved , allowing duplicates and nulls ). I did profile with 2 different implementations , involving Map.Entry<?> and a heterogenous list with String and a type (with insertion / deletion of 100,000 records). The current implementation in fact , failed in the performance comparison in both insertion / deletion in the middle of the List ( remove () ) , since we have to add/remove elements twice from the List (as in the current impl) , as compared to 1 insertion/deletion in the Map.Entry<> implementation. ) Given that addition/deletion in the List is worst-case linear - I believe the perceived performance degradation due to additional object , turns out to be not so bad when compared to 2-path insertion / deletion as we have today.

          NamedList does seem to implement the interface Iterable<T> . I am not sure how the consumer of the API can have independent iterators (since only NamedList is supposed to be aware of the internal data structures and not the consumer). So I believe it would be upto NamedList<T> to provide an iterator to the user of the API.

          Show
          Karthik K added a comment - The interface is mostly used as a proof of concept that the new class - ModernNamedList implements the same methods as that of NamedList. Eventually - the interface would be gotten rid of , once we are happy with the same. As far as NamedList , I believe if we want to have the flexibility of allowing any type in it - we might as well define it to an Object. If we do qualify it as a specific type - then we might as well implement type-safety in the class. javac does not complain today because the compiler switch to indicate type-safety errors has been turned off. Previous implementation used to be having a pre-jdk5 List , with members being String and a type depending on the index. The revised implementation has Map.Entry<String, T> interface - which is directly intuitive to what is required ( a Map with order being preserved , allowing duplicates and nulls ). I did profile with 2 different implementations , involving Map.Entry<?> and a heterogenous list with String and a type (with insertion / deletion of 100,000 records). The current implementation in fact , failed in the performance comparison in both insertion / deletion in the middle of the List ( remove () ) , since we have to add/remove elements twice from the List (as in the current impl) , as compared to 1 insertion/deletion in the Map.Entry<> implementation. ) Given that addition/deletion in the List is worst-case linear - I believe the perceived performance degradation due to additional object , turns out to be not so bad when compared to 2-path insertion / deletion as we have today. NamedList does seem to implement the interface Iterable<T> . I am not sure how the consumer of the API can have independent iterators (since only NamedList is supposed to be aware of the internal data structures and not the consumer). So I believe it would be upto NamedList<T> to provide an iterator to the user of the API.
          Karthik K made changes -
          Component/s Analysis [ 12312520 ]
          Component/s clients - java [ 12311580 ]
          Hide
          Hoss Man added a comment -

          If i'm understanding the discussion so far...

          • ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.
          • INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)

          do i have those aspects correct?

          with that in mind: i'm not sure i understand what "itch" changing the implementation "scratches" ... the initial issue description says it's because NamedList " is not necessarily type-safe" but it's not clear what that statement is referring to ... later comments suggest that the motivation is to improve the performance of "remove" ... which hardly seems like something worth optimizing for.

          I agree that having the internals based on a "list of pairs" certainly seems like it might be more intuitive to developers looking at the internals (then the current approach is), but how is the current approach less type safe for consumers using just the NamedList API?

          If the "modern" approach is more performant then the existing impl and passes all of the tests then i suppose it would make sense to switch – but i'm far more interested in how the performance compares for common cases (add/get/iterate) then for cases that hardly ever come up (remove).

          My suggestion: provide two independent attachments. One patch that just replaces the internals of NamedList with the approach you suggest so people can apply the patch, test it out, and verify the API/behavior; A second attachment that provides some benchmarks against the NmaedList class – so people can read/run your benchmark with and with out the patch to see how the performance changes.

          Show
          Hoss Man added a comment - If i'm understanding the discussion so far... ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared. INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc) do i have those aspects correct? with that in mind: i'm not sure i understand what "itch" changing the implementation "scratches" ... the initial issue description says it's because NamedList " is not necessarily type-safe" but it's not clear what that statement is referring to ... later comments suggest that the motivation is to improve the performance of "remove" ... which hardly seems like something worth optimizing for. I agree that having the internals based on a "list of pairs" certainly seems like it might be more intuitive to developers looking at the internals (then the current approach is), but how is the current approach less type safe for consumers using just the NamedList API? If the "modern" approach is more performant then the existing impl and passes all of the tests then i suppose it would make sense to switch – but i'm far more interested in how the performance compares for common cases (add/get/iterate) then for cases that hardly ever come up (remove). My suggestion: provide two independent attachments. One patch that just replaces the internals of NamedList with the approach you suggest so people can apply the patch, test it out, and verify the API/behavior; A second attachment that provides some benchmarks against the NmaedList class – so people can read/run your benchmark with and with out the patch to see how the performance changes.
          Hide
          Karthik K added a comment -
          ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.
          INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)

          True.

          Attached herewith is: NLProfile.java - that contains sample benchmarking against the 2 implementations (will work with the previous page on the page).

          Some results:

          addAll / getAll(): increase in performance is almost [1-10]% range.

          add: increase in performance by around 30% , probably because of the additional growth in the List implementation when size approaches capacity. And since, in NamedList - we insert 2 elements as opposed to one, ( as done in ModernNamedList) - it might be more pronounced.

          iterator: ~70% increase in performance in favor of the new implementation since it just reuses the iterator for the internal data structure.

          The numbers should be present as comments in the corresponding methods - testAdd() , testAddAll(), testGetAll(), testIterator() .

          I will attach the final patch once we are convinced with the benchmark methodology and the numbers.

          Show
          Karthik K added a comment - ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared. INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc) True. Attached herewith is: NLProfile.java - that contains sample benchmarking against the 2 implementations (will work with the previous page on the page). Some results: addAll / getAll(): increase in performance is almost [1-10] % range. add: increase in performance by around 30% , probably because of the additional growth in the List implementation when size approaches capacity. And since, in NamedList - we insert 2 elements as opposed to one, ( as done in ModernNamedList) - it might be more pronounced. iterator: ~70% increase in performance in favor of the new implementation since it just reuses the iterator for the internal data structure. The numbers should be present as comments in the corresponding methods - testAdd() , testAddAll(), testGetAll(), testIterator() . I will attach the final patch once we are convinced with the benchmark methodology and the numbers.
          Hide
          Karthik K added a comment -

          a sample benchmarking program that works with the previous patch submitted.

          Show
          Karthik K added a comment - a sample benchmarking program that works with the previous patch submitted.
          Karthik K made changes -
          Attachment NLProfile.java [ 12396363 ]
          Hide
          Karthik K added a comment -

          Additional Info: JRE 6, Linux 2.6.27-9 , 3.2GB Memory, Dual-core Intel @ 2.53 Ghz.

          Show
          Karthik K added a comment - Additional Info: JRE 6, Linux 2.6.27-9 , 3.2GB Memory, Dual-core Intel @ 2.53 Ghz.
          Hide
          Yonik Seeley added a comment -

          While we're going down the micro-benchmarking path, I tried eliminating ArrayList and got an additional 15-25% gain on common operations (create new, add between 5 and 15 elements, and then iterate over those elements later). This was with Java 1.6. -Xbatch improved the results even more... ~40% - but this is just a micro-benchmark.

          class NamedList2<T> implements INamedList<T> {
          
            protected NamedListEntry<T>[] nvPairs;
            protected int size;
          
            public NamedList2() {
              nvPairs = new NamedListEntry[10];
              size = 0;
            }
          
            @Override
            public int size() {
              return size;
            }
          
            @Override
            public String getName(int idx) {
              if (idx >= size) throw new ArrayIndexOutOfBoundsException();
              return nvPairs[idx].key;
            }
          
            @Override
            public T getVal(int idx) {
              if (idx >= size) throw new ArrayIndexOutOfBoundsException();    
              return nvPairs[idx].value;
            }
          
            private void resize() {
              NamedListEntry<T>[] arr = new NamedListEntry[nvPairs.length << 1];
              System.arraycopy(nvPairs, 0, arr, 0, size);
              nvPairs = arr;
            }
          
            @Override
            public void add(String name, T val) {
              if (size >= nvPairs.length) {
                resize();
              }
              nvPairs[size++] = new NamedListEntry<T>(name, val);
            }
          
          [...]
          
          Show
          Yonik Seeley added a comment - While we're going down the micro-benchmarking path, I tried eliminating ArrayList and got an additional 15-25% gain on common operations (create new, add between 5 and 15 elements, and then iterate over those elements later). This was with Java 1.6. -Xbatch improved the results even more... ~40% - but this is just a micro-benchmark. class NamedList2<T> implements INamedList<T> { protected NamedListEntry<T>[] nvPairs; protected int size; public NamedList2() { nvPairs = new NamedListEntry[10]; size = 0; } @Override public int size() { return size; } @Override public String getName( int idx) { if (idx >= size) throw new ArrayIndexOutOfBoundsException(); return nvPairs[idx].key; } @Override public T getVal( int idx) { if (idx >= size) throw new ArrayIndexOutOfBoundsException(); return nvPairs[idx].value; } private void resize() { NamedListEntry<T>[] arr = new NamedListEntry[nvPairs.length << 1]; System .arraycopy(nvPairs, 0, arr, 0, size); nvPairs = arr; } @Override public void add( String name, T val) { if (size >= nvPairs.length) { resize(); } nvPairs[size++] = new NamedListEntry<T>(name, val); } [...]
          Shalin Shekhar Mangar made changes -
          Fix Version/s 1.4 [ 12313351 ]
          Component/s Analysis [ 12312520 ]
          Fix Version/s 1.3.1 [ 12313422 ]
          Component/s search [ 12310674 ]
          Hide
          Karthik K added a comment -

          System.arrayCopy is great. It is bound to perform much better because of the native code for the same.

          Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that.

          The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5)

          int newCapacity = (oldCapacity * 3)/2 + 1;

          +1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!).

          Show
          Karthik K added a comment - System.arrayCopy is great. It is bound to perform much better because of the native code for the same. Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that. The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5) int newCapacity = (oldCapacity * 3)/2 + 1; +1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!).
          Hide
          Karthik K added a comment -

          Introduce another ctor. called Type(Object [] ) to distinguish them from List<Map.Entry<String, T > > and List of objects.

          Change the invocation in DebugComponent . Highlight Component etc.

          Show
          Karthik K added a comment - Introduce another ctor. called Type(Object [] ) to distinguish them from List<Map.Entry<String, T > > and List of objects. Change the invocation in DebugComponent . Highlight Component etc.
          Karthik K made changes -
          Attachment SOLR-912.patch [ 12396517 ]
          Hide
          Hoss Man added a comment -

          If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?

          We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.

          Show
          Hoss Man added a comment - If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety? We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.
          Hide
          Karthik K added a comment -
          We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.

          <code>

          • public NamedList(List nameValuePairs) {
          • nvPairs=nameValuePairs;
            + protected NamedList(List<Map.Entry<String, T>> nameValuePairs) { + nvPairs = nameValuePairs; - }

            </code>

          As part of ensuring type-safety , the previous code had a heterogenous List ctor. as before. I changed the access level and added another public ctor. ( Object [] ) with deprecated tag to it so that people are still able to use the functionality.

          Otherwise - retaining the same signature after type safety would imply - people passing in a List of String-s and T-s , when the List expects Map.Entry<String , T > and would cause more confusion.

          Thanks to the erasure of generics , List and List<Map.Entry<String, T>> are all equal , not helping here.
          If backward compatibility is the key here- I can revisit the patch again ensuring the same.

          If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?

          When I logged the issue - type-safety was the major reason behind the same. When I submitted by first patch and did the benchmarking - performance was also found to be a major constraint , (with incremental addition and creation of iterator objects). NamedList seemed to be used all over the place. As long as we preserve the contract of the methods - this should definitely give an additional boost - since I discovered as part of profiling of the launch of SolrCore ( CoreContainer.Initializer.initalize() .. ) .

          Show
          Karthik K added a comment - We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch. <code> public NamedList(List nameValuePairs) { nvPairs=nameValuePairs; + protected NamedList(List<Map.Entry<String, T>> nameValuePairs) { + nvPairs = nameValuePairs; - } </code> As part of ensuring type-safety , the previous code had a heterogenous List ctor. as before. I changed the access level and added another public ctor. ( Object [] ) with deprecated tag to it so that people are still able to use the functionality. Otherwise - retaining the same signature after type safety would imply - people passing in a List of String-s and T-s , when the List expects Map.Entry<String , T > and would cause more confusion. Thanks to the erasure of generics , List and List<Map.Entry<String, T>> are all equal , not helping here. If backward compatibility is the key here- I can revisit the patch again ensuring the same. If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety? When I logged the issue - type-safety was the major reason behind the same. When I submitted by first patch and did the benchmarking - performance was also found to be a major constraint , (with incremental addition and creation of iterator objects). NamedList seemed to be used all over the place. As long as we preserve the contract of the methods - this should definitely give an additional boost - since I discovered as part of profiling of the launch of SolrCore ( CoreContainer.Initializer.initalize() .. ) .
          Hide
          Hoss Man added a comment -

          Ok, Now I think I understand your type safety goal – the existing implementation is type safe if-and-only-if the precondition of the "List" constructor is met (that it contains pairwise names/values) ... your goal is to make NamedList garuntee type correctness. (correct?)

          If backward compatibility is the key here- I can revisit the patch again ensuring the same.

          There are a lot of internal APIs where I wouldn't be opposed to fudging backwards compatibility in the interests of better code, but NamedList isn't one of them. it's the datastructure that gets used by almost any type of plugin people may write – any request handler or search component that wants to add data to the response is going to be constructing a new NamedList, so I'm definitely not on board breaking things for all of those people.

          unfortunately, i think your goal is in direct and inherent opposition to backwards compatiblity.

          As you mentioned, type erasure prevents us from adding a new "List<Entry<String, T>>" constructor while keeping the existing "List" constructor – but that's not the biggest problem. We could always use tricks (like adding an extra ignored arg to the new constructor, or making the new constructor take in an "Entry<String, T>[]" instead of a List) to maintain binary API compatibility, and then have the legacy constructor cast the List elements as needed to delegate to the new constructor .... EXCEPT .... binary API compatibilty is only part of backwards compatibility. The bigger problem is this sentence in the javadocs...

             * @param nameValuePairs underlying List which should be used to implement a NamedList; modifying this List will affect the NamedList.
             */
            public NamedList(List nameValuePairs) {
          

          ...it would be nice if that was just an implementation detail, and the javadocs said "... may affect the NamedList", but it says "... will affect the NamedList" Changing the internals (and that constructor) would change the behavior our from under people who have an existing expectation that they can maintain a refrence to the List and modify it to affect the NamedList. Unfortunately this isn't an academic point: I've actually seen this utilized in plugin code. People build up datastructures containing NamedLists, and then data is added to the underlying Lists backing those NamedLists after the fact (but before the NamedList is iterated by a response writer)

          I just don't see any way to feasibly achieve your type-safe constructor goal while maintaining back-compat.

          Show
          Hoss Man added a comment - Ok, Now I think I understand your type safety goal – the existing implementation is type safe if-and-only-if the precondition of the "List" constructor is met (that it contains pairwise names/values) ... your goal is to make NamedList garuntee type correctness. (correct?) If backward compatibility is the key here- I can revisit the patch again ensuring the same. There are a lot of internal APIs where I wouldn't be opposed to fudging backwards compatibility in the interests of better code, but NamedList isn't one of them. it's the datastructure that gets used by almost any type of plugin people may write – any request handler or search component that wants to add data to the response is going to be constructing a new NamedList, so I'm definitely not on board breaking things for all of those people. unfortunately, i think your goal is in direct and inherent opposition to backwards compatiblity. As you mentioned, type erasure prevents us from adding a new "List<Entry<String, T>>" constructor while keeping the existing "List" constructor – but that's not the biggest problem. We could always use tricks (like adding an extra ignored arg to the new constructor, or making the new constructor take in an "Entry<String, T>[]" instead of a List) to maintain binary API compatibility, and then have the legacy constructor cast the List elements as needed to delegate to the new constructor .... EXCEPT .... binary API compatibilty is only part of backwards compatibility. The bigger problem is this sentence in the javadocs... * @param nameValuePairs underlying List which should be used to implement a NamedList; modifying this List will affect the NamedList. */ public NamedList(List nameValuePairs) { ...it would be nice if that was just an implementation detail, and the javadocs said "... may affect the NamedList", but it says "... will affect the NamedList" Changing the internals (and that constructor) would change the behavior our from under people who have an existing expectation that they can maintain a refrence to the List and modify it to affect the NamedList. Unfortunately this isn't an academic point: I've actually seen this utilized in plugin code. People build up datastructures containing NamedLists, and then data is added to the underlying Lists backing those NamedLists after the fact (but before the NamedList is iterated by a response writer) I just don't see any way to feasibly achieve your type-safe constructor goal while maintaining back-compat.
          Hide
          Karthik K added a comment -

          Agreed. This patch is too late in the game w.r.t type safety to change the underlying behavior at this point.

          Show
          Karthik K added a comment - Agreed. This patch is too late in the game w.r.t type safety to change the underlying behavior at this point.
          Karthik K made changes -
          Resolution Won't Fix [ 2 ]
          Status Open [ 1 ] Resolved [ 5 ]
          Karthik K made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Hoss Man added a comment -

          Kay: It occurs to me that one thing we could do is invert the nature of your patch:

          1. add an "Entry<String, T>[]" constructor (with a warning, but no commitment, that modifying the array conents may affect the NamedList) which builds up a pairwise List<Object> and delegates to the existing "List" constructor
          2. deprecate the existing List constructor.

          Anyone using the new constructor will get the type safety benefits (not entirely enforced by the compiler, but enforced by the contract) and at some later date (Solr 2?) we can remove the "List" constructor and replace the guts of the class with your approach (to get the perf improvements)

          thoughts?

          Show
          Hoss Man added a comment - Kay: It occurs to me that one thing we could do is invert the nature of your patch: add an "Entry<String, T>[]" constructor (with a warning, but no commitment, that modifying the array conents may affect the NamedList) which builds up a pairwise List<Object> and delegates to the existing "List" constructor deprecate the existing List constructor. Anyone using the new constructor will get the type safety benefits (not entirely enforced by the compiler, but enforced by the contract) and at some later date (Solr 2?) we can remove the "List" constructor and replace the guts of the class with your approach (to get the perf improvements) thoughts?
          Karthik K made changes -
          Link This issue relates to SOLR-967 [ SOLR-967 ]
          Hide
          Karthik K added a comment -

          Logged a separate issue - SOLR-967 for the temporary fix and tracking progress on the same. This issue changes the internals of the class that would be revisited after adopting a proper migration path , per SOLR-967 .

          Show
          Karthik K added a comment - Logged a separate issue - SOLR-967 for the temporary fix and tracking progress on the same. This issue changes the internals of the class that would be revisited after adopting a proper migration path , per SOLR-967 .
          Hide
          Hoss Man added a comment -

          reopening for the future

          Show
          Hoss Man added a comment - reopening for the future
          Hoss Man made changes -
          Status Closed [ 6 ] Reopened [ 4 ]
          Resolution Won't Fix [ 2 ]
          Hoss Man made changes -
          Affects Version/s 1.4 [ 12313351 ]
          Fix Version/s 1.4 [ 12313351 ]
          Hide
          Karthik K added a comment -

          So - what is the current status of this.

          At what time are we planning to deprecate the old implementation and incorporate this new d.s ( changing the guts of the internals ).

          Show
          Karthik K added a comment - So - what is the current status of this. At what time are we planning to deprecate the old implementation and incorporate this new d.s ( changing the guts of the internals ).
          Hide
          Noble Paul added a comment -

          Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.

          Creating an Entry Object per entry is memory inefficient compared to the existing one.

          type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel

          Show
          Noble Paul added a comment - Type safety is generally not required in NamedList. very often we use heterogeneous NamedList. Creating an Entry Object per entry is memory inefficient compared to the existing one. type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel
          Hide
          Karthik K added a comment -

          Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.

          Creating an Entry Object per entry is memory inefficient compared to the existing one.

          type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel

          The performance numbers in here say a different story. The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. As regarding Entry Objects being a memory hog - do we have some stats to back it up. Otherwise it is premature to call that a memory optimization.

          Show
          Karthik K added a comment - Type safety is generally not required in NamedList. very often we use heterogeneous NamedList. Creating an Entry Object per entry is memory inefficient compared to the existing one. type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel The performance numbers in here say a different story. The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. As regarding Entry Objects being a memory hog - do we have some stats to back it up. Otherwise it is premature to call that a memory optimization.
          Hide
          Chris A. Mattmann added a comment -

          The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here.

          +1 to this. NamedList is not a very intuitive structure at all, which I remarked on SOLR-1516.

          Cheers,
          Chris

          Show
          Chris A. Mattmann added a comment - The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. +1 to this. NamedList is not a very intuitive structure at all, which I remarked on SOLR-1516 . Cheers, Chris
          Hide
          Noble Paul added a comment - - edited

          The performance numbers in here say a different story

          I'm not referring to perf numbers here . It is memory efficiency.

          As regarding Entry Objects being a memory hog - do we have some stats to back it up.

          We don't need stats for everything. we should know about how VM holds objects .

          Let me illustrate with a case of consider 5 key->values on a 32 bit m/c

          NamedList(Backed by arraylist)
          one Object [] + array size= 4 + 5 * 2*4 (bytes) = 44 bytes + the overhead of ArrayList

          ModernNamedList

          one Object[] + 5 entry objects (16 bytes object overhead + each has 2 references of 4+4 bytes)+ array size () = 4 + 16*5+ 5*2*4 + 5*4 = 144 bytes+ the overhead of ArrayList

          Add to this the overhead of GC'ing 5 entry objects

          reference : http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

          Show
          Noble Paul added a comment - - edited The performance numbers in here say a different story I'm not referring to perf numbers here . It is memory efficiency. As regarding Entry Objects being a memory hog - do we have some stats to back it up. We don't need stats for everything. we should know about how VM holds objects . Let me illustrate with a case of consider 5 key->values on a 32 bit m/c NamedList(Backed by arraylist) one Object [] + array size= 4 + 5 * 2*4 (bytes) = 44 bytes + the overhead of ArrayList ModernNamedList one Object[] + 5 entry objects (16 bytes object overhead + each has 2 references of 4+4 bytes)+ array size () = 4 + 16*5+ 5*2*4 + 5*4 = 144 bytes+ the overhead of ArrayList Add to this the overhead of GC'ing 5 entry objects reference : http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
          Hoss Man made changes -
          Link This issue is related to SOLR-2292 [ SOLR-2292 ]
          Hide
          Shai Erera added a comment - - edited

          Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:

          • Its <T> generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of <name,value> pairs, where name is always String, and the type of value may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like getInt(), getString().... I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on <T> at all. So I'd rather we remove that generic from the API signature.
          • There is what seems to be a totally redundant SimpleOrderedMap class, which has contradicting documentation in its class-jdocs:
            • a NamedList where access by key is more important than maintaining order
            • This class does not provide efficient lookup by key

          But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.

          Leaving performance aside for a second, NamedList could simply hold an internal Map<String,List<Object>> to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the <name,value> pairs in any order though, i.e. getVal(i). I don't know how important is this functionality though .. i.e. if we replaced it with a namesIterator(), would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me – how important is getName/Val(i)?

          Now back to performance, for a sec, in order to not always allocate a List<Object> when NamedList is used to hold only one value per parameter, we can either:

          • Use Collections.singletonList() on first add, and change to a concrete List on the second add only.
          • Use an Object[], it's less expensive than a List object.
          • Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.

          BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway.

          Maybe we should break this down into 3 issues:

          • Get rid of SimpleOrderedMap – if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.
          • Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.
          • Simplify NamedList internal implementation. On the performance side – how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.
          Show
          Shai Erera added a comment - - edited Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated: Its <T> generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of <name,value> pairs, where name is always String, and the type of value may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like getInt(), getString()... . I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on <T> at all. So I'd rather we remove that generic from the API signature. There is what seems to be a totally redundant SimpleOrderedMap class, which has contradicting documentation in its class-jdocs: a NamedList where access by key is more important than maintaining order This class does not provide efficient lookup by key But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away. Leaving performance aside for a second, NamedList could simply hold an internal Map<String,List<Object>> to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the <name,value> pairs in any order though, i.e. getVal(i) . I don't know how important is this functionality though .. i.e. if we replaced it with a namesIterator() , would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me – how important is getName/Val(i) ? Now back to performance, for a sec, in order to not always allocate a List<Object> when NamedList is used to hold only one value per parameter, we can either: Use Collections.singletonList() on first add , and change to a concrete List on the second add only. Use an Object[] , it's less expensive than a List object. Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate. BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway. Maybe we should break this down into 3 issues: Get rid of SimpleOrderedMap – if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly. Remove the generics from NamedList's API. We can add sugar getters for specific types if we want. Simplify NamedList internal implementation. On the performance side – how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.
          Hide
          Shai Erera added a comment -

          One other thing, how important is it to be able to store null names? I haven't dug deep through the code – do we actually use it? This doesn't prevent us from using a Map internally, as we can use our own key, something like "$$NULL_STRING!!" (or pick some other constant) and map the null-name requests to this key.

          Show
          Shai Erera added a comment - One other thing, how important is it to be able to store null names? I haven't dug deep through the code – do we actually use it? This doesn't prevent us from using a Map internally, as we can use our own key, something like "$$NULL_STRING!!" (or pick some other constant) and map the null-name requests to this key.
          Hide
          Shawn Heisey added a comment -

          I built a patch that eliminated SimpleOrderedMap, but it's huge and intrusive. Based on how often SimpleOrderedMap is used and the number of times that a SimpleOrderedMap object was actually named "nl" ... I am guessing that people assumed they could rely on the note in the NamedList javadoc about SimpleOrderedMap, or perhaps the notes in the javadoc for SimpleOrderedMap itself (which contradicts what it says in NamedList).

          If somebody knows how to actually implement what SimpleOrderedMap promises, we might see a slight increase in Solr's performance, and the patch will be far less intrusive.

          Show
          Shawn Heisey added a comment - I built a patch that eliminated SimpleOrderedMap, but it's huge and intrusive. Based on how often SimpleOrderedMap is used and the number of times that a SimpleOrderedMap object was actually named "nl" ... I am guessing that people assumed they could rely on the note in the NamedList javadoc about SimpleOrderedMap, or perhaps the notes in the javadoc for SimpleOrderedMap itself (which contradicts what it says in NamedList). If somebody knows how to actually implement what SimpleOrderedMap promises, we might see a slight increase in Solr's performance, and the patch will be far less intrusive.
          Hide
          Shai Erera added a comment -

          Hi Shawn. See SOLR-6315 which I opened to do exactly that and ran into issues with removing SimpleOrderedMap. I've attached a patch there with my progress thus far. I think we should collaborate there - it may not be possible to remove the class entirely, unless we add something to NamedList (e.g. preferMapOutput or something), but I'm sure we can improve the current situation (even if only javadocs).

          Show
          Shai Erera added a comment - Hi Shawn. See SOLR-6315 which I opened to do exactly that and ran into issues with removing SimpleOrderedMap. I've attached a patch there with my progress thus far. I think we should collaborate there - it may not be possible to remove the class entirely, unless we add something to NamedList (e.g. preferMapOutput or something), but I'm sure we can improve the current situation (even if only javadocs).
          Hide
          Shalin Shekhar Mangar added a comment -

          Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.

          +1

          Show
          Shalin Shekhar Mangar added a comment - Remove the generics from NamedList's API. We can add sugar getters for specific types if we want. +1
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          32d 10h 49m 1 Karthik K 15/Jan/09 15:38
          Resolved Resolved Closed Closed
          13s 1 Karthik K 15/Jan/09 15:38
          Closed Closed Reopened Reopened
          35d 4h 1 Hoss Man 19/Feb/09 19:38

            People

            • Assignee:
              Unassigned
              Reporter:
              Karthik K
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - 72h
                72h
                Remaining:
                Remaining Estimate - 72h
                72h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development