Solr
  1. Solr
  2. SOLR-912

org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList

    Details

    • Type: Improvement Improvement
    • Status: Reopened
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: search
    • Labels:
      None
    • Environment:

      Tomcat 6, JRE 6, Solr 1.3+ nightlies

      Description

      The implementation of NamedList - while being fast - is not necessarily type-safe. I have implemented an additional implementation of the same - ModernNamedList (a type-safe variation providing the same interface as NamedList) - while preserving the semantics in terms of ordering of elements and allowing null elements for key and values (keys are always Strings , while values correspond to generics ).

      1. SOLR-912.patch
        18 kB
        Karthik K
      2. NLProfile.java
        7 kB
        Karthik K
      3. SOLR-912.patch
        14 kB
        Karthik K

        Issue Links

          Activity

          Hide
          Karthik K added a comment -

          1) New Type-safe implementation of NamedList , ModernNamedList .

          2) New interface INamedList<T> created

          3) New Test case - ModernNamedList added.

          4) Added more test cases to NamedListTest .

          Once the patch is approved - NamedList would be deprecated and the existing codebase in Solr would be replaced to use ModernNamedList<T> to be more type-safe.

          Show
          Karthik K added a comment - 1) New Type-safe implementation of NamedList , ModernNamedList . 2) New interface INamedList<T> created 3) New Test case - ModernNamedList added. 4) Added more test cases to NamedListTest . Once the patch is approved - NamedList would be deprecated and the existing codebase in Solr would be replaced to use ModernNamedList<T> to be more type-safe.
          Hide
          Noble Paul added a comment -

          Type safety is not an overriding concern when a NamedList is used (that is the beauty of it). It does not help in any way. Most of the usages of NamedList involves heterogeneous values .

          your implementation is not as efficient (memory usage) as the original one

          The idea of having an interface is an overkill

          -1

          Show
          Noble Paul added a comment - Type safety is not an overriding concern when a NamedList is used (that is the beauty of it). It does not help in any way. Most of the usages of NamedList involves heterogeneous values . your implementation is not as efficient (memory usage) as the original one The idea of having an interface is an overkill -1
          Hide
          Karthik K added a comment -

          The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).

          If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type. We could probably define it as NamedList , with T replaced to be an object internally. not making it a generic type.

          There seems to be no memory leaks as far as the container is concerned. Creating an iterator object for every call to an iterator seems to be quite a bit of data redundancy issues when ideally we can use the iterator of one of the underlying objects as well.

          Show
          Karthik K added a comment - The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ). If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type. We could probably define it as NamedList , with T replaced to be an object internally. not making it a generic type. There seems to be no memory leaks as far as the container is concerned. Creating an iterator object for every call to an iterator seems to be quite a bit of data redundancy issues when ideally we can use the iterator of one of the underlying objects as well.
          Hide
          Noble Paul added a comment -

          The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ).

          This means we will need to use this interface wherever we use NamedList which is not desirable

          NamedList is designed to achieve a specific purpose

          Solr is not meant for java users only. It is also meant to be consumed over xml/json etc .There is no type safety in these. NamedList helps us to have a datastructure which can easily be converted back and forth from these.

          If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type

          It helps where it makes sense . But where it is not necessary I can totally omit that and javac does not complain. So , it is better to keep it generic than not.

          There seems to be no memory leaks as far as the container is concerned

          There are no leaks. I was referring to the internal implementation . instead of one big array you keep a list of objects (means more objects, one per entry) .

          Creating an iterator object for every call to an iterator

          iterating over NamedList does not need to use an iterator. That is a choice left to the consumer of the API

          Show
          Noble Paul added a comment - The interface is present only to enable the migration from NamedList (legacy) to the new one. (with similar properties of Cloneable, Serializable etc. ). This means we will need to use this interface wherever we use NamedList which is not desirable NamedList is designed to achieve a specific purpose Solr is not meant for java users only. It is also meant to be consumed over xml/json etc .There is no type safety in these. NamedList helps us to have a datastructure which can easily be converted back and forth from these. If type-safety is not a concern - is there a reason why NamedList<T> is defined as a generic type It helps where it makes sense . But where it is not necessary I can totally omit that and javac does not complain. So , it is better to keep it generic than not. There seems to be no memory leaks as far as the container is concerned There are no leaks. I was referring to the internal implementation . instead of one big array you keep a list of objects (means more objects, one per entry) . Creating an iterator object for every call to an iterator iterating over NamedList does not need to use an iterator. That is a choice left to the consumer of the API
          Hide
          Karthik K added a comment -

          The interface is mostly used as a proof of concept that the new class - ModernNamedList implements the same methods as that of NamedList. Eventually - the interface would be gotten rid of , once we are happy with the same.

          As far as NamedList , I believe if we want to have the flexibility of allowing any type in it - we might as well define it to an Object. If we do qualify it as a specific type - then we might as well implement type-safety in the class. javac does not complain today because the compiler switch to indicate type-safety errors has been turned off.

          Previous implementation used to be having a pre-jdk5 List , with members being String and a type depending on the index. The revised implementation has Map.Entry<String, T> interface - which is directly intuitive to what is required ( a Map with order being preserved , allowing duplicates and nulls ). I did profile with 2 different implementations , involving Map.Entry<?> and a heterogenous list with String and a type (with insertion / deletion of 100,000 records). The current implementation in fact , failed in the performance comparison in both insertion / deletion in the middle of the List ( remove () ) , since we have to add/remove elements twice from the List (as in the current impl) , as compared to 1 insertion/deletion in the Map.Entry<> implementation. ) Given that addition/deletion in the List is worst-case linear - I believe the perceived performance degradation due to additional object , turns out to be not so bad when compared to 2-path insertion / deletion as we have today.

          NamedList does seem to implement the interface Iterable<T> . I am not sure how the consumer of the API can have independent iterators (since only NamedList is supposed to be aware of the internal data structures and not the consumer). So I believe it would be upto NamedList<T> to provide an iterator to the user of the API.

          Show
          Karthik K added a comment - The interface is mostly used as a proof of concept that the new class - ModernNamedList implements the same methods as that of NamedList. Eventually - the interface would be gotten rid of , once we are happy with the same. As far as NamedList , I believe if we want to have the flexibility of allowing any type in it - we might as well define it to an Object. If we do qualify it as a specific type - then we might as well implement type-safety in the class. javac does not complain today because the compiler switch to indicate type-safety errors has been turned off. Previous implementation used to be having a pre-jdk5 List , with members being String and a type depending on the index. The revised implementation has Map.Entry<String, T> interface - which is directly intuitive to what is required ( a Map with order being preserved , allowing duplicates and nulls ). I did profile with 2 different implementations , involving Map.Entry<?> and a heterogenous list with String and a type (with insertion / deletion of 100,000 records). The current implementation in fact , failed in the performance comparison in both insertion / deletion in the middle of the List ( remove () ) , since we have to add/remove elements twice from the List (as in the current impl) , as compared to 1 insertion/deletion in the Map.Entry<> implementation. ) Given that addition/deletion in the List is worst-case linear - I believe the perceived performance degradation due to additional object , turns out to be not so bad when compared to 2-path insertion / deletion as we have today. NamedList does seem to implement the interface Iterable<T> . I am not sure how the consumer of the API can have independent iterators (since only NamedList is supposed to be aware of the internal data structures and not the consumer). So I believe it would be upto NamedList<T> to provide an iterator to the user of the API.
          Hide
          Hoss Man added a comment -

          If i'm understanding the discussion so far...

          • ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.
          • INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)

          do i have those aspects correct?

          with that in mind: i'm not sure i understand what "itch" changing the implementation "scratches" ... the initial issue description says it's because NamedList " is not necessarily type-safe" but it's not clear what that statement is referring to ... later comments suggest that the motivation is to improve the performance of "remove" ... which hardly seems like something worth optimizing for.

          I agree that having the internals based on a "list of pairs" certainly seems like it might be more intuitive to developers looking at the internals (then the current approach is), but how is the current approach less type safe for consumers using just the NamedList API?

          If the "modern" approach is more performant then the existing impl and passes all of the tests then i suppose it would make sense to switch – but i'm far more interested in how the performance compares for common cases (add/get/iterate) then for cases that hardly ever come up (remove).

          My suggestion: provide two independent attachments. One patch that just replaces the internals of NamedList with the approach you suggest so people can apply the patch, test it out, and verify the API/behavior; A second attachment that provides some benchmarks against the NmaedList class – so people can read/run your benchmark with and with out the patch to see how the performance changes.

          Show
          Hoss Man added a comment - If i'm understanding the discussion so far... ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared. INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc) do i have those aspects correct? with that in mind: i'm not sure i understand what "itch" changing the implementation "scratches" ... the initial issue description says it's because NamedList " is not necessarily type-safe" but it's not clear what that statement is referring to ... later comments suggest that the motivation is to improve the performance of "remove" ... which hardly seems like something worth optimizing for. I agree that having the internals based on a "list of pairs" certainly seems like it might be more intuitive to developers looking at the internals (then the current approach is), but how is the current approach less type safe for consumers using just the NamedList API? If the "modern" approach is more performant then the existing impl and passes all of the tests then i suppose it would make sense to switch – but i'm far more interested in how the performance compares for common cases (add/get/iterate) then for cases that hardly ever come up (remove). My suggestion: provide two independent attachments. One patch that just replaces the internals of NamedList with the approach you suggest so people can apply the patch, test it out, and verify the API/behavior; A second attachment that provides some benchmarks against the NmaedList class – so people can read/run your benchmark with and with out the patch to see how the performance changes.
          Hide
          Karthik K added a comment -
          ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared.
          INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc)

          True.

          Attached herewith is: NLProfile.java - that contains sample benchmarking against the 2 implementations (will work with the previous page on the page).

          Some results:

          addAll / getAll(): increase in performance is almost [1-10]% range.

          add: increase in performance by around 30% , probably because of the additional growth in the List implementation when size approaches capacity. And since, in NamedList - we insert 2 elements as opposed to one, ( as done in ModernNamedList) - it might be more pronounced.

          iterator: ~70% increase in performance in favor of the new implementation since it just reuses the iterator for the internal data structure.

          The numbers should be present as comments in the corresponding methods - testAdd() , testAddAll(), testGetAll(), testIterator() .

          I will attach the final patch once we are convinced with the benchmark methodology and the numbers.

          Show
          Karthik K added a comment - ModernNamedList is being suggested as an alternate implementation of NamedList ... ideally the internals of NamedLIst would be replaced with the internals of ModernNamedList, but in this patch they are seperate classes so they can be compared. INamedList is included in the patch as a way to demonstrate that ModernNamedList fulfills the same contract as NamedList (for the purposes of testing etc) True. Attached herewith is: NLProfile.java - that contains sample benchmarking against the 2 implementations (will work with the previous page on the page). Some results: addAll / getAll(): increase in performance is almost [1-10] % range. add: increase in performance by around 30% , probably because of the additional growth in the List implementation when size approaches capacity. And since, in NamedList - we insert 2 elements as opposed to one, ( as done in ModernNamedList) - it might be more pronounced. iterator: ~70% increase in performance in favor of the new implementation since it just reuses the iterator for the internal data structure. The numbers should be present as comments in the corresponding methods - testAdd() , testAddAll(), testGetAll(), testIterator() . I will attach the final patch once we are convinced with the benchmark methodology and the numbers.
          Hide
          Karthik K added a comment -

          a sample benchmarking program that works with the previous patch submitted.

          Show
          Karthik K added a comment - a sample benchmarking program that works with the previous patch submitted.
          Hide
          Karthik K added a comment -

          Additional Info: JRE 6, Linux 2.6.27-9 , 3.2GB Memory, Dual-core Intel @ 2.53 Ghz.

          Show
          Karthik K added a comment - Additional Info: JRE 6, Linux 2.6.27-9 , 3.2GB Memory, Dual-core Intel @ 2.53 Ghz.
          Hide
          Yonik Seeley added a comment -

          While we're going down the micro-benchmarking path, I tried eliminating ArrayList and got an additional 15-25% gain on common operations (create new, add between 5 and 15 elements, and then iterate over those elements later). This was with Java 1.6. -Xbatch improved the results even more... ~40% - but this is just a micro-benchmark.

          class NamedList2<T> implements INamedList<T> {
          
            protected NamedListEntry<T>[] nvPairs;
            protected int size;
          
            public NamedList2() {
              nvPairs = new NamedListEntry[10];
              size = 0;
            }
          
            @Override
            public int size() {
              return size;
            }
          
            @Override
            public String getName(int idx) {
              if (idx >= size) throw new ArrayIndexOutOfBoundsException();
              return nvPairs[idx].key;
            }
          
            @Override
            public T getVal(int idx) {
              if (idx >= size) throw new ArrayIndexOutOfBoundsException();    
              return nvPairs[idx].value;
            }
          
            private void resize() {
              NamedListEntry<T>[] arr = new NamedListEntry[nvPairs.length << 1];
              System.arraycopy(nvPairs, 0, arr, 0, size);
              nvPairs = arr;
            }
          
            @Override
            public void add(String name, T val) {
              if (size >= nvPairs.length) {
                resize();
              }
              nvPairs[size++] = new NamedListEntry<T>(name, val);
            }
          
          [...]
          
          Show
          Yonik Seeley added a comment - While we're going down the micro-benchmarking path, I tried eliminating ArrayList and got an additional 15-25% gain on common operations (create new, add between 5 and 15 elements, and then iterate over those elements later). This was with Java 1.6. -Xbatch improved the results even more... ~40% - but this is just a micro-benchmark. class NamedList2<T> implements INamedList<T> { protected NamedListEntry<T>[] nvPairs; protected int size; public NamedList2() { nvPairs = new NamedListEntry[10]; size = 0; } @Override public int size() { return size; } @Override public String getName( int idx) { if (idx >= size) throw new ArrayIndexOutOfBoundsException(); return nvPairs[idx].key; } @Override public T getVal( int idx) { if (idx >= size) throw new ArrayIndexOutOfBoundsException(); return nvPairs[idx].value; } private void resize() { NamedListEntry<T>[] arr = new NamedListEntry[nvPairs.length << 1]; System .arraycopy(nvPairs, 0, arr, 0, size); nvPairs = arr; } @Override public void add( String name, T val) { if (size >= nvPairs.length) { resize(); } nvPairs[size++] = new NamedListEntry<T>(name, val); } [...]
          Hide
          Karthik K added a comment -

          System.arrayCopy is great. It is bound to perform much better because of the native code for the same.

          Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that.

          The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5)

          int newCapacity = (oldCapacity * 3)/2 + 1;

          +1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!).

          Show
          Karthik K added a comment - System.arrayCopy is great. It is bound to perform much better because of the native code for the same. Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that. The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5) int newCapacity = (oldCapacity * 3)/2 + 1; +1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!).
          Hide
          Karthik K added a comment -

          Introduce another ctor. called Type(Object [] ) to distinguish them from List<Map.Entry<String, T > > and List of objects.

          Change the invocation in DebugComponent . Highlight Component etc.

          Show
          Karthik K added a comment - Introduce another ctor. called Type(Object [] ) to distinguish them from List<Map.Entry<String, T > > and List of objects. Change the invocation in DebugComponent . Highlight Component etc.
          Hide
          Hoss Man added a comment -

          If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?

          We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.

          Show
          Hoss Man added a comment - If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety? We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.
          Hide
          Karthik K added a comment -
          We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch.

          <code>

          • public NamedList(List nameValuePairs) {
          • nvPairs=nameValuePairs;
            + protected NamedList(List<Map.Entry<String, T>> nameValuePairs) { + nvPairs = nameValuePairs; - }

            </code>

          As part of ensuring type-safety , the previous code had a heterogenous List ctor. as before. I changed the access level and added another public ctor. ( Object [] ) with deprecated tag to it so that people are still able to use the functionality.

          Otherwise - retaining the same signature after type safety would imply - people passing in a List of String-s and T-s , when the List expects Map.Entry<String , T > and would cause more confusion.

          Thanks to the erasure of generics , List and List<Map.Entry<String, T>> are all equal , not helping here.
          If backward compatibility is the key here- I can revisit the patch again ensuring the same.

          If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety?

          When I logged the issue - type-safety was the major reason behind the same. When I submitted by first patch and did the benchmarking - performance was also found to be a major constraint , (with incremental addition and creation of iterator objects). NamedList seemed to be used all over the place. As long as we preserve the contract of the methods - this should definitely give an additional boost - since I discovered as part of profiling of the launch of SolrCore ( CoreContainer.Initializer.initalize() .. ) .

          Show
          Karthik K added a comment - We also need to make sure we don't eliminate any public constructors, which seems to be the case based on my quick glance at the latest patch. <code> public NamedList(List nameValuePairs) { nvPairs=nameValuePairs; + protected NamedList(List<Map.Entry<String, T>> nameValuePairs) { + nvPairs = nameValuePairs; - } </code> As part of ensuring type-safety , the previous code had a heterogenous List ctor. as before. I changed the access level and added another public ctor. ( Object [] ) with deprecated tag to it so that people are still able to use the functionality. Otherwise - retaining the same signature after type safety would imply - people passing in a List of String-s and T-s , when the List expects Map.Entry<String , T > and would cause more confusion. Thanks to the erasure of generics , List and List<Map.Entry<String, T>> are all equal , not helping here. If backward compatibility is the key here- I can revisit the patch again ensuring the same. If there are performance gains to be had in the common case i'm all for it ... but i still feel like i'm not understanding the original goal: how does this approach give us more type safety? When I logged the issue - type-safety was the major reason behind the same. When I submitted by first patch and did the benchmarking - performance was also found to be a major constraint , (with incremental addition and creation of iterator objects). NamedList seemed to be used all over the place. As long as we preserve the contract of the methods - this should definitely give an additional boost - since I discovered as part of profiling of the launch of SolrCore ( CoreContainer.Initializer.initalize() .. ) .
          Hide
          Hoss Man added a comment -

          Ok, Now I think I understand your type safety goal – the existing implementation is type safe if-and-only-if the precondition of the "List" constructor is met (that it contains pairwise names/values) ... your goal is to make NamedList garuntee type correctness. (correct?)

          If backward compatibility is the key here- I can revisit the patch again ensuring the same.

          There are a lot of internal APIs where I wouldn't be opposed to fudging backwards compatibility in the interests of better code, but NamedList isn't one of them. it's the datastructure that gets used by almost any type of plugin people may write – any request handler or search component that wants to add data to the response is going to be constructing a new NamedList, so I'm definitely not on board breaking things for all of those people.

          unfortunately, i think your goal is in direct and inherent opposition to backwards compatiblity.

          As you mentioned, type erasure prevents us from adding a new "List<Entry<String, T>>" constructor while keeping the existing "List" constructor – but that's not the biggest problem. We could always use tricks (like adding an extra ignored arg to the new constructor, or making the new constructor take in an "Entry<String, T>[]" instead of a List) to maintain binary API compatibility, and then have the legacy constructor cast the List elements as needed to delegate to the new constructor .... EXCEPT .... binary API compatibilty is only part of backwards compatibility. The bigger problem is this sentence in the javadocs...

             * @param nameValuePairs underlying List which should be used to implement a NamedList; modifying this List will affect the NamedList.
             */
            public NamedList(List nameValuePairs) {
          

          ...it would be nice if that was just an implementation detail, and the javadocs said "... may affect the NamedList", but it says "... will affect the NamedList" Changing the internals (and that constructor) would change the behavior our from under people who have an existing expectation that they can maintain a refrence to the List and modify it to affect the NamedList. Unfortunately this isn't an academic point: I've actually seen this utilized in plugin code. People build up datastructures containing NamedLists, and then data is added to the underlying Lists backing those NamedLists after the fact (but before the NamedList is iterated by a response writer)

          I just don't see any way to feasibly achieve your type-safe constructor goal while maintaining back-compat.

          Show
          Hoss Man added a comment - Ok, Now I think I understand your type safety goal – the existing implementation is type safe if-and-only-if the precondition of the "List" constructor is met (that it contains pairwise names/values) ... your goal is to make NamedList garuntee type correctness. (correct?) If backward compatibility is the key here- I can revisit the patch again ensuring the same. There are a lot of internal APIs where I wouldn't be opposed to fudging backwards compatibility in the interests of better code, but NamedList isn't one of them. it's the datastructure that gets used by almost any type of plugin people may write – any request handler or search component that wants to add data to the response is going to be constructing a new NamedList, so I'm definitely not on board breaking things for all of those people. unfortunately, i think your goal is in direct and inherent opposition to backwards compatiblity. As you mentioned, type erasure prevents us from adding a new "List<Entry<String, T>>" constructor while keeping the existing "List" constructor – but that's not the biggest problem. We could always use tricks (like adding an extra ignored arg to the new constructor, or making the new constructor take in an "Entry<String, T>[]" instead of a List) to maintain binary API compatibility, and then have the legacy constructor cast the List elements as needed to delegate to the new constructor .... EXCEPT .... binary API compatibilty is only part of backwards compatibility. The bigger problem is this sentence in the javadocs... * @param nameValuePairs underlying List which should be used to implement a NamedList; modifying this List will affect the NamedList. */ public NamedList(List nameValuePairs) { ...it would be nice if that was just an implementation detail, and the javadocs said "... may affect the NamedList", but it says "... will affect the NamedList" Changing the internals (and that constructor) would change the behavior our from under people who have an existing expectation that they can maintain a refrence to the List and modify it to affect the NamedList. Unfortunately this isn't an academic point: I've actually seen this utilized in plugin code. People build up datastructures containing NamedLists, and then data is added to the underlying Lists backing those NamedLists after the fact (but before the NamedList is iterated by a response writer) I just don't see any way to feasibly achieve your type-safe constructor goal while maintaining back-compat.
          Hide
          Karthik K added a comment -

          Agreed. This patch is too late in the game w.r.t type safety to change the underlying behavior at this point.

          Show
          Karthik K added a comment - Agreed. This patch is too late in the game w.r.t type safety to change the underlying behavior at this point.
          Hide
          Hoss Man added a comment -

          Kay: It occurs to me that one thing we could do is invert the nature of your patch:

          1. add an "Entry<String, T>[]" constructor (with a warning, but no commitment, that modifying the array conents may affect the NamedList) which builds up a pairwise List<Object> and delegates to the existing "List" constructor
          2. deprecate the existing List constructor.

          Anyone using the new constructor will get the type safety benefits (not entirely enforced by the compiler, but enforced by the contract) and at some later date (Solr 2?) we can remove the "List" constructor and replace the guts of the class with your approach (to get the perf improvements)

          thoughts?

          Show
          Hoss Man added a comment - Kay: It occurs to me that one thing we could do is invert the nature of your patch: add an "Entry<String, T>[]" constructor (with a warning, but no commitment, that modifying the array conents may affect the NamedList) which builds up a pairwise List<Object> and delegates to the existing "List" constructor deprecate the existing List constructor. Anyone using the new constructor will get the type safety benefits (not entirely enforced by the compiler, but enforced by the contract) and at some later date (Solr 2?) we can remove the "List" constructor and replace the guts of the class with your approach (to get the perf improvements) thoughts?
          Hide
          Karthik K added a comment -

          Logged a separate issue - SOLR-967 for the temporary fix and tracking progress on the same. This issue changes the internals of the class that would be revisited after adopting a proper migration path , per SOLR-967 .

          Show
          Karthik K added a comment - Logged a separate issue - SOLR-967 for the temporary fix and tracking progress on the same. This issue changes the internals of the class that would be revisited after adopting a proper migration path , per SOLR-967 .
          Hide
          Hoss Man added a comment -

          reopening for the future

          Show
          Hoss Man added a comment - reopening for the future
          Hide
          Karthik K added a comment -

          So - what is the current status of this.

          At what time are we planning to deprecate the old implementation and incorporate this new d.s ( changing the guts of the internals ).

          Show
          Karthik K added a comment - So - what is the current status of this. At what time are we planning to deprecate the old implementation and incorporate this new d.s ( changing the guts of the internals ).
          Hide
          Noble Paul added a comment -

          Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.

          Creating an Entry Object per entry is memory inefficient compared to the existing one.

          type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel

          Show
          Noble Paul added a comment - Type safety is generally not required in NamedList. very often we use heterogeneous NamedList. Creating an Entry Object per entry is memory inefficient compared to the existing one. type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel
          Hide
          Karthik K added a comment -

          Type safety is generally not required in NamedList. very often we use heterogeneous NamedList.

          Creating an Entry Object per entry is memory inefficient compared to the existing one.

          type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel

          The performance numbers in here say a different story. The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. As regarding Entry Objects being a memory hog - do we have some stats to back it up. Otherwise it is premature to call that a memory optimization.

          Show
          Karthik K added a comment - Type safety is generally not required in NamedList. very often we use heterogeneous NamedList. Creating an Entry Object per entry is memory inefficient compared to the existing one. type safety is there for the users of NamedList API even now. Internally how we manage it is not so important i feel The performance numbers in here say a different story. The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. As regarding Entry Objects being a memory hog - do we have some stats to back it up. Otherwise it is premature to call that a memory optimization.
          Hide
          Chris A. Mattmann added a comment -

          The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here.

          +1 to this. NamedList is not a very intuitive structure at all, which I remarked on SOLR-1516.

          Cheers,
          Chris

          Show
          Chris A. Mattmann added a comment - The heterogenous NamedList data structure is not intuitive w.r.t code and performs poorly compared to the revised one as put in here. +1 to this. NamedList is not a very intuitive structure at all, which I remarked on SOLR-1516 . Cheers, Chris
          Hide
          Noble Paul added a comment - - edited

          The performance numbers in here say a different story

          I'm not referring to perf numbers here . It is memory efficiency.

          As regarding Entry Objects being a memory hog - do we have some stats to back it up.

          We don't need stats for everything. we should know about how VM holds objects .

          Let me illustrate with a case of consider 5 key->values on a 32 bit m/c

          NamedList(Backed by arraylist)
          one Object [] + array size= 4 + 5 * 2*4 (bytes) = 44 bytes + the overhead of ArrayList

          ModernNamedList

          one Object[] + 5 entry objects (16 bytes object overhead + each has 2 references of 4+4 bytes)+ array size () = 4 + 16*5+ 5*2*4 + 5*4 = 144 bytes+ the overhead of ArrayList

          Add to this the overhead of GC'ing 5 entry objects

          reference : http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

          Show
          Noble Paul added a comment - - edited The performance numbers in here say a different story I'm not referring to perf numbers here . It is memory efficiency. As regarding Entry Objects being a memory hog - do we have some stats to back it up. We don't need stats for everything. we should know about how VM holds objects . Let me illustrate with a case of consider 5 key->values on a 32 bit m/c NamedList(Backed by arraylist) one Object [] + array size= 4 + 5 * 2*4 (bytes) = 44 bytes + the overhead of ArrayList ModernNamedList one Object[] + 5 entry objects (16 bytes object overhead + each has 2 references of 4+4 bytes)+ array size () = 4 + 16*5+ 5*2*4 + 5*4 = 144 bytes+ the overhead of ArrayList Add to this the overhead of GC'ing 5 entry objects reference : http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf

            People

            • Assignee:
              Unassigned
              Reporter:
              Karthik K
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - 72h
                72h
                Remaining:
                Remaining Estimate - 72h
                72h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development