Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated:
- Its <T> generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of <name,value> pairs, where name is always String, and the type of value may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like getInt(), getString().... I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on <T> at all. So I'd rather we remove that generic from the API signature.
- There is what seems to be a totally redundant SimpleOrderedMap class, which has contradicting documentation in its class-jdocs:
- a NamedList where access by key is more important than maintaining order
- This class does not provide efficient lookup by key
But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away.
Leaving performance aside for a second, NamedList could simply hold an internal Map<String,List<Object>> to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the <name,value> pairs in any order though, i.e. getVal(i). I don't know how important is this functionality though .. i.e. if we replaced it with a namesIterator(), would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me – how important is getName/Val(i)?
Now back to performance, for a sec, in order to not always allocate a List<Object> when NamedList is used to hold only one value per parameter, we can either:
- Use Collections.singletonList() on first add, and change to a concrete List on the second add only.
- Use an Object, it's less expensive than a List object.
- Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate.
BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object array (instead of List). But I think we should remove the generic anyway.
Maybe we should break this down into 3 issues:
- Get rid of SimpleOrderedMap – if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly.
- Remove the generics from NamedList's API. We can add sugar getters for specific types if we want.
- Simplify NamedList internal implementation. On the performance side – how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request.