Currently SimpleTextCodec uses binary docValues; we should move that to a simple text impl.
fix once on trunk
I'm gonna take a stab at this, because we really want to have multiple implementations to ensure the abstract API is good
and doesn't have any assumptions about the current implementation.
Any issues would be harder to debug if our first alternative implementation was something like crazy 3.x norms versus an easy
plain text impl.
I'll commit LUCENE-3622 to a branch and start playing around.
FYI I am working on this. Robert, you are not on it right now, right?
No... I got caught up in 3622 trying to factor out the lucene40 implementation better.
But a lot of those issues are now fixed (e.g. we have a default merge implementation).
I just marked 3622 closed since I think the goals of that issue are now resolved.
Here is a first patch adding SimpleTextDV and replacing SimpleTextNorms with it directly.
I had to change some upstream classes, especially the merging done in DocValuesConsumer, which used the "wrong" type for merging. In general we should use the target type instead of the source type, and sources need to implement getBytes and do auto conversion; otherwise type promotion doesn't work.
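To illustrate the point about merging with the target type, here is a rough sketch. The types below (ValueType, Source, IntSource, MergeSketch) are simplified, hypothetical stand-ins and not the actual Lucene classes or the code in the patch. The idea is only that the merge dispatches on the target type and that every source can convert itself, including to bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-ins; NOT the real Lucene DocValues API.
enum ValueType { INT, FLOAT, BYTES }

interface Source {
    long getInt(int docID);
    double getFloat(int docID);
    // Every source can render its value as bytes, so a BYTES target
    // can always consume it regardless of the source's native type.
    byte[] getBytes(int docID);
}

// A source whose native type is INT but which auto-converts on request.
final class IntSource implements Source {
    private final long[] values;

    IntSource(long... values) { this.values = values; }

    public long getInt(int docID) { return values[docID]; }

    public double getFloat(int docID) { return (double) values[docID]; }

    public byte[] getBytes(int docID) {
        return Long.toString(values[docID]).getBytes(StandardCharsets.UTF_8);
    }
}

final class MergeSketch {
    // Dispatch on the TARGET type, not on the source's native type,
    // so a segment with INT values can be merged into a FLOAT or BYTES field.
    static String mergeValue(ValueType targetType, Source source, int docID) {
        switch (targetType) {
            case INT:
                return Long.toString(source.getInt(docID));
            case FLOAT:
                return Double.toString(source.getFloat(docID));
            default:
                return new String(source.getBytes(docID), StandardCharsets.UTF_8);
        }
    }
}
```

With this shape, merging an int-typed source into a field that was promoted to FLOAT goes through getFloat, and merging into a BYTES field goes through getBytes, without the merge code needing to know the source's native type.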
This patch writes individual files per field, like the sep codec, which made things a lot easier and is maybe better suited for SimpleText.
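A minimal sketch of what "one plain-text file per field" could look like. The class name, the file naming scheme, and the line format are all made up for illustration and are not what the patch actually writes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the naming scheme and line format are assumptions.
final class PerFieldTextWriter {

    // One file per (segment, field) pair, e.g. "_0_title.dv.txt".
    static String fileName(String segment, String field) {
        return segment + "_" + field + ".dv.txt";
    }

    // Writes one human-readable line per document into the field's own file.
    static Path write(Path dir, String segment, String field, long[] values)
            throws IOException {
        List<String> lines = new ArrayList<>();
        for (int docID = 0; docID < values.length; docID++) {
            lines.add("doc " + docID + " value " + values[docID]);
        }
        Path file = dir.resolve(fileName(segment, field));
        return Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

Keeping each field in its own file means a reader can open just the field it needs and the output stays easy to inspect by hand, at the cost of more files per segment.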
First thoughts: looks nice! Thanks for working on this!
I will try to take a look later. I noticed a few imports from the lucene40 codec into simpletext (which I think we should avoid),
but I think these were just javadoc relics!
Any comments on this? I don't want this to go out of date too much.
Merged with trunk...
SimpleTextPerDocProducer still uses 2 lucene40 classes: DocValuesArray & DocValuesReaderBase.
I think DocValuesArray is generally useful and should go to o.a.l.codec, and we can rename DocValuesReaderBase to PerDocProducerBase and move it to o.a.l.codec too. Thoughts?
Committed to trunk in revision 1297920.