Lucene - Core
  1. Lucene - Core
  2. LUCENE-3074

SimpleTextCodec needs SimpleText DocValues impl

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index, core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      currently SimpleTextCodec uses binary docValues we should move that to a simple text impl.

      1. LUCENE-3074.patch
        87 kB
        Simon Willnauer
      2. LUCENE-3074.patch
        87 kB
        Simon Willnauer

        Activity

        Hide
        Simon Willnauer added a comment -

        fix once on trunk

        Show
        Simon Willnauer added a comment - fix once on trunk
        Hide
        Robert Muir added a comment -

        I'm gonna take a stab at this, because we really want to have multiple implementations to ensure the abstract api is good,
        and doesnt have any assumptions about the current implementation.

        Any issues would be harder to debug if our first alternative implementation was something like crazy 3.x norms versus an easy
        plain text impl.

        ill commit LUCENE-3622 to a branch and start playing around.

        Show
        Robert Muir added a comment - I'm gonna take a stab at this, because we really want to have multiple implementations to ensure the abstract api is good, and doesnt have any assumptions about the current implementation. Any issues would be harder to debug if our first alternative implementation was something like crazy 3.x norms versus an easy plain text impl. ill commit LUCENE-3622 to a branch and start playing around.
        Hide
        Simon Willnauer added a comment -

        FYI I am working on this. Robert you are not on it right now, right?

        Show
        Simon Willnauer added a comment - FYI I am working on this. Robert you are not on it right now, right?
        Hide
        Robert Muir added a comment -

        No... i got caught up in 3622 trying to factor out the lucene40 implementation better.

        But a lot of those issues are now fixed (e.g. we have a default merge implementation).

        I just marked 3622 closed since I think the goals of that issue are now resolved.

        Show
        Robert Muir added a comment - No... i got caught up in 3622 trying to factor out the lucene40 implementation better. But a lot of those issues are now fixed (e.g. we have a default merge implementation). I just marked 3622 closed since I think the goals of that issue are now resolved.
        Hide
        Simon Willnauer added a comment -

        here is a first patch adding SimpleTextDV and replacing SimpleTextNorms with it directly.
        I had to change some upstream classes and especially the merging done in the DocValuesConsumer which used the "wrong" type for merging. In general we should use the target type instead of the source type and sources need to implement getBytes and do auto conversion otherwise type promotion doesn't work.

        this patch writes individual files per field like sep codec which made things a lot easier and is maybe better suited for SimpleText

        comments welcome

        Show
        Simon Willnauer added a comment - here is a first patch adding SimpleTextDV and replacing SimpleTextNorms with it directly. I had to change some upstream classes and especially the merging done in the DocValuesConsumer which used the "wrong" type for merging. In general we should use the target type instead of the source type and sources need to implement getBytes and do auto conversion otherwise type promotion doesn't work. this patch writes individual files per field like sep codec which made things a lot easier and is maybe better suited for SimpleText comments welcome
        Hide
        Robert Muir added a comment -

        first thoughts: looks nice! Thanks for working on this!

        I will try to take a look later, I noticed a few imports from lucene40 codec into simpletext (which i think we should avoid),
        but I think these were just javadocs relics!

        Show
        Robert Muir added a comment - first thoughts: looks nice! Thanks for working on this! I will try to take a look later, I noticed a few imports from lucene40 codec into simpletext (which i think we should avoid), but I think these were just javadocs relics!
        Hide
        Simon Willnauer added a comment -

        any comments on this? I don't want this to go out of date too much

        Show
        Simon Willnauer added a comment - any comments on this? I don't want this to go out of date too much
        Hide
        Simon Willnauer added a comment -

        merged with trunk...
        SimpleTextPerDocProducer still uses 2 lucene40 classes DocValuesArray & DocValuesReaderBase.
        I think DocValues array is generally useful and should go to o.a.l.codec and we can rename DocValuesReaderBase to PerDocProducerBase and move to o.a.l.codec too. Thoughts?

        Show
        Simon Willnauer added a comment - merged with trunk... SimpleTextPerDocProducer still uses 2 lucene40 classes DocValuesArray & DocValuesReaderBase. I think DocValues array is generally useful and should go to o.a.l.codec and we can rename DocValuesReaderBase to PerDocProducerBase and move to o.a.l.codec too. Thoughts?
        Hide
        Simon Willnauer added a comment -

        committed to trunk in revision 1297920

        Show
        Simon Willnauer added a comment - committed to trunk in revision 1297920
        Hide
        Robert Muir added a comment -

        Thanks Simon!

        Show
        Robert Muir added a comment - Thanks Simon!

          People

          • Assignee:
            Simon Willnauer
            Reporter:
            Simon Willnauer
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development