Lucene - Core
  1. Lucene - Core
  2. LUCENE-3309

Add narrow API for loading stored fields, to replace FieldSelector

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Field Type branch
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think we should "invert" the FieldSelector API, with a "push" API
      whereby FieldsReader invokes this API once per field in the document
      being visited.

      Implementations of the API can then do arbitrary things like save away
      the field's size, load the field, clone the IndexInput for later lazy
      loading, etc.

      This very thin API would be a mirror image of the very thin index time
      API we now have (IndexableField) and, importantly, it would have no
      dependence on our "user space" Document/Field/FieldType impl, so apps
      are free to do something totally custom.

      After we have this, we should build the "sugar" API that rebuilds a
      Document instance (ie IR.document(int docID)) on top of this new thin
      API. This'll also be a good test that the API is sufficient.

      Relevant discussions from IRC this morning at
      http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-07-13#l76

      1. LUCENE-3309.patch
        177 kB
        Michael McCandless
      2. LUCENE-3309.patch
        140 kB
        Michael McCandless
      3. LUCENE-3309.patch
        21 kB
        Michael McCandless

        Activity

        Hide
        Simon Willnauer added a comment -

        Wait, shouldn't the app just not call IR.document() if it wants no fields?

        nevermind I was confused about the null set.

        simon

        Show
        Simon Willnauer added a comment - Wait, shouldn't the app just not call IR.document() if it wants no fields? nevermind I was confused about the null set. simon
        Hide
        Michael McCandless added a comment -

        ah man I am to late

        No you're not! It's just software The iterations go on and on and on...

        I only looked at the StoredFieldVisitor briefly, what I always missed is a String... fieldsToLoad ctor for convenience which simply passes an array that is used to create a set

        You mean DocumentStoredFieldVisitor, right? Ie the default/sugar visitor that makes a Document. I agree we should add a sugar ctor taking String... I'll do that.

        I also don't understand why the set is null in the default ctors, can't we use the empty set here.

        Hmm but the default ctor means all fields are accepted (empty set would mean none).

        If we want to prevent the method call if no fields should be loaded there should be a LoadNoFieldsVisitor I think, no?

        Wait, shouldn't the app just not call IR.document() if it wants no fields?

        Show
        Michael McCandless added a comment - ah man I am to late No you're not! It's just software The iterations go on and on and on... I only looked at the StoredFieldVisitor briefly, what I always missed is a String... fieldsToLoad ctor for convenience which simply passes an array that is used to create a set You mean DocumentStoredFieldVisitor, right? Ie the default/sugar visitor that makes a Document. I agree we should add a sugar ctor taking String... I'll do that. I also don't understand why the set is null in the default ctors, can't we use the empty set here. Hmm but the default ctor means all fields are accepted (empty set would mean none). If we want to prevent the method call if no fields should be loaded there should be a LoadNoFieldsVisitor I think, no? Wait, shouldn't the app just not call IR.document() if it wants no fields?
        Hide
        Simon Willnauer added a comment -

        ah man I am to late

        I only looked at the StoredFieldVisitor briefly, what I always missed is a String... fieldsToLoad ctor for convenience which simply passes an array that is used to create a set. in several environments this would be helpful for example when you get the fields to load from a request or something like that.

        I also don't understand why the set is null in the default ctors, can't we use the empty set here. If we want to prevent the method call if no fields should be loaded there should be a LoadNoFieldsVisitor I think, no?

        Show
        Simon Willnauer added a comment - ah man I am to late I only looked at the StoredFieldVisitor briefly, what I always missed is a String... fieldsToLoad ctor for convenience which simply passes an array that is used to create a set. in several environments this would be helpful for example when you get the fields to load from a request or something like that. I also don't understand why the set is null in the default ctors, can't we use the empty set here. If we want to prevent the method call if no fields should be loaded there should be a LoadNoFieldsVisitor I think, no?
        Hide
        Chris Male added a comment -

        Super, +1 to committing.

        Show
        Chris Male added a comment - Super, +1 to committing.
        Hide
        Michael McCandless added a comment -

        Thanks Chris – great feedback! I agree with all your suggestions...

        New patch:

        • Removed oal.documen2.FieldSelector*
        • Renamed & moved to standalone StoredFieldVisitor.java
        • Made the default/sugar visitor first class, as
          DocumentStoredFieldVisitor.java, also allowing it to optionally
          accept Set<String> fieldsToLoad.
        • Fix javadoc errors I think I caused, but we [separately] have
          pre-existing javadoc errors that need fixing (Nikola: can you
          address these? Thanks!).

        Sorry, my changes have some svn mvs, and my last patch was from "svn
        diff". New patch should be fully applyable... (to field type
        branch).

        Show
        Michael McCandless added a comment - Thanks Chris – great feedback! I agree with all your suggestions... New patch: Removed oal.documen2.FieldSelector* Renamed & moved to standalone StoredFieldVisitor.java Made the default/sugar visitor first class, as DocumentStoredFieldVisitor.java, also allowing it to optionally accept Set<String> fieldsToLoad. Fix javadoc errors I think I caused, but we [separately] have pre-existing javadoc errors that need fixing (Nikola: can you address these? Thanks!). Sorry, my changes have some svn mvs, and my last patch was from "svn diff". New patch should be fully applyable... (to field type branch).
        Hide
        Chris Male added a comment -

        Hey Mike,

        Wow you're fast! I love the ideas in this. Just a few things:

        • Applied your patch and theres something not quite right. It doesn't contain FieldSelectorVisitor and FieldSelector hasn't been moved to contrib/misc (in fact, there are imports of document.FieldSelector when its actually in document2.FieldSelector).
        • Can we rename FieldVisitor to StoredFieldVisitor? FieldVisitor sounds really general and like it visits all fields, not just stored ones.
        • What do you think about popping FieldVisitor and the standard impl out of IndexReader? I'm envisaging situations where people may want to extend the standard impl in some way.
        • I saw a few javadoc errors floating round but they may just be intellij lying to me.
        Show
        Chris Male added a comment - Hey Mike, Wow you're fast! I love the ideas in this. Just a few things: Applied your patch and theres something not quite right. It doesn't contain FieldSelectorVisitor and FieldSelector hasn't been moved to contrib/misc (in fact, there are imports of document.FieldSelector when its actually in document2.FieldSelector). Can we rename FieldVisitor to StoredFieldVisitor? FieldVisitor sounds really general and like it visits all fields, not just stored ones. What do you think about popping FieldVisitor and the standard impl out of IndexReader? I'm envisaging situations where people may want to extend the standard impl in some way. I saw a few javadoc errors floating round but they may just be intellij lying to me.
        Hide
        Michael McCandless added a comment -

        New patch, this time moving *FieldSelector (including LazyField) to
        contrib/misc, and adding a "bridge class" (FieldSelectorVisitor) that
        implements the new FieldVisitor, wrapping a FieldSelector.

        FieldsReader is now very simple – the only API it supports for
        reading a doc is visitDocument (taking FieldsVisitor). It now knows
        nothing about Document/Field (same w/ the IR impls; only IR base class
        knows about Document/Field since it impls the sugar API).

        Tests pass; I think it's ready to commit!

        Show
        Michael McCandless added a comment - New patch, this time moving *FieldSelector (including LazyField) to contrib/misc, and adding a "bridge class" (FieldSelectorVisitor) that implements the new FieldVisitor, wrapping a FieldSelector. FieldsReader is now very simple – the only API it supports for reading a doc is visitDocument (taking FieldsVisitor). It now knows nothing about Document/Field (same w/ the IR impls; only IR base class knows about Document/Field since it impls the sugar API). Tests pass; I think it's ready to commit!
        Hide
        Michael McCandless added a comment -

        Initial patch:

        • Adds IndexReader.FieldVisitor and abstract
          IndexReader.document(int docID, FieldVisitor visitor). IR
          subclasses and FieldsReader implement this.
        • Stop storing tokenized bit in stored fields – the fast vector
          highlighter was actually using this metadata; I change it to
          always insert the separator char (even for not-analyzed fields)
          and changed the keyword analyzer in the tests to set offset gap
          to 1. We need to do this anyway for LUCENE-2309, and since we are
          making an API here we may as well remove it now...
        • Implement IndexReader.document(int) using only the new
          FieldVisitor API.

        All tests pass.

        I think next up is to cutover the various FieldSelector impls to
        FieldVisitor impls, move them to contrib, and then remove
        FieldSelector from core.

        Show
        Michael McCandless added a comment - Initial patch: Adds IndexReader.FieldVisitor and abstract IndexReader.document(int docID, FieldVisitor visitor). IR subclasses and FieldsReader implement this. Stop storing tokenized bit in stored fields – the fast vector highlighter was actually using this metadata; I change it to always insert the separator char (even for not-analyzed fields) and changed the keyword analyzer in the tests to set offset gap to 1. We need to do this anyway for LUCENE-2309 , and since we are making an API here we may as well remove it now... Implement IndexReader.document(int) using only the new FieldVisitor API. All tests pass. I think next up is to cutover the various FieldSelector impls to FieldVisitor impls, move them to contrib, and then remove FieldSelector from core.

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development