Lucene - Core
  1. Lucene - Core
  2. LUCENE-3109

Rename FieldsConsumer to InvertedFieldsConsumer

    Details

    • Type: Task Task
    • Status: Reopened
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/codecs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The name FieldsConsumer is missleading here it really is an InvertedFieldsConsumer and since we are extending codecs to consume non-inverted Fields we should be clear here. Same applies to Fields.java as well as FieldsProducer.

      1. LUCENE-3109.patch
        263 kB
        Iulius Curt
      2. LUCENE-3109.patch
        126 kB
        Iulius Curt
      3. LUCENE-3109.patch
        123 kB
        Michael McCandless
      4. LUCENE-3109.patch
        263 kB
        Iulius Curt
      5. LUCENE-3109.patch
        134 kB
        Iulius Curt
      6. LUCENE-3109.patch
        35 kB
        Iulius Curt

        Activity

        Uwe Schindler made changes -
        Fix Version/s 4.9 [ 12326730 ]
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.8 [ 12326269 ]
        Hide
        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Lucene 4.9.
        David Smiley made changes -
        Fix Version/s 4.8 [ 12326269 ]
        Fix Version/s 4.7 [ 12325572 ]
        Simon Willnauer made changes -
        Fix Version/s 4.7 [ 12325572 ]
        Fix Version/s 4.6 [ 12324999 ]
        Adrien Grand made changes -
        Fix Version/s 4.6 [ 12324999 ]
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.5 [ 12324742 ]
        Steve Rowe made changes -
        Fix Version/s 5.0 [ 12321663 ]
        Fix Version/s 4.5 [ 12324742 ]
        Fix Version/s 4.4 [ 12324323 ]
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Uwe Schindler made changes -
        Fix Version/s 4.4 [ 12324323 ]
        Fix Version/s 4.3 [ 12324143 ]
        Robert Muir made changes -
        Fix Version/s 4.3 [ 12324143 ]
        Fix Version/s 4.2 [ 12323899 ]
        Steve Rowe made changes -
        Fix Version/s 4.2 [ 12323899 ]
        Fix Version/s 4.1 [ 12321140 ]
        Hide
        Mark Miller added a comment -

        4.1 or push to 4.2?

        Show
        Mark Miller added a comment - 4.1 or push to 4.2?
        Robert Muir made changes -
        Fix Version/s 4.1 [ 12321140 ]
        Fix Version/s 4.0 [ 12322550 ]
        Robert Muir made changes -
        Fix Version/s 4.0 [ 12322550 ]
        Fix Version/s 4.0-BETA [ 12322456 ]
        Hide
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Show
        Robert Muir added a comment - rmuir20120906-bulk-40-change
        Hoss Man made changes -
        Fix Version/s 4.0 [ 12322456 ]
        Fix Version/s 4.0-ALPHA [ 12314025 ]
        Hide
        Hoss Man added a comment -

        bulk cleanup of 4.0-ALPHA / 4.0 Jira versioning. all bulk edited issues have hoss20120711-bulk-40-change in a comment

        Show
        Hoss Man added a comment - bulk cleanup of 4.0-ALPHA / 4.0 Jira versioning. all bulk edited issues have hoss20120711-bulk-40-change in a comment
        Hide
        Robert Muir added a comment -

        Also just another idea to throw out there:

        • AtomicReader.termDocsEnum -> AtomicReader.termDocs
        • AtomicReader.termPositionsEnum -> AtomicReader.termPositions
        • TermsEnum.docs -> TermsEnum.termDocs
        • TermsEnum.docsAndPositions -> TermsEnum.termPositions

        This terminology would be more consistent with all previous
        lucene APIs and seems like an easy win?

        Show
        Robert Muir added a comment - Also just another idea to throw out there: AtomicReader.termDocsEnum -> AtomicReader.termDocs AtomicReader.termPositionsEnum -> AtomicReader.termPositions TermsEnum.docs -> TermsEnum.termDocs TermsEnum.docsAndPositions -> TermsEnum.termPositions This terminology would be more consistent with all previous lucene APIs and seems like an easy win?
        Hide
        Robert Muir added a comment -

        Also I think there are other improvements we can do here that would be more natural:

        • Fields.getUniqueFieldCount() -> Fields.size()
        • Terms.getUniqueTermCount() -> Terms.size()

        In general the names of the other statistics could probably use some help.
        When i gave a lucene talk on the new stats i had to add "wtf are these things in english"
        beside the name of each new stat. I can't suggest good java names at this time,
        but these are the english names:

        • Terms.getDocCount() -> "number of documents with value"
        • Terms.getSumDocFreq() -> "number of postings"
        • Terms.getSumTotalTermFreq() -> "number of tokens"

        TermsEnum.totalTermFreq is probably ok, but maybe it was named that way
        only to be consistent with docFreq? Really something like "number of occurrences"
        is what most people would expect here.

        Show
        Robert Muir added a comment - Also I think there are other improvements we can do here that would be more natural: Fields.getUniqueFieldCount() -> Fields.size() Terms.getUniqueTermCount() -> Terms.size() In general the names of the other statistics could probably use some help. When i gave a lucene talk on the new stats i had to add "wtf are these things in english" beside the name of each new stat. I can't suggest good java names at this time, but these are the english names: Terms.getDocCount() -> "number of documents with value" Terms.getSumDocFreq() -> "number of postings" Terms.getSumTotalTermFreq() -> "number of tokens" TermsEnum.totalTermFreq is probably ok, but maybe it was named that way only to be consistent with docFreq? Really something like "number of occurrences" is what most people would expect here.
        Hide
        Robert Muir added a comment -

        Thanks, when looking at naming of apis that users will interact with,
        I think we should go with the simplest possible naming thats easiest to consume.

        For example things like "Term", "Query", "Document", etc. I think this kind of
        naming helps to keep the API consumable: as far as more expert stuff inside codec,
        thats sort of a different story (though we shouldnt just name it whatever, i think
        we don't have to be nearly as picky about names).

        For the core APIs that hook into IndexReader and IndexWriter, and for things in
        the o.a.l.document package, and things like that, I think we should be shooting
        for these super-simplistic names that have worked for lucene all along.

        To me, taking an IndexReader and enumerating Fields->FieldsEnum->Terms->TermsEnum...
        is pretty clear and makes sense.

        Fields and Terms being plural makes sense to me, but one improvement to think of is
        removing the confusing plural ending of these enum classes (FieldEnum, TermEnum).
        It seems this only have existed to not conflict with the pre-flex API before
        (for backwards compatibility). I think that would be an easy improvement to
        those enum classes..., for consistency maybe do the same with DocsEnum, or even
        think of a new name for that one entirely, I'm not sure.

        Show
        Robert Muir added a comment - Thanks, when looking at naming of apis that users will interact with, I think we should go with the simplest possible naming thats easiest to consume. For example things like "Term", "Query", "Document", etc. I think this kind of naming helps to keep the API consumable: as far as more expert stuff inside codec, thats sort of a different story (though we shouldnt just name it whatever, i think we don't have to be nearly as picky about names). For the core APIs that hook into IndexReader and IndexWriter, and for things in the o.a.l.document package, and things like that, I think we should be shooting for these super-simplistic names that have worked for lucene all along. To me, taking an IndexReader and enumerating Fields->FieldsEnum->Terms->TermsEnum... is pretty clear and makes sense. Fields and Terms being plural makes sense to me, but one improvement to think of is removing the confusing plural ending of these enum classes (FieldEnum, TermEnum). It seems this only have existed to not conflict with the pre-flex API before (for backwards compatibility). I think that would be an easy improvement to those enum classes..., for consistency maybe do the same with DocsEnum, or even think of a new name for that one entirely, I'm not sure.
        Michael McCandless committed 1310998 (93 files)
        Reviews: none

        LUCENE-3109: revert

        Lucene trunk
        Hide
        Michael McCandless added a comment -

        OK I'll revert so we can discuss more...

        Show
        Michael McCandless added a comment - OK I'll revert so we can discuss more...
        Hide
        Robert Muir added a comment -

        I don't understand the reasoning to add "Inverted" to all these apis.

        its damaging when we had a perfectly good single-syllable "Fields" before.
        Now we make a harder-to-consume multi-syllable API, for what reason?
        What other kind of Field is there?!

        But, again I'm not gonna spark a huge argument/discussion about this.
        I'm just asking for a revert.

        Show
        Robert Muir added a comment - I don't understand the reasoning to add "Inverted" to all these apis. its damaging when we had a perfectly good single-syllable "Fields" before. Now we make a harder-to-consume multi-syllable API, for what reason? What other kind of Field is there?! But, again I'm not gonna spark a huge argument/discussion about this. I'm just asking for a revert.
        Hide
        Robert Muir added a comment -

        Can we please revert the renaming of Fields to InvertedFields?

        The title of this issue made me think it only affects low-level codec apis but now
        we are talking about a massive renaming of postings apis that, in my opinion,
        goes in the wrong direction, and in the least requires more discussion.

        Show
        Robert Muir added a comment - Can we please revert the renaming of Fields to InvertedFields? The title of this issue made me think it only affects low-level codec apis but now we are talking about a massive renaming of postings apis that, in my opinion, goes in the wrong direction, and in the least requires more discussion.
        Iulius Curt made changes -
        Attachment LUCENE-3109.patch [ 12521886 ]
        Hide
        Iulius Curt added a comment -

        Should we change AtomicReader to have invertedField() instead fields()?

        I worked out all the fields() methods that returned InvertedFields.
        Also MIGRATE.txt

        Also the name FieldsEnum is now inconsistent.

        This is not included in the patch because I have some difficulty deciding whether or not also rename all the classes derived from FieldsEnum (like FilterAtomicReader.FilterFieldsEnum)

        Also, should MultiFields and MultiFieldsEnum get renamed?

        Show
        Iulius Curt added a comment - Should we change AtomicReader to have invertedField() instead fields()? I worked out all the fields() methods that returned InvertedFields . Also MIGRATE.txt Also the name FieldsEnum is now inconsistent. This is not included in the patch because I have some difficulty deciding whether or not also rename all the classes derived from FieldsEnum (like FilterAtomicReader.FilterFieldsEnum ) Also, should MultiFields and MultiFieldsEnum get renamed?
        Hide
        Uwe Schindler added a comment -

        I can do it, too (not now). It's 5 minutes work with Eclipse...

        Show
        Uwe Schindler added a comment - I can do it, too (not now). It's 5 minutes work with Eclipse...
        Hide
        Michael McCandless added a comment -

        We need to change CHANGES.txt and MIGRATE.txt to the new API, it's now heavily outdated.

        Thanks Uwe, you're right, my bad.

        Should we change AtomicReader to have invertedField() instead fields()?

        +1

        Also the name FieldsEnum is now inconsistent.

        I think it should be InvertedFieldsEnum?

        Iulius do you want to make these changes? Or I can... let me know.

        Show
        Michael McCandless added a comment - We need to change CHANGES.txt and MIGRATE.txt to the new API, it's now heavily outdated. Thanks Uwe, you're right, my bad. Should we change AtomicReader to have invertedField() instead fields()? +1 Also the name FieldsEnum is now inconsistent. I think it should be InvertedFieldsEnum? Iulius do you want to make these changes? Or I can... let me know.
        Uwe Schindler made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Hide
        Uwe Schindler added a comment -

        Documentation fixes needed.

        Show
        Uwe Schindler added a comment - Documentation fixes needed.
        Hide
        Uwe Schindler added a comment -

        We also changed public APIs (Fields -> InvertedFields). We need to change CHANGES.txt and MIGRATE.txt to the new API, it's now heavily outdated.

        Should we change AtomicReader to have invertedField() instead fields()? Also the name FieldsEnum is now inconsistent.

        Show
        Uwe Schindler added a comment - We also changed public APIs (Fields -> InvertedFields). We need to change CHANGES.txt and MIGRATE.txt to the new API, it's now heavily outdated. Should we change AtomicReader to have invertedField() instead fields()? Also the name FieldsEnum is now inconsistent.
        Michael McCandless made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Michael McCandless [ mikemccand ]
        Resolution Fixed [ 1 ]
        Hide
        Michael McCandless added a comment -

        Thanks Iulius!

        Show
        Michael McCandless added a comment - Thanks Iulius!
        Michael McCandless committed 1310969 (93 files)
        Reviews: none

        LUCENE-3109: rename Fields/Producer/Consumer to InvertedFields*

        Lucene trunk
        Hide
        Michael McCandless added a comment -

        Thanks Iulius, looks great! I'll commit...

        Show
        Michael McCandless added a comment - Thanks Iulius, looks great! I'll commit...
        Iulius Curt made changes -
        Attachment LUCENE-3109.patch [ 12521870 ]
        Hide
        Iulius Curt added a comment -

        In general I prefer seeing each import (not the wildcard)... can you redo patch putting them back? Thanks!

        Totally agree on that. I should blame IDEA for this one, should I?

        Thanks for all the patience.

        Show
        Iulius Curt added a comment - In general I prefer seeing each import (not the wildcard)... can you redo patch putting them back? Thanks! Totally agree on that. I should blame IDEA for this one, should I? Thanks for all the patience.
        Hide
        Michael McCandless added a comment -

        Hmm, one thing: I noticed the imports got changed into wildcards, eg:

        +import org.apache.lucene.index.*;
         import org.apache.lucene.util.LuceneTestCase;
         import org.apache.lucene.document.Document;
         import org.apache.lucene.document.TextField;
        -import org.apache.lucene.index.RandomIndexWriter;
        -import org.apache.lucene.index.TermsEnum;
        -import org.apache.lucene.index.IndexReader;
        -import org.apache.lucene.index.Term;
        -import org.apache.lucene.index.MultiFields;
        +import org.apache.lucene.index.MultiInvertedFields;
        

        In general I prefer seeing each import (not the wildcard)... can you redo patch putting them back? Thanks!

        (I'm assuming/hoping this is a simple setting in your IDE?).

        Show
        Michael McCandless added a comment - Hmm, one thing: I noticed the imports got changed into wildcards, eg: +import org.apache.lucene.index.*; import org.apache.lucene.util.LuceneTestCase; import org.apache.lucene.document.Document; import org.apache.lucene.document.TextField; -import org.apache.lucene.index.RandomIndexWriter; -import org.apache.lucene.index.TermsEnum; -import org.apache.lucene.index.IndexReader; -import org.apache.lucene.index.Term; -import org.apache.lucene.index.MultiFields; +import org.apache.lucene.index.MultiInvertedFields; In general I prefer seeing each import (not the wildcard)... can you redo patch putting them back? Thanks! (I'm assuming/hoping this is a simple setting in your IDE?).
        Hide
        Michael McCandless added a comment -

        Thanks for the fast turnaround Iulius!

        Did you use "svn mv" to rename the sources? (I'm guessing not – I don't see the removed original sources).

        But it's fine: I got this to apply quite easily. Thanks! I'll commit shortly...

        Show
        Michael McCandless added a comment - Thanks for the fast turnaround Iulius! Did you use "svn mv" to rename the sources? (I'm guessing not – I don't see the removed original sources). But it's fine: I got this to apply quite easily. Thanks! I'll commit shortly...
        Iulius Curt made changes -
        Attachment LUCENE-3109.patch [ 12521834 ]
        Hide
        Iulius Curt added a comment -

        It was cleaner to redo it from scratch. Hope didn't miss anything this time.

        It built fine and tests got passed.

        Show
        Iulius Curt added a comment - It was cleaner to redo it from scratch. Hope didn't miss anything this time. It built fine and tests got passed.
        Michael McCandless made changes -
        Attachment LUCENE-3109.patch [ 12521824 ]
        Hide
        Michael McCandless added a comment -

        Hi Iulius,

        Here's my current patch – it doesn't compile because of the missing renamed sources but possibly from other things (eg if I messed up any of the merging). But hopefully it's close Thanks!

        Show
        Michael McCandless added a comment - Hi Iulius, Here's my current patch – it doesn't compile because of the missing renamed sources but possibly from other things (eg if I messed up any of the merging). But hopefully it's close Thanks!
        Hide
        Iulius Curt added a comment -

        Good question. It seems I didn't add the renamed sources.

        Could you please upload the patch with the shifted-code-conflicts you mentioned about solved?
        This would be wonderful, I could much easily redo only the renamed sources.

        Thanks for your feedback.

        Show
        Iulius Curt added a comment - Good question. It seems I didn't add the renamed sources. Could you please upload the patch with the shifted-code-conflicts you mentioned about solved? This would be wonderful, I could much easily redo only the renamed sources. Thanks for your feedback.
        Hide
        Michael McCandless added a comment -

        Hi Iulius, this patch is great: this rename is badly needed...

        I was able to apply the patch (resolving a few conflicts since the code has shifted since it was created), but... some things seem to be missing (eg InvertedFieldsProducer rename). How did you generate the patch?

        Show
        Michael McCandless added a comment - Hi Iulius, this patch is great: this rename is badly needed... I was able to apply the patch (resolving a few conflicts since the code has shifted since it was created), but... some things seem to be missing (eg InvertedFieldsProducer rename). How did you generate the patch?
        Iulius Curt made changes -
        Attachment LUCENE-3109.patch [ 12514899 ]
        Hide
        Iulius Curt added a comment -

        Repaired a foolish mistake.
        Also limited to the classes sepcified in the ticket.

        Show
        Iulius Curt added a comment - Repaired a foolish mistake. Also limited to the classes sepcified in the ticket.
        Iulius Curt made changes -
        Field Original Value New Value
        Attachment LUCENE-3109.patch [ 12514850 ]
        Hide
        Iulius Curt added a comment -

        Attached a patch with the refactoring of Fields, FieldsProducer, FieldsConsumer and any other related classes.
        It turned out to be pretty ample (also affected Solr)

        Please give some feedback if something is wrong.

        Show
        Iulius Curt added a comment - Attached a patch with the refactoring of Fields, FieldsProducer, FieldsConsumer and any other related classes. It turned out to be pretty ample (also affected Solr) Please give some feedback if something is wrong.
        Hide
        Simon Willnauer added a comment -

        Is this still valid? (It looks like a good place for me to enter the community)

        I think so there should also be an InvertedFieldsProducer

        Show
        Simon Willnauer added a comment - Is this still valid? (It looks like a good place for me to enter the community) I think so there should also be an InvertedFieldsProducer
        Hide
        Iulius Curt added a comment -

        Is this still valid? (It looks like a good place for me to enter the community)

        Should also the *FieldsReader/Writer classes that derive FieldsProducer/Consumer become *InvertedFieldsReader/Writer?

        Show
        Iulius Curt added a comment - Is this still valid? (It looks like a good place for me to enter the community) Should also the *FieldsReader/Writer classes that derive FieldsProducer/Consumer become *InvertedFieldsReader/Writer?
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Simon Willnauer created issue -

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Simon Willnauer
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development