Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'm working on improving Numeric* javadocs.

      1. LUCENE-1872-cardinality.patch
        2 kB
        Uwe Schindler
      2. LUCENE-1872-uwe.patch
        9 kB
        Uwe Schindler
      3. LUCENE-1872.patch
        26 kB
        Michael McCandless
      4. LUCENE-1872.patch
        26 kB
        Michael McCandless
      5. LUCENE-1872.patch
        21 kB
        Michael McCandless

        Activity

        Hide
        Uwe Schindler added a comment -

        Committed revision: 815195

        Show
        Uwe Schindler added a comment - Committed revision: 815195
        Hide
        Michael McCandless added a comment -

        Looks good Uwe!

        Show
        Michael McCandless added a comment - Looks good Uwe!
        Hide
        Uwe Schindler added a comment -

        Some notes about cardinality and precision step.

        Show
        Uwe Schindler added a comment - Some notes about cardinality and precision step.
        Hide
        Uwe Schindler added a comment -

        Some small updates after discussion on java-user.

        Show
        Uwe Schindler added a comment - Some small updates after discussion on java-user.
        Hide
        Michael McCandless added a comment -

        Good morning!

        Show
        Michael McCandless added a comment - Good morning!
        Hide
        Uwe Schindler added a comment -

        Oh I thought you were still sleeping... Good morning!

        Show
        Uwe Schindler added a comment - Oh I thought you were still sleeping... Good morning!
        Hide
        Michael McCandless added a comment -

        The new changes look good – thanks Uwe!

        Show
        Michael McCandless added a comment - The new changes look good – thanks Uwe!
        Hide
        Uwe Schindler added a comment -

        Hi Mike, I made some small improvements in formatting and also added a relation between precisionStep and "brackets" which one would not understand (what is the relation between terms bracket and precisionStep). Also the term bracket does not appear anywhere else. So I added, that the larger brackets are simply lower-precision representations of the original value. I also added a link to NumericUtils which get lost, that describes the format (in the advanced section of NumericField).

        I committed this, revision: 809284

        Show
        Uwe Schindler added a comment - Hi Mike, I made some small improvements in formatting and also added a relation between precisionStep and "brackets" which one would not understand (what is the relation between terms bracket and precisionStep). Also the term bracket does not appear anywhere else. So I added, that the larger brackets are simply lower-precision representations of the original value. I also added a link to NumericUtils which get lost, that describes the format (in the advanced section of NumericField). I committed this, revision: 809284
        Hide
        Michael McCandless added a comment -

        OK thanks Uwe!

        Show
        Michael McCandless added a comment - OK thanks Uwe!
        Hide
        Uwe Schindler added a comment -

        I think this is fine now! Thanks.

        Show
        Uwe Schindler added a comment - I think this is fine now! Thanks.
        Hide
        Michael McCandless added a comment -

        State that precSteps is "bits", and trie is the entire data structure.

        Show
        Michael McCandless added a comment - State that precSteps is "bits", and trie is the entire data structure.
        Hide
        Michael McCandless added a comment -

        "Trie" is the whole "prefix tree".

        OK – I tweaked NumericField to fix that. I'll fix LIA2's manuscript too... thanks!

        Show
        Michael McCandless added a comment - "Trie" is the whole "prefix tree". OK – I tweaked NumericField to fix that. I'll fix LIA2's manuscript too... thanks!
        Hide
        Uwe Schindler added a comment - - edited

        One small thing (I found this also in the LIA chapter):
        "Trie" is the whole "prefix tree". So "trie" means the whole list of different precision terms for one numeric value. You have this explanation which suggests one trie is one term:

        + * <p>Within lucene, each numeric value is indexed as
        + * multiple encoded terms representing larger and larger
        + * pre-defined brackets called <em>trie</em>s.  The step
        + * size between each successive trie is called the
        + * <code>precisionStep</code> in this API.  Smaller
        

        There should also be the explanation that the precStep is in "bits".

        Show
        Uwe Schindler added a comment - - edited One small thing (I found this also in the LIA chapter): "Trie" is the whole "prefix tree". So "trie" means the whole list of different precision terms for one numeric value. You have this explanation which suggests one trie is one term: + * <p>Within lucene, each numeric value is indexed as + * multiple encoded terms representing larger and larger + * pre-defined brackets called <em>trie</em>s. The step + * size between each successive trie is called the + * <code>precisionStep</code> in this API. Smaller There should also be the explanation that the precStep is in "bits".
        Hide
        Uwe Schindler added a comment -

        As far as I can tell, I didn't lose any of the experimental warnings - Uwe where did you see that?

        My fault. I did not see that you moved the warning to the top in NumericRangeQuery.

        Looks good.

        Show
        Uwe Schindler added a comment - As far as I can tell, I didn't lose any of the experimental warnings - Uwe where did you see that? My fault. I did not see that you moved the warning to the top in NumericRangeQuery. Looks good.
        Hide
        Michael McCandless added a comment -

        New rev.

        As far as I can tell, I didn't lose any of the experimental warnings – Uwe where did you see that?

        I worked on NumericTokenStream's javadocs too.

        Finally, I removed reference to DateTools entirely and suggested either quantizing by dividing getTime()'s returned result, or, using Date's getters (getYear, getMonth, etc.) to construct a numeric value.

        Show
        Michael McCandless added a comment - New rev. As far as I can tell, I didn't lose any of the experimental warnings – Uwe where did you see that? I worked on NumericTokenStream's javadocs too. Finally, I removed reference to DateTools entirely and suggested either quantizing by dividing getTime()'s returned result, or, using Date's getters (getYear, getMonth, etc.) to construct a numeric value.
        Hide
        Michael McCandless added a comment -

        Will we mark it as experimental or not?

        Yes, I think we should keep the experimental warning. I didn't mean to remove any – I'll make sure they're all still there.

        By the way, I also reviewd the LIA chapter and have also seen the reference to DateTools (will send you my updates soon)

        Excellent, thanks!

        ate to a String with a special date resolution and then convertin back to numeric is somehow ineffective. When indexing only hour resolution, I would take Date.getTime() / 3600L / 1000L.

        True, and that'll work for DAY as well, but not for MONTH / YEAR resolution.

        Maybe you could copy some of your explanations also to NumericTokenStream, e.g. the example. In the original, both classes had similar introduction texts.

        OK I'll update NumericTokenStream too.

        Show
        Michael McCandless added a comment - Will we mark it as experimental or not? Yes, I think we should keep the experimental warning. I didn't mean to remove any – I'll make sure they're all still there. By the way, I also reviewd the LIA chapter and have also seen the reference to DateTools (will send you my updates soon) Excellent, thanks! ate to a String with a special date resolution and then convertin back to numeric is somehow ineffective. When indexing only hour resolution, I would take Date.getTime() / 3600L / 1000L. True, and that'll work for DAY as well, but not for MONTH / YEAR resolution. Maybe you could copy some of your explanations also to NumericTokenStream, e.g. the example. In the original, both classes had similar introduction texts. OK I'll update NumericTokenStream too.
        Hide
        Uwe Schindler added a comment -

        Maybe you could copy some of your explanations also to NumericTokenStream, e.g. the example. In the original, both classes had similar introduction texts.

        Show
        Uwe Schindler added a comment - Maybe you could copy some of your explanations also to NumericTokenStream, e.g. the example. In the original, both classes had similar introduction texts.
        Hide
        Uwe Schindler added a comment -

        Will we mark it as experimental or not? In some classes you remove the experimental warning, in others not. I will review it more detailed later!

        By the way, I also reviewd the LIA chapter and have also seen the reference to DateTools (will send you my updates soon). ate to a String with a special date resolution and then convertin back to numeric is somehow ineffective. When indexing only hour resolution, I would take Date.getTime() / 3600L / 1000L.

        Show
        Uwe Schindler added a comment - Will we mark it as experimental or not? In some classes you remove the experimental warning, in others not. I will review it more detailed later! By the way, I also reviewd the LIA chapter and have also seen the reference to DateTools (will send you my updates soon). ate to a String with a special date resolution and then convertin back to numeric is somehow ineffective. When indexing only hour resolution, I would take Date.getTime() / 3600L / 1000L.
        Hide
        Michael McCandless added a comment -

        Only javadoc fixes, mostly for Numeric* but also some small fixes to a few other classes.

        Show
        Michael McCandless added a comment - Only javadoc fixes, mostly for Numeric* but also some small fixes to a few other classes.
        Hide
        Michael McCandless added a comment -

        Thank you at all for looking after the documentation! A native speaker is always better than my "frankish" English (as my colleague call it).

        No problem! This is how open source works It's great.

        Show
        Michael McCandless added a comment - Thank you at all for looking after the documentation! A native speaker is always better than my "frankish" English (as my colleague call it). No problem! This is how open source works It's great.
        Hide
        Uwe Schindler added a comment -

        Thank you at all for looking after the documentation! A native speaker is always better than my "frankish" English (as my colleague call it).

        Show
        Uwe Schindler added a comment - Thank you at all for looking after the documentation! A native speaker is always better than my "frankish" English (as my colleague call it).
        Hide
        Michael McCandless added a comment -

        Because of the uninverter.

        Ahh right!

        I think it's best to document that the effect of sorting is undefined/not guaranteed, but filtering/searching by range is defined.

        Show
        Michael McCandless added a comment - Because of the uninverter. Ahh right! I think it's best to document that the effect of sorting is undefined/not guaranteed, but filtering/searching by range is defined.
        Hide
        Uwe Schindler added a comment -

        I'm confused on why it's the "largest" value? EG if I add 4, 17, 10 as a NumericField "x" on my doc, when I then try to sort, wouldn't 10 "win" since it was added last to the document?

        Because of the uninverter. It iterates over all terms, starting from the lowest one in the field. When he comes to term 4, it would add it to the field cache at the document's position. The same with 10 and then 17. Because the TermEnum lists 17 at last, it would win:

        long[] retArray = null;
        TermDocs termDocs = reader.termDocs();
        TermEnum termEnum = reader.terms (new Term(field));
        try {
          do {
            Term term = termEnum.term();
            if (term==null || term.field() != field) break;
            long termval = parser.parseLong(term.text());
            if (retArray == null) // late init
              retArray = new long[reader.maxDoc()];
            termDocs.seek (termEnum);
            while (termDocs.next()) {
              retArray[termDocs.doc()] = termval;
            }
          } while (termEnum.next());
        } catch (StopFillCacheException stop) {
        } finally {
          termDocs.close();
          termEnum.close();
        }
        if (retArray == null) // no values
          retArray = new long[reader.maxDoc()];
        
        Show
        Uwe Schindler added a comment - I'm confused on why it's the "largest" value? EG if I add 4, 17, 10 as a NumericField "x" on my doc, when I then try to sort, wouldn't 10 "win" since it was added last to the document? Because of the uninverter. It iterates over all terms, starting from the lowest one in the field. When he comes to term 4, it would add it to the field cache at the document's position. The same with 10 and then 17. Because the TermEnum lists 17 at last, it would win: long [] retArray = null ; TermDocs termDocs = reader.termDocs(); TermEnum termEnum = reader.terms ( new Term(field)); try { do { Term term = termEnum.term(); if (term== null || term.field() != field) break ; long termval = parser.parseLong(term.text()); if (retArray == null ) // late init retArray = new long [reader.maxDoc()]; termDocs.seek (termEnum); while (termDocs.next()) { retArray[termDocs.doc()] = termval; } } while (termEnum.next()); } catch (StopFillCacheException stop) { } finally { termDocs.close(); termEnum.close(); } if (retArray == null ) // no values retArray = new long [reader.maxDoc()];
        Hide
        Michael McCandless added a comment -

        About the second one: Sorting is not different from other numeric multi-value fields (in contrast to StringIndex where only one value/doc is allowed). The univerter assigns a value to each document. As it overrides values previously assigned to the document when iterating over all terms in the field, the last term would be the one saved in the FieldCache. So you would sort against the largest value per doc (because the smaller values are overridden by the larger ones in the field cache).

        Ahh right.

        I'm confused on why it's the "largest" value? EG if I add 4, 17, 10 as a NumericField "x" on my doc, when I then try to sort, wouldn't 10 "win" since it was added last to the document?

        Show
        Michael McCandless added a comment - About the second one: Sorting is not different from other numeric multi-value fields (in contrast to StringIndex where only one value/doc is allowed). The univerter assigns a value to each document. As it overrides values previously assigned to the document when iterating over all terms in the field, the last term would be the one saved in the FieldCache. So you would sort against the largest value per doc (because the smaller values are overridden by the larger ones in the field cache). Ahh right. I'm confused on why it's the "largest" value? EG if I add 4, 17, 10 as a NumericField "x" on my doc, when I then try to sort, wouldn't 10 "win" since it was added last to the document?
        Hide
        Uwe Schindler added a comment - - edited

        This is correct (first part), you can add a field more than once and it really works (there is a test about that: TestMultiValuedNumericRangeQuery). The results are the same like with a normal range query (so the trie terms do not interfere between each other). See SOLR-1322

        I think your sentence is a little bit complicated. It would hit all documents in the range, where at least one of the values of a multi-valued doc is in the range.

        About the second one: Sorting is not different from other numeric multi-value fields (in contrast to StringIndex where only one value/doc is allowed). The univerter assigns a value to each document. As it overrides values previously assigned to the document when iterating over all terms in the field, the last term would be the one saved in the FieldCache. So you would sort against the largest value per doc (because the smaller values are overridden by the larger ones in the field cache).

        Show
        Uwe Schindler added a comment - - edited This is correct (first part), you can add a field more than once and it really works (there is a test about that: TestMultiValuedNumericRangeQuery). The results are the same like with a normal range query (so the trie terms do not interfere between each other). See SOLR-1322 I think your sentence is a little bit complicated. It would hit all documents in the range, where at least one of the values of a multi-valued doc is in the range. About the second one: Sorting is not different from other numeric multi-value fields (in contrast to StringIndex where only one value/doc is allowed). The univerter assigns a value to each document. As it overrides values previously assigned to the document when iterating over all terms in the field, the last term would be the one saved in the FieldCache. So you would sort against the largest value per doc (because the smaller values are overridden by the larger ones in the field cache).
        Hide
        Michael McCandless added a comment -

        Can someone confirm that this is technically accurate (I've added it to the javadoc for NumericField):

        You may add the same field name as a NumericField to the same document more than once. Range querying and filtering will be the logical OR of all values, and sorting will sort according to the first value added.

        Show
        Michael McCandless added a comment - Can someone confirm that this is technically accurate (I've added it to the javadoc for NumericField): You may add the same field name as a NumericField to the same document more than once. Range querying and filtering will be the logical OR of all values, and sorting will sort according to the first value added.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development