Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In Searchable.java, the javadoc for maxdoc() is:

      /** Expert: Returns one greater than the largest possible document number.

      • Called by search code to compute term weights.
      • @see org.apache.lucene.index.IndexReader#maxDoc()

      The qualification "expert" and the statement "called by search code to compute term weights" is a bit confusing, It implies that maxdoc() somehow computes weights, which is obviously not true (what it does is explained in the other sentence). Maybe it is used as one factor of the weight, but do we really need to mention this here?

        Activity

        Hide
        Mark Miller added a comment -

        Yeah - could be more clear.

        I agree that there doesn't appear to be a need to mark it as expert. And I guess we should just remove the part that mentions its involved in the weight? Or re word.

        "This is used as an approximation of the number of documents in the index for scoring calculations."

        Or something to that effect (namely, better). Perhaps approximation isnt even needed -

        "Because un-merged deleted docs still contribute to index statistics, this is used as the number of documents in the index for scoring calculations."

        or ...

        Show
        Mark Miller added a comment - Yeah - could be more clear. I agree that there doesn't appear to be a need to mark it as expert. And I guess we should just remove the part that mentions its involved in the weight? Or re word. "This is used as an approximation of the number of documents in the index for scoring calculations." Or something to that effect (namely, better). Perhaps approximation isnt even needed - "Because un-merged deleted docs still contribute to index statistics, this is used as the number of documents in the index for scoring calculations." or ...
        Hide
        Marvin Humphrey added a comment -

        maxDoc() isn't just used for calculating weights. It's also used for e.g.
        figuring out how big your bit vector needs to be in order to accommodate the
        largest doc in the collection.

        My vote would be to just strip that extra comment about calculating term
        weights.

        Show
        Marvin Humphrey added a comment - maxDoc() isn't just used for calculating weights. It's also used for e.g. figuring out how big your bit vector needs to be in order to accommodate the largest doc in the collection. My vote would be to just strip that extra comment about calculating term weights.
        Hide
        Mark Miller added a comment -

        My vote would be to just strip that extra comment about calculating term weights.

        +1 - unless someone else comments, I'm just going to do that.

        Show
        Mark Miller added a comment - My vote would be to just strip that extra comment about calculating term weights. +1 - unless someone else comments, I'm just going to do that.
        Hide
        Doron Cohen added a comment -

        hi.., I think the 'expert' is here because it reveals internal information which users should not rely on unless understanding exactly what they are doing with it. - well, at least as internal as are the unstable docids - on the other hand, IndexReader.maxDoc() is not marked 'expert', so perhaps this one also should not be marked 'expert'.

        Similarity.idfExplain(Term,Searcher) explains nicely why maxDoc() is used rather than numDocs():

           * Note that {@link Searcher#maxDoc()} is used instead of
           * {@link org.apache.lucene.index.IndexReader#numDocs()} because it is
           * proportional to {@link Searcher#docFreq(Term)} , i.e., when one is
           * inaccurate, so is the other, and in the same direction.
        
        Show
        Doron Cohen added a comment - hi.., I think the 'expert' is here because it reveals internal information which users should not rely on unless understanding exactly what they are doing with it. - well, at least as internal as are the unstable docids - on the other hand, IndexReader.maxDoc() is not marked 'expert', so perhaps this one also should not be marked 'expert'. Similarity.idfExplain(Term,Searcher) explains nicely why maxDoc() is used rather than numDocs(): * Note that {@link Searcher#maxDoc()} is used instead of * {@link org.apache.lucene.index.IndexReader#numDocs()} because it is * proportional to {@link Searcher#docFreq(Term)} , i.e., when one is * inaccurate, so is the other, and in the same direction.
        Hide
        Mark Miller added a comment -

        Thanks Doron - I was going to err on leaving the expert and just stripping:

        "Called by search code to compute term weights."

        Sounds like your not opposed to that?

        Show
        Mark Miller added a comment - Thanks Doron - I was going to err on leaving the expert and just stripping: "Called by search code to compute term weights." Sounds like your not opposed to that?
        Hide
        Doron Cohen added a comment -

        stripping: "Called by search code to compute term weights."
        Sounds like your not opposed to that?

        Yes, I agree, go ahead...

        While looking at this though, how about other "Expert" and "called by" javadoc comments in this class? - like the one in doc(int i) - I am not sure what's so expert about it..? Also there are 3 more 'called by' javadoc comments in that class, are they really needed?

        Show
        Doron Cohen added a comment - stripping: "Called by search code to compute term weights." Sounds like your not opposed to that? Yes, I agree, go ahead... While looking at this though, how about other "Expert" and "called by" javadoc comments in this class? - like the one in doc(int i) - I am not sure what's so expert about it..? Also there are 3 more 'called by' javadoc comments in that class, are they really needed?
        Hide
        Mark Miller added a comment -

        Good point - we should remove that expert and the called by's - one of them references a deprecated class (HitCollector), so it def needs to be changed in either case.

        I'll make a quick patch.

        Show
        Mark Miller added a comment - Good point - we should remove that expert and the called by's - one of them references a deprecated class (HitCollector), so it def needs to be changed in either case. I'll make a quick patch.
        Hide
        Doron Cohen added a comment -

        Mark, thanks for removing the 'called by's

        After applying the patch there are still a few 'expert' statements:

        • maxDoc() - I think you wanted to remove this one?
        • docFreq() - it is not marked expert in IndexReader... should it be marked so here?
        • docFeqs() - not sure here

        There are 3 more 'expert' sttmnts which seem okay to me.

        Show
        Doron Cohen added a comment - Mark, thanks for removing the 'called by's After applying the patch there are still a few 'expert' statements: maxDoc() - I think you wanted to remove this one? docFreq() - it is not marked expert in IndexReader... should it be marked so here? docFeqs() - not sure here There are 3 more 'expert' sttmnts which seem okay to me.
        Hide
        Mark Miller added a comment -

        I figured I'd leave em -

        but I do agree that it makes more sense to pull the expert from them. Will add to the patch.

        If it doesn't belong on docFreq, it doesn't belong on docFreqs (which was just added to reduce chatter over RMI it appears by the comment)

        Show
        Mark Miller added a comment - I figured I'd leave em - but I do agree that it makes more sense to pull the expert from them. Will add to the patch. If it doesn't belong on docFreq, it doesn't belong on docFreqs (which was just added to reduce chatter over RMI it appears by the comment)
        Hide
        Marvin Humphrey added a comment -

        IMO, maxDoc(), docFreq(), and docFreqs() are all expert, because they all
        require an understanding of the deletions mechanism to grasp their behavior.

        I'd vote for adding the "expert" tag to IndexReader.maxDoc() before stripping
        it from those.

        Show
        Marvin Humphrey added a comment - IMO, maxDoc(), docFreq(), and docFreqs() are all expert, because they all require an understanding of the deletions mechanism to grasp their behavior. I'd vote for adding the "expert" tag to IndexReader.maxDoc() before stripping it from those.
        Hide
        Mark Miller added a comment - - edited

        Good point. I've jumped sides.

        I'd vote for adding the "expert" tag to IndexReader.maxDoc() before stripping it from those.

        +1

        *edit

        Actually - as I look over IndexReader, I think its just that the context changes - what is expert in Searchable is not necessarily expert in IndexReader - moving to that level already has more advanced implications, and current labeling of expert is slanted to whats more difficult in reference of IndexReader - its all expert compared to use Searchable.

        I think that argues for just leaving as is (the current patch).

        Show
        Mark Miller added a comment - - edited Good point. I've jumped sides. I'd vote for adding the "expert" tag to IndexReader.maxDoc() before stripping it from those. +1 *edit Actually - as I look over IndexReader, I think its just that the context changes - what is expert in Searchable is not necessarily expert in IndexReader - moving to that level already has more advanced implications, and current labeling of expert is slanted to whats more difficult in reference of IndexReader - its all expert compared to use Searchable. I think that argues for just leaving as is (the current patch).
        Hide
        Doron Cohen added a comment -

        what is expert in Searchable is not necessarily expert in IndexReader - moving to that level already has more advanced implications

        I agree (even though IndexReader is not marked 'expert').

        I think that argues for just leaving as is (the current patch).

        +1

        Show
        Doron Cohen added a comment - what is expert in Searchable is not necessarily expert in IndexReader - moving to that level already has more advanced implications I agree (even though IndexReader is not marked 'expert'). I think that argues for just leaving as is (the current patch). +1
        Hide
        Mark Miller added a comment -

        thanks all

        Show
        Mark Miller added a comment - thanks all

          People

          • Assignee:
            Mark Miller
            Reporter:
            Nadav Har'El
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development