Lucene - Core
  1. Lucene - Core
  2. LUCENE-4425

Unclear documentation of StoredFieldVisitor.binaryValue

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When reading the binary value of a stored field, a StoredFieldsReader calls StoredFieldVisitor.binaryValue(arr, offset, length).

      Documentation currently doesn't state whether the byte[] can be reused outside of the scope of StoredFieldVisitor.binaryValue but DocumentStoredFieldVisitor assumes (as of r1389812) that it can.

      So DocumentStoredFieldVisitor would break with a custom StoredFieldsFormat that would call StoredFieldVisitor.binaryValue with a slice of a reusable buffer.

      1. LUCENE-4425.patch
        5 kB
        Robert Muir
      2. LUCENE-4425.patch
        5 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Currently the responsibility is on the codecs to always allocate new things in this visitor API.
        This keeps the API simple (like String.toCharArray doing a copy does).

        We should proceed with caution: StoredFieldVisitor is essentially part of our .document API
        (despite its packaging, .index is wrong, sorry) and a parameter passed to IndexReader.document.

        I think making this api more complicated for no real gain would be the wrong tradeoff.

        Show
        Robert Muir added a comment - Currently the responsibility is on the codecs to always allocate new things in this visitor API. This keeps the API simple (like String.toCharArray doing a copy does). We should proceed with caution: StoredFieldVisitor is essentially part of our .document API (despite its packaging, .index is wrong, sorry) and a parameter passed to IndexReader.document. I think making this api more complicated for no real gain would be the wrong tradeoff.
        Hide
        Robert Muir added a comment -

        I would support removing offset and length here in the visitor. I think then it would be more intuitive.

        Show
        Robert Muir added a comment - I would support removing offset and length here in the visitor. I think then it would be more intuitive.
        Hide
        Adrien Grand added a comment -

        I would support removing offset and length here in the visitor. I think then it would be more intuitive.

        +1 If the responsability is on the codec side, then we should remove offset and length.

        Show
        Adrien Grand added a comment - I would support removing offset and length here in the visitor. I think then it would be more intuitive. +1 If the responsability is on the codec side, then we should remove offset and length.
        Hide
        Robert Muir added a comment -

        If we change this API, one way is to do it trunk only, and we add docs for 4.1?

        However StoredFieldsVisitor is an abstract class (not an interface), so maybe
        its not the only option.

        Show
        Robert Muir added a comment - If we change this API, one way is to do it trunk only, and we add docs for 4.1? However StoredFieldsVisitor is an abstract class (not an interface), so maybe its not the only option.
        Hide
        Adrien Grand added a comment -

        However StoredFieldsVisitor is an abstract class (not an interface), so maybe its not the only option.

        If everybody agrees that the copy is on the codec side, then I think we should push it to 4.x given that this API is counter-intuitive and might lead to hard-to-track bugs with custom codecs?

        Show
        Adrien Grand added a comment - However StoredFieldsVisitor is an abstract class (not an interface), so maybe its not the only option. If everybody agrees that the copy is on the codec side, then I think we should push it to 4.x given that this API is counter-intuitive and might lead to hard-to-track bugs with custom codecs?
        Hide
        Robert Muir added a comment -

        I agree Adrien. I guess i immediately jumped to splitting the api change from the bugfix because
        I didn't want one blocked on the other.

        I think we can still do it this way, commit LUCENE-4423 and get the bug fixed (ignore my comment
        about testing). Then we can think about this issue.

        In my opinion the api is buggy

        Show
        Robert Muir added a comment - I agree Adrien. I guess i immediately jumped to splitting the api change from the bugfix because I didn't want one blocked on the other. I think we can still do it this way, commit LUCENE-4423 and get the bug fixed (ignore my comment about testing). Then we can think about this issue. In my opinion the api is buggy
        Hide
        Adrien Grand added a comment -

        commit LUCENE-4423 and get the bug fixed (ignore my comment about testing). Then we can think about this issue.

        I just committed. I'll wait for a couple of Jenkins builds and merge into 4.0.

        Show
        Adrien Grand added a comment - commit LUCENE-4423 and get the bug fixed (ignore my comment about testing). Then we can think about this issue. I just committed. I'll wait for a couple of Jenkins builds and merge into 4.0.
        Hide
        Michael McCandless added a comment -

        +1 to remove offset/length and make it clear that you can hold onto that byte[] (caller will not reuse it).

        Show
        Michael McCandless added a comment - +1 to remove offset/length and make it clear that you can hold onto that byte[] (caller will not reuse it).
        Hide
        Michael McCandless added a comment -

        Maybe we should fix this for 4.0? It really is an API bug...

        Show
        Michael McCandless added a comment - Maybe we should fix this for 4.0? It really is an API bug...
        Hide
        Adrien Grand added a comment -

        Maybe we should fix this for 4.0? It really is an API bug...

        That would be great. Maybe we should first make sure that everyone agrees that it is the codec responsability to make the copy.

        Show
        Adrien Grand added a comment - Maybe we should fix this for 4.0? It really is an API bug... That would be great. Maybe we should first make sure that everyone agrees that it is the codec responsability to make the copy.
        Hide
        Robert Muir added a comment -

        here's a patch (not tested) just to see what it would look like.

        SolrIndexSearcher was ignoring offset and length too, if we dont fix this issue, we should at least fix that.

        Show
        Robert Muir added a comment - here's a patch (not tested) just to see what it would look like. SolrIndexSearcher was ignoring offset and length too, if we dont fix this issue, we should at least fix that.
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Adrien Grand added a comment -

        +1 too

        Show
        Adrien Grand added a comment - +1 too
        Hide
        Robert Muir added a comment -

        I do think we should fix the API bug for 4.0, lets just give it another day or so.

        I'll commit the fix to SolrIndexSearcher now though, i dont want that bug either way.

        Show
        Robert Muir added a comment - I do think we should fix the API bug for 4.0, lets just give it another day or so. I'll commit the fix to SolrIndexSearcher now though, i dont want that bug either way.
        Hide
        Robert Muir added a comment -

        updated patch: so its just the API fix.

        Show
        Robert Muir added a comment - updated patch: so its just the API fix.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1390701

        LUCENE-4425: Unclear documentation of StoredFieldVisitor.binaryValue

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Robert Muir http://svn.apache.org/viewvc?view=revision&revision=1390701 LUCENE-4425 : Unclear documentation of StoredFieldVisitor.binaryValue
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Robert Muir
        http://svn.apache.org/viewvc?view=revision&revision=1390262

        LUCENE-4425: respect offset/length for binary field

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Robert Muir http://svn.apache.org/viewvc?view=revision&revision=1390262 LUCENE-4425 : respect offset/length for binary field
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Robert Muir
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development