Solr
  1. Solr
  2. SOLR-3628

SolrDocument uses user-provided collections unsafely

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.0, 6.0
    • Component/s: clients - java
    • Labels:
      None
    • Environment:

      Mac OS X 10.7.4, Java 6

      Description

      Adding a RO Collection as the value of a field (ie. SolrDocument or SolrInputField) will result in an UnsupportedOperationException later on when adding more values to that field.

      This happens because no defensive copy of collections are made. Instead, if a collection is given first, then it becomes the backing collection for the field. This can cause problems if the collection is modified after the fact or if a read-only collection is given (eg. Collection.unmodifiableList(...)).

      It can be reproduced with:

      SolrDocument doc = new SolrDocument()
      doc.addField("v", Collections.unmodifiableList(new ArrayList<Object>()))
      doc.addField("v", "a")

      I've created a patch that includes a fix and a test with, essentially, the above. The patch just ensures that SolrDocument and SolrInputField always use a Collection they created as the value, rather than relying on what was given to them.

      1. SOLR-3628.patch
        5 kB
        Hoss Man
      2. solrdoc-ro-list-bug.patch
        2 kB
        Tom Switzer
      3. solrdoc-ro-list-bug-comp.patch
        4 kB
        Tom Switzer

        Activity

        Hide
        Tom Switzer added a comment -

        A patch to the current trunk. Includes test case.

        Show
        Tom Switzer added a comment - A patch to the current trunk. Includes test case.
        Hide
        Hoss Man added a comment -

        I seem to recall more then a few discussions in the past about how "defensive" APIs like this should be with respect to user provided collections and arrays – the trade off being GC overhead (if the client code and the solrj code both make defensive copies, we've doubled the number of shot lived objects)

        Either way: before 4.0 we should either commit this patch, or significantly improve the docs for SolrInputDocument and SolrInputField to make it clear that no copying is done and the code may modify the underlying Objects addded as fields

        Show
        Hoss Man added a comment - I seem to recall more then a few discussions in the past about how "defensive" APIs like this should be with respect to user provided collections and arrays – the trade off being GC overhead (if the client code and the solrj code both make defensive copies, we've doubled the number of shot lived objects) Either way: before 4.0 we should either commit this patch, or significantly improve the docs for SolrInputDocument and SolrInputField to make it clear that no copying is done and the code may modify the underlying Objects addded as fields
        Hide
        Tom Switzer added a comment -

        One problem, I think, is that addValue (field) and addField (doc) are overloaded. It behaves different depending on if the call is the first time a value is being added, if it's an array, if it's a collection, or some other value. Perhaps a less contentious solution would be to make addValue/addField ALWAYS defensively copy collections, but keep the current setValue/setField behaviour. If I am setting a value to a collection, then I can see how a defensive copy may not be done. However, if I am "adding" a collection, it would seem I am appending the value to an already existing set.

        This solution (and more documentation would be good too) would let people who are worried about excessive GC use setValue/setField to get what they want and those who just don't want to think about it use add/addField.

        Show
        Tom Switzer added a comment - One problem, I think, is that addValue (field) and addField (doc) are overloaded. It behaves different depending on if the call is the first time a value is being added, if it's an array, if it's a collection, or some other value. Perhaps a less contentious solution would be to make addValue/addField ALWAYS defensively copy collections, but keep the current setValue/setField behaviour. If I am setting a value to a collection, then I can see how a defensive copy may not be done. However, if I am "adding" a collection, it would seem I am appending the value to an already existing set. This solution (and more documentation would be good too) would let people who are worried about excessive GC use setValue/setField to get what they want and those who just don't want to think about it use add/addField.
        Hide
        Tom Switzer added a comment -

        Here is a patch which changes SolrInputField#addValue's & SolrDocument#addField's behaviour to always copy collection, but keeps SolrInputField#setValue's & SolrDocument#setField's behaviour to use whatever it is given. Also added a bit more to the docs.

        Show
        Tom Switzer added a comment - Here is a patch which changes SolrInputField#addValue's & SolrDocument#addField's behaviour to always copy collection, but keeps SolrInputField#setValue's & SolrDocument#setField's behaviour to use whatever it is given. Also added a bit more to the docs.
        Hide
        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Show
        Robert Muir added a comment - rmuir20120906-bulk-40-change
        Hide
        Hoss Man added a comment -

        Perhaps a less contentious solution would be to make addValue/addField ALWAYS defensively copy collections, but keep the current setValue/setField behaviour. If I am setting a value to a collection, then I can see how a defensive copy may not be done. However, if I am "adding" a collection, it would seem I am appending the value to an already existing set

        That sounds like a good approach ... assigning to myself to remind me to do a more thorough review of your patch and try to get this into 4.0

        Show
        Hoss Man added a comment - Perhaps a less contentious solution would be to make addValue/addField ALWAYS defensively copy collections, but keep the current setValue/setField behaviour. If I am setting a value to a collection, then I can see how a defensive copy may not be done. However, if I am "adding" a collection, it would seem I am appending the value to an already existing set That sounds like a good approach ... assigning to myself to remind me to do a more thorough review of your patch and try to get this into 4.0
        Hide
        Hoss Man added a comment -

        updated patch to include a test that the field really is backed by the collection (since we now explicitly document it)

        will commit a soon as full test run finishes

        Show
        Hoss Man added a comment - updated patch to include a test that the field really is backed by the collection (since we now explicitly document it) will commit a soon as full test run finishes
        Hide
        Hoss Man added a comment -

        Committed revision 1383520. - trunk
        Committed revision 1383533. - 4x

        Thanks Tom!

        Show
        Hoss Man added a comment - Committed revision 1383520. - trunk Committed revision 1383533. - 4x Thanks Tom!
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1383533

        SOLR-3628: SolrInputField and SolrInputDocument are now consistently backed by Collections passed in to setValue/setField, and defensively copy values from Collections passed to addValue/addField (merge r1383520)

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1383533 SOLR-3628 : SolrInputField and SolrInputDocument are now consistently backed by Collections passed in to setValue/setField, and defensively copy values from Collections passed to addValue/addField (merge r1383520)
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Hoss Man
            Reporter:
            Tom Switzer
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development