Solr
  1. Solr
  2. SOLR-1357

SolrInputDocument cannot process dynamic fields

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: clients - java
    • Labels:
      None

      Description

      Adding data via SolrInputDocument is normally done by calling the addField method with a field name, field value and an optional boost. In case of dynamic fields, if field names are known upfront, then caller of this method just passes in the right name and it automatically works.

      This does not go well with users who use @interface Field annotations for automatic binding.
      As of SOLR-1129, users can annotate Map<String, String> propertyName with @Field ("field_*") kind of annotations to bind dynamic field data to. SolrInputDocument should exhibit the same behavior. The field value currently supported are - primitive, array, collection or an instance of Iterable. It can also take Map as values. If the field, for which addField method is called, is of dynamicField type (which can be derived from the field name), then the keys of the Map, passed as value, should be used to "compose" the correct field name.

      This should be supported

      //This code sample should populate the dynamic fields "brands_Nokia" and "brands_Samsung"
      public class MyBean{
        @Field("brands_*)
        Map<String, Integer> brands;
        
        ...
      }
      Map<String, String> brands= new HashMap<String, String>();
      brands.put("Nokia", 1000);
      brands.put("Samsung", 100);
      
      MyBean myBean = new MyBean();
      myBean.setBrands(brands);
      solrServer.addBean(myBean);
      

      We can think of supporting this too ...

      //This code sample should populate the dynamic fields "brands_Nokia" and "brands_Samsung"
      Map<String, String> brands= new HashMap<String, String>();
      brands.put("Nokia", 1000);
      brands.put("Samsung", 100);
      
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("brands_*", brands);
      
      1. SOLR-1357.patch
        5 kB
        Lars Grote
      2. SOLR-1357.patch
        3 kB
        Lars Grote

        Activity

        Hide
        Noble Paul added a comment - - edited

        for the bean

        public class MyBean{
          @Field("brands_*)
          Map<String, Integer> brands;
          
          ...
        }
        

        the following should not work

        brands.put("Nokia", 1000);
        brands.put("Samsung", 100);
        

        but this should work

        brands.put("brands_Nokia", 1000);
        brands.put("brands_Samsung", 100);
        

        we may not need to support it is SolrInputDocument

        Show
        Noble Paul added a comment - - edited for the bean public class MyBean{ @Field("brands_*) Map< String , Integer > brands; ... } the following should not work brands.put( "Nokia" , 1000); brands.put( "Samsung" , 100); but this should work brands.put( "brands_Nokia" , 1000); brands.put( "brands_Samsung" , 100); we may not need to support it is SolrInputDocument
        Hide
        Avlesh Singh added a comment -

        Well, I don't see a reason behind the proposed behavior (brand_nokia versus nokia), Noble. And why do you think SolrInputDocument should not facilitate this?

        Show
        Avlesh Singh added a comment - Well, I don't see a reason behind the proposed behavior (brand_nokia versus nokia), Noble. And why do you think SolrInputDocument should not facilitate this?
        Hide
        Avlesh Singh added a comment -

        My point is that when a user has already annotated the field as "brand_*", appending them to keys of the Map is counter intuitive. This also goes against the behavior in SOLR-1129.
        As far as supporting the behavior in SolrInputDocument goes, unless there is some low level API restriction, I would say we should support it.

        Show
        Avlesh Singh added a comment - My point is that when a user has already annotated the field as "brand_*", appending them to keys of the Map is counter intuitive. This also goes against the behavior in SOLR-1129 . As far as supporting the behavior in SolrInputDocument goes, unless there is some low level API restriction, I would say we should support it.
        Hide
        Noble Paul added a comment -

        My point is that when a user has already annotated the field as "brand_*", appending them to keys of the Map is counter intuitive.

        IMHO It would be more clear if the fields are created in the same name as the user input. appending it to the wild card part is not so obvious.

        This is how serialization/deserialization works.

        When I fetch an instance of MyBean ,it contains the keys as "brands_Nokia" and brands_Samsung" . When I store it back why should I chop off the "brands_" part. I should be able to put back in the same Object into Solr w/o modifying it

        And why do you think SolrInputDocument should not facilitate this?

        I'm not against the idea itself. I just meant that it is beyond the scope of this issue. Moreover, SolrInputDocument does not have a problem in achieving this requirement .

        Show
        Noble Paul added a comment - My point is that when a user has already annotated the field as "brand_*", appending them to keys of the Map is counter intuitive. IMHO It would be more clear if the fields are created in the same name as the user input. appending it to the wild card part is not so obvious. This is how serialization/deserialization works. When I fetch an instance of MyBean ,it contains the keys as "brands_Nokia" and brands_Samsung" . When I store it back why should I chop off the "brands_" part. I should be able to put back in the same Object into Solr w/o modifying it And why do you think SolrInputDocument should not facilitate this? I'm not against the idea itself. I just meant that it is beyond the scope of this issue. Moreover, SolrInputDocument does not have a problem in achieving this requirement .
        Hide
        Jean Baptiste Pionnier added a comment - - edited

        I have not been able to set dynamic fields via annotations in SolrJ 1.4.
        I have modified the following loop in org.apache.solr.client.solrj.beans.DocumentObjectBinder.toSolrInputDocument()

        DocumentObjectBinder.java
        for (DocField field : fields) {
              doc.setField( field.name, field.get( obj ), 1.0f );
        }
        

        Becomes

        for (DocField field : fields) {
              if (field.dynamicFieldNamePatternMatcher != null && field.get(obj) != null && field.isContainedInMap) {
                    Map<String, Object> mapValue = (HashMap<String, Object>) field.get(obj);
        
                    for (Map.Entry<String, Object> e : mapValue.entrySet()) {
                          doc.setField(e.getKey(), e.getValue(), 1.0f);
                    }
              } else {
                    doc.setField(field.name, field.get(obj), 1.0f);
              }
        }
        

        This modification allows the following :

        public class MyBean{
          @Field("brands_*)
          Map<String, Integer> brands;
          
          ...
        }
        
        
        Map<String, String> brands= new HashMap<String, String>();
        brands.put("brands_Nokia", 1000);
        brands.put("brands_Samsung", 100);
        
        MyBean myBean = new MyBean();
        myBean.setBrands(brands);
        solrServer.addBean(myBean);
        

        Unfortunately, I have not been able to find the correct process to submit this patch to SolrJ repository for official review.
        Would you so kind as to inform me of the necessary steps to include this patch (or another to the same effect) in future revisions of SolrJ ?

        Show
        Jean Baptiste Pionnier added a comment - - edited I have not been able to set dynamic fields via annotations in SolrJ 1.4. I have modified the following loop in org.apache.solr.client.solrj.beans.DocumentObjectBinder.toSolrInputDocument() DocumentObjectBinder.java for (DocField field : fields) { doc.setField( field.name, field.get( obj ), 1.0f ); } Becomes for (DocField field : fields) { if (field.dynamicFieldNamePatternMatcher != null && field.get(obj) != null && field.isContainedInMap) { Map< String , Object > mapValue = (HashMap< String , Object >) field.get(obj); for (Map.Entry< String , Object > e : mapValue.entrySet()) { doc.setField(e.getKey(), e.getValue(), 1.0f); } } else { doc.setField(field.name, field.get(obj), 1.0f); } } This modification allows the following : public class MyBean{ @Field("brands_*) Map< String , Integer > brands; ... } Map< String , String > brands= new HashMap< String , String >(); brands.put( "brands_Nokia" , 1000); brands.put( "brands_Samsung" , 100); MyBean myBean = new MyBean(); myBean.setBrands(brands); solrServer.addBean(myBean); Unfortunately, I have not been able to find the correct process to submit this patch to SolrJ repository for official review. Would you so kind as to inform me of the necessary steps to include this patch (or another to the same effect) in future revisions of SolrJ ?
        Hide
        Lars Grote added a comment -

        Hello,
        I think the way Avlesh suggested is more intuitive. At least I tried exactly that and wondered why it did not work.

        This is what I think would be the best:

        // a Field like this:
        @Field("*_brands")
        Map<String,Integer> brands;
        

        you should add data like this:

        brands.put("Nokia", 1000);
        brands.put("Samsung", 100);
        

        and get the data out like this:

        bean.getBrands().get("Nokia");
        bean.getBrands().get("Samsung");
        

        I used the idear of Jean and enhanced it to work like shown above.
        I attached a patch for this, maybe someone can have a look. It is not well tested.

        I have never upload a Patch or something before. so if i'm doing something wrong, or do not follow the rules or anything.
        Please tell me.

        Regards, Lars

        Show
        Lars Grote added a comment - Hello, I think the way Avlesh suggested is more intuitive. At least I tried exactly that and wondered why it did not work. This is what I think would be the best: // a Field like this : @Field( "*_brands" ) Map< String , Integer > brands; you should add data like this: brands.put( "Nokia" , 1000); brands.put( "Samsung" , 100); and get the data out like this: bean.getBrands().get( "Nokia" ); bean.getBrands().get( "Samsung" ); I used the idear of Jean and enhanced it to work like shown above. I attached a patch for this, maybe someone can have a look. It is not well tested. I have never upload a Patch or something before. so if i'm doing something wrong, or do not follow the rules or anything. Please tell me. Regards, Lars
        Hide
        Noble Paul added a comment -

        I think the way Avlesh suggested is more intuitive. At least I tried exactly that and wondered why it did not work.

        What i feel is that it is not consistent. The problem is that the same bean canot be added to Solr w/ the same annotation. That is not a very good design

        Show
        Noble Paul added a comment - I think the way Avlesh suggested is more intuitive. At least I tried exactly that and wondered why it did not work. What i feel is that it is not consistent. The problem is that the same bean canot be added to Solr w/ the same annotation. That is not a very good design
        Hide
        Lars Grote added a comment -

        Hello Noble,
        I'm sorry to bother you, but I don't understand the problem.

        Can you please explain what you mean by the following:

        The problem is that the same bean canot be added to Solr w/ the same annotation.

        Maybe I'm missing something, but at the moment I don't know what. Is it possible that you give a code example of the 'bad' design? So that I can fix the patch.

        Regards, Lars

        Show
        Lars Grote added a comment - Hello Noble, I'm sorry to bother you, but I don't understand the problem. Can you please explain what you mean by the following: The problem is that the same bean canot be added to Solr w/ the same annotation. Maybe I'm missing something, but at the moment I don't know what. Is it possible that you give a code example of the 'bad' design? So that I can fix the patch. Regards, Lars
        Hide
        Noble Paul added a comment -

        assume that the data that is fetched works as follows

        bean.getBrands().get("Nokia");
        bean.getBrands().get("Samsung");
        

        So if I do a an add w/ the same bean as follows

        solrServer.add(bean);
        

        it adds the document with names as "Nokia" and "Samsung" instead of "Nokia_Brands" and "Samsung_Brands" . Atleast that is the implementation now.

        Show
        Noble Paul added a comment - assume that the data that is fetched works as follows bean.getBrands().get( "Nokia" ); bean.getBrands().get( "Samsung" ); So if I do a an add w/ the same bean as follows solrServer.add(bean); it adds the document with names as "Nokia" and "Samsung" instead of "Nokia_Brands" and "Samsung_Brands" . Atleast that is the implementation now.
        Hide
        Lars Grote added a comment -

        Thanks for the claification!
        Indeed that would be VERY bad design. But it is not what I intended to do. This would be the behavior if I had copied the loop of Jean. In the Patch the loop has a modification.

        Have look at the inner loop. I think this should solve the problem.

        DocumentObjectBinder.java Line 91 to 102
            for (DocField field : fields) {
        		if (field.dynamicFieldNamePatternMatcher != null
        				&& field.get(obj) != null && field.isContainedInMap) {
        			Map<String, Object> mapValue = (HashMap<String, Object>) field
        					.get(obj);
        
        			for (Map.Entry<String, Object> e : mapValue.entrySet()) {
        				doc.setField(field.name.replaceFirst("\\*", e.getKey()), e.getValue(), 1.0f);
        			}
        		} else {
        			doc.setField(field.name, field.get(obj), 1.0f);
        		}
        	}
        

        I think I'll create a patch for the testcase too, then we can see the behavior more clearly.

        What do you think about the snippet above? Do I miss something?
        Regards ,Lars

        Show
        Lars Grote added a comment - Thanks for the claification! Indeed that would be VERY bad design. But it is not what I intended to do. This would be the behavior if I had copied the loop of Jean. In the Patch the loop has a modification. Have look at the inner loop. I think this should solve the problem. DocumentObjectBinder.java Line 91 to 102 for (DocField field : fields) { if (field.dynamicFieldNamePatternMatcher != null && field.get(obj) != null && field.isContainedInMap) { Map< String , Object > mapValue = (HashMap< String , Object >) field .get(obj); for (Map.Entry< String , Object > e : mapValue.entrySet()) { doc.setField(field.name.replaceFirst( "\\*" , e.getKey()), e.getValue(), 1.0f); } } else { doc.setField(field.name, field.get(obj), 1.0f); } } I think I'll create a patch for the testcase too, then we can see the behavior more clearly. What do you think about the snippet above? Do I miss something? Regards ,Lars
        Hide
        Noble Paul added a comment -

        Please clarify what the behavior is going to be?

        Show
        Noble Paul added a comment - Please clarify what the behavior is going to be?
        Hide
        Lars Grote added a comment - - edited

        First of all, you are right there is someting wrong with the implementation.

        The behavior should be the following

        The Bean should act like schon above. Add Brands, get Brands ... The SolrInputDocument should store the values with the full name i.e. "nokia_brands", that's why I'm replacing the "*" with the key. The idear was that you can access the value like this:

        bean.getBrands('Nokia')
        

        or like this:

        solrDocument.getFieldValue("brands_Nokia")
        

        Anyway, I just started writting a unittest: Create Item -> SolrInputDocument -> SolrDocument -> retriev with getBean -> SolrInputDocument -> SolrDocument -> retriev with getBean

        And if I have a Field like this:

        @Field("supplier_simple_*")
        Map<String, String> supplier_simple;
        

        everything is fine and green. But if I try the same with a Filed like this:

        @Field("supplier_*")
        Map<String, List<String>> supplier;
        

        I does not work, and at the moment I don't know why. So you are right, I'm having a closer look at this in the next few days, and hopefully I'll come back with a patch
        Regards Lars

        Show
        Lars Grote added a comment - - edited First of all, you are right there is someting wrong with the implementation. The behavior should be the following The Bean should act like schon above. Add Brands, get Brands ... The SolrInputDocument should store the values with the full name i.e. "nokia_brands", that's why I'm replacing the "*" with the key. The idear was that you can access the value like this: bean.getBrands('Nokia') or like this: solrDocument.getFieldValue( "brands_Nokia" ) Anyway, I just started writting a unittest: Create Item -> SolrInputDocument -> SolrDocument -> retriev with getBean -> SolrInputDocument -> SolrDocument -> retriev with getBean And if I have a Field like this: @Field( "supplier_simple_*" ) Map< String , String > supplier_simple; everything is fine and green. But if I try the same with a Filed like this: @Field( "supplier_*" ) Map< String , List< String >> supplier; I does not work, and at the moment I don't know why. So you are right, I'm having a closer look at this in the next few days, and hopefully I'll come back with a patch Regards Lars
        Hide
        Shalin Shekhar Mangar added a comment -

        Solr does not need to add any intelligence here. Lets keep it a dumb and simple system, put the correct field name and get the correct field name. The behavior should be to put brands_Nokia and get back brands_Nokia for a dynamic field brands_*. The same thing is done for non-solrj clients and the solr responses.

        Show
        Shalin Shekhar Mangar added a comment - Solr does not need to add any intelligence here. Lets keep it a dumb and simple system, put the correct field name and get the correct field name. The behavior should be to put brands_Nokia and get back brands_Nokia for a dynamic field brands_*. The same thing is done for non-solrj clients and the solr responses.
        Hide
        Noble Paul added a comment -

        put the correct field name and get the correct field name

        +1

        Show
        Noble Paul added a comment - put the correct field name and get the correct field name +1
        Hide
        Lars Grote added a comment -

        2 to 1 you win!

        I'll implement that and update the unittest in the next few days.
        Regards, Lars

        Show
        Lars Grote added a comment - 2 to 1 you win! I'll implement that and update the unittest in the next few days. Regards, Lars
        Hide
        Jean Baptiste Pionnier added a comment -

        ok, Thank you very much.

        Show
        Jean Baptiste Pionnier added a comment - ok, Thank you very much.
        Hide
        Lars Grote added a comment -

        Hey Noble,
        I have a question ... again.

        The question is somehow related to the the issue SOLR-1129.

        I copied the loop of Jean, and started implementing the unit test and came across a problem.
        The code below is copied from the existing unittest.

        Assume you have two fields annotated like this:

        TestDocumentObjectBinder .java
        @Field("supplier_*")
        Map<String, List<String>> supplier;
        
        @Field("supplier_*")
        public void setAllSuppliers(String[] allSuppliers){
             this.allSuppliers = allSuppliers;
        }
        

        In my opinion the second field should be ignored when the bean is converted to the SolrImportDocument (it is not ignored in the current implementation). Otherwise you get in trouble when you try to get the bean out and try to put it back in.
        So I would ignore all fields that are annotated as a dynamic field, an are not of the type Map when converting a Bean to a SolrInputDocument.
        When converting a SolrDocument to a Bean, it should be like it is now (not ignored, the field "allSuppliers" is injected with all Suppliers).

        What do you think?

        Regards, Lars

        Show
        Lars Grote added a comment - Hey Noble, I have a question ... again. The question is somehow related to the the issue SOLR-1129 . I copied the loop of Jean, and started implementing the unit test and came across a problem. The code below is copied from the existing unittest. Assume you have two fields annotated like this: TestDocumentObjectBinder .java @Field( "supplier_*" ) Map< String , List< String >> supplier; @Field( "supplier_*" ) public void setAllSuppliers( String [] allSuppliers){ this .allSuppliers = allSuppliers; } In my opinion the second field should be ignored when the bean is converted to the SolrImportDocument (it is not ignored in the current implementation). Otherwise you get in trouble when you try to get the bean out and try to put it back in. So I would ignore all fields that are annotated as a dynamic field, an are not of the type Map when converting a Bean to a SolrInputDocument. When converting a SolrDocument to a Bean, it should be like it is now (not ignored, the field "allSuppliers" is injected with all Suppliers). What do you think? Regards, Lars
        Hide
        Noble Paul added a comment -

        The second one is not ignored. In fact this is not a robust OR mapping tool. if the user makes mistakes he is likely to get wrong data. This is intended to be a simple helper utility for users of POJOs.

        Show
        Noble Paul added a comment - The second one is not ignored. In fact this is not a robust OR mapping tool. if the user makes mistakes he is likely to get wrong data. This is intended to be a simple helper utility for users of POJOs.
        Hide
        Lars Grote added a comment -

        So here is the patch. Maybe you can have a look at the testcase and the comment there, line 140. Just to make sure that this is the desired behavior.
        Regards, Lars

        Show
        Lars Grote added a comment - So here is the patch. Maybe you can have a look at the testcase and the comment there, line 140. Just to make sure that this is the desired behavior. Regards, Lars
        Hide
        Noble Paul added a comment -

        committed r886127
        Thanks Lars

        Show
        Noble Paul added a comment - committed r886127 Thanks Lars
        Hide
        Hoss Man added a comment -

        Correcting Fix Version based on CHANGES.txt, see this thread for more details...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Show
        Hoss Man added a comment - Correcting Fix Version based on CHANGES.txt, see this thread for more details... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1.0 release

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1.0 release

          People

          • Assignee:
            Noble Paul
            Reporter:
            Avlesh Singh
          • Votes:
            2 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development