Solr: SOLR-3798

copyField logic in LukeRequestHandler is primitive, doesn't work well with dynamicFields

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Looking into SOLR-3795, I realized there is a much bigger problem: LukeRequestHandler tries to get copyField info for fields and dynamicFields the same way, and it just doesn't work.

      See the patch in SOLR-3795 for a commented-out example of a test that still fails (i.e., trying to get the "copySource" info for a dynamicField).


          Activity

          Steve Rowe added a comment -

          This patch fixes the issue and uncomments the commented-out test added in SOLR-3795, which passes (after fixing a typo).

          The problem is in IndexSchema, which isn't properly tracking dynamic fields, rather than in LukeRequestHandler.

          Steve Rowe added a comment -

          I've been thinking about a related problem: reporting of what I'm calling "undeclared explicit fields".

          In the schema, fields can be either <field> (aka concrete, aka explicit), or <dynamicField> (aka pattern, aka regex, aka prototype).

          There is also a third kind of thing that can be introduced by a <copyField>: an undeclared explicit field (UEF). Here's an example from test-files/solr/collection1/conf/schema15.xml:

          <field name="copyfield_source" type="string" indexed="true" stored="true" multiValued="true"/>
          ...
          <dynamicField name="*_ss"  type="string"  indexed="true"  stored="true" multiValued="true"/>
          ...
          <copyField source="copyfield_source" dest="copyfield_dest_ss"/>
          

          copyfield_dest_ss isn't declared anywhere else in the schema - this is an instruction to use the (first matching) dynamic field type *_ss when copying copyfield_source to UEF copyfield_dest_ss.
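          To make the "first matching" rule concrete, here is a minimal Java sketch. It assumes dynamicField names are simple globs with a single leading or trailing asterisk and that candidates are tried in the order given. DynamicFieldResolver, matches and resolveType are hypothetical names, not IndexSchema's API, and IndexSchema's actual precedence rules may differ from plain list order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Solr's IndexSchema API: resolves a field name that is
// not declared as a <field> to the type of the first matching <dynamicField>.
class DynamicFieldResolver {

    /** Glob match for dynamicField names of the form "*suffix" or "prefix*". */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName); // no asterisk: exact name only
    }

    /**
     * Returns the type of the first dynamicField, in map iteration order,
     * whose pattern matches fieldName, or null if none matches.
     */
    static String resolveType(Map<String, String> dynamicFieldTypes, String fieldName) {
        for (Map.Entry<String, String> entry : dynamicFieldTypes.entrySet()) {
            if (matches(entry.getKey(), fieldName)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "*_ss" is from the schema15.xml excerpt; "*_s" is added for contrast.
        Map<String, String> dynamicFieldTypes = new LinkedHashMap<>();
        dynamicFieldTypes.put("*_ss", "string, multiValued");
        dynamicFieldTypes.put("*_s", "string");
        System.out.println(resolveType(dynamicFieldTypes, "copyfield_dest_ss")); // string, multiValued
    }
}
```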

          The schema.xml in solr/example has another one of these, though this could be rewritten to instead use dest="*_s" and still function the same:

          <dynamicField name="*_s"  type="string"  indexed="true"  stored="true" />
          ...
          <copyField source="author" dest="author_s"/>
          

          In my (so far limited ad hoc) testing, I can't see undeclared explicit fields in reported copyfield sources or destinations.

          I think schema info reports (LukeRequestHandler and the new schema info requests I'm working on in SOLR-3250) should include UEFs in their reports.

          Steve Rowe added a comment -

          After chatting with Hoss on #lucene-dev IRC, I understand copyFields a little better. Hoss argued that "undeclared explicit field" is an inaccurate description of the "third kind of thing" I was referring to, and I agree.

          A hopefully better characterization - something like this should be on the wiki:

          <copyField> source or dest values can be either field names or dynamic field references.

          A dynamic field reference is either an exact <dynamicField> name, or a pattern that accepts a subset of the language accepted by the pattern for a referenced dynamic field. Subset pattern syntax is the same as that for dynamic field names ("*string" or "string*"), with the additional possibility of excluding the asterisk ("string").
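          Assuming every <dynamicField> name has exactly one leading or trailing asterisk, the subset relation in this definition reduces to a suffix or prefix comparison. A minimal, hypothetical Java sketch (SubsetPatternCheck and isSubset are illustrative names, not Solr code):

```java
// Hypothetical sketch, not Solr code: does reference pattern `ref` accept only
// names that dynamicField pattern `dynField` also accepts? Assumes dynField has a
// single leading or trailing asterisk, and ref is "*suffix", "prefix*", or a literal.
class SubsetPatternCheck {

    static boolean isSubset(String ref, String dynField) {
        if (dynField.equals("*")) {
            return true; // "*" accepts every name
        }
        boolean refLeadingStar = ref.startsWith("*");
        boolean refTrailingStar = !refLeadingStar && ref.endsWith("*");

        if (dynField.startsWith("*")) {
            String dynSuffix = dynField.substring(1);
            if (refLeadingStar) {
                // Every name ending in ref's suffix must also end in dynSuffix.
                return ref.substring(1).endsWith(dynSuffix);
            }
            if (refTrailingStar) {
                return false; // "prefix*" accepts names with arbitrary endings
            }
            return ref.endsWith(dynSuffix); // literal: a single name
        }

        // Otherwise dynField has the form "prefix*".
        String dynPrefix = dynField.substring(0, dynField.length() - 1);
        if (refTrailingStar) {
            return ref.substring(0, ref.length() - 1).startsWith(dynPrefix);
        }
        if (refLeadingStar) {
            return false; // "*suffix" accepts names with arbitrary beginnings
        }
        return ref.startsWith(dynPrefix); // literal: a single name
    }

    public static void main(String[] args) {
        System.out.println(isSubset("*_src_sub_i", "*_i"));      // true
        System.out.println(isSubset("src_sub_no_ast_i", "*_i")); // true
        System.out.println(isSubset("*_i", "*_s"));              // false
    }
}
```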

          A <copyField> source subset pattern operates as a filter: instead of triggering a field copy for all field names matched by the referenced dynamic field pattern, only those that match the subset pattern will trigger a field copy.

          A <copyField> dest subset pattern operates in two ways: the target field's type is drawn from the referenced <dynamicField>; and the target field name is generated using the subset pattern as a template, unless the subset pattern excludes the asterisk, in which case the subset pattern itself becomes the target field name.
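          The source-filter and dest-template behavior can be sketched in Java as follows. This is a hypothetical illustration, not IndexSchema's implementation; it assumes single leading or trailing asterisks, and that the text captured by the source asterisk is what fills the dest asterisk.

```java
// Hypothetical sketch, not Solr code. Assumes that when both source and dest
// contain an asterisk, the text matched by the source asterisk is substituted
// for the dest asterisk.
class CopyFieldTarget {

    /** Text matched by the single asterisk in pattern, or null if name doesn't match. */
    static String starMatch(String pattern, String name) {
        if (pattern.startsWith("*")) {
            String suffix = pattern.substring(1);
            return name.endsWith(suffix) ? name.substring(0, name.length() - suffix.length()) : null;
        }
        String prefix = pattern.substring(0, pattern.length() - 1); // "prefix*"
        return name.startsWith(prefix) ? name.substring(prefix.length()) : null;
    }

    /**
     * Destination field name for an incoming field, or null when the copyField
     * source (acting as a filter) does not match it.
     */
    static String targetName(String source, String dest, String fieldName) {
        String captured = null;
        if (source.contains("*")) {
            captured = starMatch(source, fieldName);
            if (captured == null) {
                return null; // filtered out by the source pattern
            }
        } else if (!source.equals(fieldName)) {
            return null;
        }
        if (!dest.contains("*")) {
            return dest; // no asterisk: dest itself is the target field name
        }
        if (captured == null) {
            // A plain field-name source leaves nothing to fill the dest template.
            throw new IllegalArgumentException("no source name template for dest " + dest);
        }
        return dest.replace("*", captured);
    }

    public static void main(String[] args) {
        System.out.println(targetName("*_i", "*_s", "price_i"));           // price_s
        System.out.println(targetName("*_src_sub_i", "*_s", "a_i"));       // null
        System.out.println(targetName("*_i", "dest_sub_no_ast_s", "x_i")); // dest_sub_no_ast_s
    }
}
```

          With source "*_i" and dest "*_s", an incoming "price_i" would be copied to "price_s"; a subset source such as "*_src_sub_i" simply ignores "a_i".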

          Steve Rowe added a comment -

          Returning to the subject of this issue: with the previously attached patch, I can see dynamic field copySource info in the response from /admin/luke?show=schema, but not for all combinations of possible <copyField> source and dest value types.

          The current situation, with the patch applied:

          Case | Source value type           | Dest value type             | In /admin/luke?show=schema response? | Schema parse succeeds? | Example
          -----+-----------------------------+-----------------------------+--------------------------------------+------------------------+-----------------------------------------
          1    | <field> name                | <field> name                | Yes                                  | Yes                    | <copyField source="title" dest="text"/>
          2    | <field> name                | <dynamicField> name         | N/A                                  | No (error A)           | <copyField source="title" dest="*_s"/>
          3    | <field> name                | subset pattern              | N/A                                  | No (error A)           | <copyField source="title" dest="*_dest_sub_s"/>
          4    | <field> name                | subset pattern, no asterisk | Yes                                  | Yes                    | <copyField source="title" dest="dest_sub_no_ast_s"/>
          5    | <dynamicField> name         | <field> name                | Yes                                  | Yes                    | <copyField source="*_i" dest="title"/>
          6    | <dynamicField> name         | <dynamicField> name         | Yes                                  | Yes                    | <copyField source="*_i" dest="*_s"/>
          7    | <dynamicField> name         | subset pattern              | N/A                                  | No (error B)           | <copyField source="*_i" dest="*_dest_sub_s"/>
          8    | <dynamicField> name         | subset pattern, no asterisk | Yes                                  | Yes                    | <copyField source="*_i" dest="dest_sub_no_ast_s"/>
          9    | subset pattern              | <field> name                | Yes                                  | Yes                    | <copyField source="*_src_sub_i" dest="title"/>
          10   | subset pattern              | <dynamicField> name         | Yes                                  | Yes                    | <copyField source="*_src_sub_i" dest="*_s"/>
          11   | subset pattern              | subset pattern              | N/A                                  | No (error B)           | <copyField source="*_src_sub_i" dest="*_dest_sub_s"/>
          12   | subset pattern              | subset pattern, no asterisk | No                                   | Yes                    | <copyField source="*_src_sub_i" dest="dest_sub_no_ast_s"/>
          13   | subset pattern, no asterisk | <field> name                | Yes                                  | Yes                    | <copyField source="src_sub_no_ast_i" dest="title"/>
          14   | subset pattern, no asterisk | <dynamicField> name         | N/A                                  | No (error A)           | <copyField source="src_sub_no_ast_i" dest="*_s"/>
          15   | subset pattern, no asterisk | subset pattern              | N/A                                  | No (error A)           | <copyField source="src_sub_no_ast_i" dest="*_dest_sub_s"/>
          16   | subset pattern, no asterisk | subset pattern, no asterisk | No                                   | Yes                    | <copyField source="src_sub_no_ast_i" dest="dest_sub_no_ast_s"/>

          Error A: "copyField only supports a dynamic destination if the source is also dynamic"
          Error B: "copyField dynamic destination must match a dynamicField."
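          The table can be condensed into a small model. The Java sketch below classifies a copyField source or dest value into the four kinds used in the table, given a schema's declared <field> and <dynamicField> names, and encodes the two parse errors exactly as the table shows them (with the first patch applied). It is reconstructed from the table, not taken from IndexSchema; CopyFieldCases, classify and parseError are illustrative names.

```java
import java.util.Set;

// Model reconstructed from the case table; not Solr's IndexSchema code.
class CopyFieldCases {

    enum Kind { FIELD_NAME, DYNAMIC_FIELD_NAME, SUBSET_PATTERN, SUBSET_PATTERN_NO_ASTERISK }

    /**
     * Classifies a copyField source/dest value. Assumes the value does refer to
     * some declared field or dynamicField (the subset validity check is omitted).
     */
    static Kind classify(String value, Set<String> fields, Set<String> dynamicFields) {
        if (fields.contains(value)) return Kind.FIELD_NAME;
        if (dynamicFields.contains(value)) return Kind.DYNAMIC_FIELD_NAME;
        return value.contains("*") ? Kind.SUBSET_PATTERN : Kind.SUBSET_PATTERN_NO_ASTERISK;
    }

    /** Returns null if the schema parse succeeds, else the error message from the table. */
    static String parseError(String source, String dest, Set<String> dynamicFields) {
        boolean sourceDynamic = source.contains("*");
        boolean destDynamic = dest.contains("*");
        if (destDynamic && !sourceDynamic) {
            return "copyField only supports a dynamic destination if the source is also dynamic";
        }
        if (destDynamic && !dynamicFields.contains(dest)) {
            return "copyField dynamic destination must match a dynamicField.";
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> dynamicFields = Set.of("*_i", "*_s");
        System.out.println(parseError("title", "*_s", dynamicFields)); // case 2: "only supports a dynamic destination" message
        System.out.println(parseError("*_i", "*_s", dynamicFields));   // case 6: null
    }
}
```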

          Hoss pointed out that cases 2 and 3 are expected failures, since Solr doesn't have a source name template to use when generating the destination field name.

          However, I think it's a bug that cases 7, 11, 14 and 15 cause Solr to puke - there's no reason I can see to disallow them.

          Cases 12 and 16 are directly relevant to this issue, since they are successfully parsed, but aren't returned in LukeRequestHandler's report.

          Steve Rowe added a comment (edited) -

          In the latest patch on SOLR-4503, I've included a dynamic copy field refactoring in IndexSchema.java that fixes cases 7, 11, 14, and 15 from the above table - with that patch, the Solr schema parse succeeds for those cases.

          But LukeRequestHandler's response can only carry those cases where at least one of the source or the dest is a declared field or dynamic field name, so its current implementation won't handle cases 11, 12, 15, or 16. I think LukeRequestHandler should split out copyField info, something like the way I did it on SOLR-4503, so that subset patterns can be reported.
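          The reporting limitation can be modeled in a few lines. In this hypothetical Java sketch (not LukeRequestHandler code; CopyFieldReporting and its methods are illustrative), a per-field report can only carry a copyField entry if its source or dest is a declared <field> or <dynamicField> name, whereas a separate flat list of source/dest pairs would carry every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the reporting limitation; not LukeRequestHandler code.
class CopyFieldReporting {

    /** A per-field report needs a declared <field> or <dynamicField> name on at least one side. */
    static boolean reportablePerField(String source, String dest, Set<String> declaredNames) {
        return declaredNames.contains(source) || declaredNames.contains(dest);
    }

    /** Returns the {source, dest} pairs that a per-field report would silently omit. */
    static List<String[]> droppedPerField(List<String[]> copyFields, Set<String> declaredNames) {
        List<String[]> dropped = new ArrayList<>();
        for (String[] copyField : copyFields) {
            if (!reportablePerField(copyField[0], copyField[1], declaredNames)) {
                dropped.add(copyField);
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        Set<String> declaredNames = Set.of("title", "text", "*_i", "*_s");
        List<String[]> copyFields = List.of(
            new String[] {"title", "text"},                      // case 1: reportable
            new String[] {"*_src_sub_i", "dest_sub_no_ast_s"});  // case 12: dropped
        System.out.println(droppedPerField(copyFields, declaredNames).size()); // 1
    }
}
```

          Run against all sixteen examples from the table, with <field> names title and text and <dynamicField> names *_i and *_s, this model drops exactly cases 11, 12, 15 and 16.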

          Commit Tag Bot added a comment -

          [trunk commit] Steven Rowe
          http://svn.apache.org/viewvc?view=revision&revision=1453161

          SOLR-4503: Add REST API methods to get schema information: fields, dynamicFields, fieldTypes, and copyFields. Restlet 2.1.1 is integrated and is used to service these requests.
          Also fixes bugs in dynamic copyField logic described in SOLR-3798.
          Also fixes a bug with proxied SolrCloud requests (SOLR-4210) when using the GET method.

          Commit Tag Bot added a comment -

          [branch_4x commit] Steven Rowe
          http://svn.apache.org/viewvc?view=revision&revision=1453162

          SOLR-4503: Add REST API methods to get schema information: fields, dynamicFields, fieldTypes, and copyFields. Restlet 2.1.1 is integrated and is used to service these requests.
          Also fixes bugs in dynamic copyField logic described in SOLR-3798.
          Also fixes a bug with proxied SolrCloud requests (SOLR-4210) when using the GET method.
          (merged trunk r1453161)


            People

            • Assignee: Unassigned
            • Reporter: Hoss Man
            • Votes: 0
            • Watchers: 1