SOLR-494

LukeRequestHandler/Ajax-based schema explorer

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: web gui
    • Labels:
      None
    • Environment:

      N/A

      Description

      This patch submits a schema browsing tool based on making Ajax calls to LukeRequestHandler. It is still in progress, but far enough along to generate discussion, see whether people find it useful, and perhaps incorporate some feedback. It is similar to the XSLT-based schema browser in SOLR-75 in that it provides cross-referenced exploring of the major schema components (fields, field types, and dynamic fields). Since LukeRequestHandler provides more information, this version can show more than the XSLT version could, including statistics and more detail about dynamic fields. Also, since it hits LukeRequestHandler, it probably has quite different performance characteristics than just transforming schema.xml.

      1. Field View.jpg
        107 kB
        Greg Ludington
      2. jsonschemabrowser.patch
        90 kB
        Greg Ludington
      3. multicoreupdate.patch
        2 kB
        Greg Ludington

          Activity

          Greg Ludington added a comment -

          This patch consists of 5 files:

          1) Changes to IndexSchema to expose more information for cross referencing: the source fields for a copyField, as well as the prototypes for each DynamicField

          2) Changes to LukeRequestHandler to pass this additional information (copyField sources and destinations, as well as analyzer information, and dynamic field information.)

          3) Changes to solr-admin.css for the new page (adding new styles, not changing any existing ones)

          4) A javascript-heavy schema.jsp to retrieve this information and present it in a browsable form

          5) The inclusion of jquery as a foundation for the javascript in schema.jsp

          It is the last two parts that could be a concern for committers. jquery is dual-licensed under the GPL and under the MIT license, which I believe is ASF-compatible, but I have not checked the contribution checkbox until I know for sure. Similarly, schema.jsp itself is heavily dependent on javascript that the project may or may not wish to maintain as versions change.

          The page is also not set up to degrade gracefully. Normally, I would consider that a large faux pas, but I am creating this as an internal aid, where graceful degradation will not be an issue, so I have not had the itch to redo this server-side. It may be an issue in the larger context of being included in Solr, as, while it provides a few more ways to look at the schema than the XML/XSL LukeRequestHandler, it will not work across as many clients. As a result, I did not include any direct link to it from any of the stock admin jsps, so you would have to hit

          (your path)/admin/schema.jsp

          directly in order to try it out. I have tried it in several different browsers against my own small (single core) indexes, but I would be interested in feedback on how well it works for large indexes or indexes with large numbers of field definitions.
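For anyone who wants to look at the raw data the page consumes, the request can be sketched as below. This is only an illustration, not code from the patch: the base URL is a placeholder, and the `/admin/luke` path assumes the handler is registered under that name in solrconfig.xml, as it is in the example configuration.

```python
from urllib.parse import urlencode


def luke_url(base_url, show_schema=False, field=None, num_terms=None):
    """Build a LukeRequestHandler URL that asks for a JSON response.

    base_url is a placeholder for "(your path)"; the /admin/luke path is an
    assumption about how the handler is mapped in solrconfig.xml.
    """
    params = [("wt", "json")]
    if show_schema:
        # show=schema switches from the per-index view to the schema view
        params.append(("show", "schema"))
    if field is not None:
        # fl restricts the response to a single field
        params.append(("fl", field))
    if num_terms is not None:
        # numTerms controls how many top terms are returned per field
        params.append(("numTerms", str(num_terms)))
    return base_url.rstrip("/") + "/admin/luke?" + urlencode(params)


if __name__ == "__main__":
    print(luke_url("http://localhost:8983/solr", show_schema=True))
```

Fetching that URL in a browser should show roughly the data the page renders.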

          Greg Ludington added a comment -

          Screen shot of the basic view of the "text" field from the example schema, as viewed in Firefox 2. Shows links to field type, the fields it is copied from, index/query analyzers, as well as the histogram and the top terms form.

          Erik Hatcher added a comment -

          committed! Thanks Greg for this very slick addition to Solr's schema view!

          Hoss Man added a comment -

          this seems to have broken two SolrJ tests: SolrExampleEmbeddedTest and SolrExampleJettyTest which parse the output from the LukeRequestHandler.

          reopening issue until we figure out how to fix it.

          Greg: reading the diff, it's not clear to me what changed in the response format from LukeRequestHandler. Can you elaborate on what changed and if it was intentional and what value it adds?

          Greg Ludington added a comment -

          The LukeRequestHandler output (and IndexSchema) was changed to provide
          a bit more information to cross reference fields, field types, and
          dynamic fields. These additions allow the user to browse through the
          relationships between fields/types, hopefully to get a more complete
          picture of the schema.

          1) In the default no-argument LukeRequestHandler output, dynamic
          fields are output with a reference to the dynamicField used to
          generate them. In the example schema.xml, the field
          "incubationdate_dt" would contain this extra child in the XML
          response:

          <str name="dynamicBase">*_dt</str>

          2) In the show=schema view, dynamicField definitions are also
          output. In the XML response, this would be:

          <lst name="random*">
            <str name="type">random</str>
            <str name="flags">I-S----------</str>
            <arr name="copyDests"/>
            <arr name="copySources"/>
          </lst>

          3) In that schema view, fields reference the fields they are copied
          from, or copied to. The text field in the example schema would look
          like the following:

          <arr name="copySources">
            <str>org.apache.solr.schema.SchemaField:cat{type=text_ws,properties=indexed,tokenized,stored,omitNorms,termVectors,multiValued}</str>
            ...and so on for each field
          </arr>

          4) In the show=schema view, FieldTypes also output the dynamicFields
          that use their definitions. In the example schema, sdouble has no
          fields, and so previously did not show up. After this patch, it shows
          up as follows, because there is a dynamicField available of that type:

          <lst name="sdouble">
            <arr name="fields">
              <str>*_d</str>
            </arr>
            <bool name="tokenized">false</bool>
            <str name="className">org.apache.solr.schema.SortableDoubleField</str>
            <lst name="indexAnalyzer">
              <str name="className">org.apache.solr.schema.FieldType$DefaultAnalyzer</str>
            </lst>
            <lst name="queryAnalyzer">
              <str name="className">org.apache.solr.schema.FieldType$DefaultAnalyzer</str>
            </lst>
          </lst>

          5) Again in the show=schema view, there is some additional information
          about the analyzers. Each Field is output with its
          positionIncrementGap, and each FieldType is output with its tokenizers
          and filters. This FieldType snippet is long, but it appears the solrj
          issue is here:

          <lst name="indexAnalyzer">
            <str name="className">org.apache.solr.analysis.TokenizerChain</str>
            <lst name="tokenizer">
              <str name="className">org.apache.solr.analysis.WhitespaceTokenizerFactory</str>
              <lst name="args"/>
            </lst>
            <arr name="filters">
              <lst>
                <lst name="args">
                  <str name="synonyms">synonyms.txt</str>
                  <str name="expand">false</str>
                  <str name="ignoreCase">true</str>
                </lst>
                <str name="className">org.apache.solr.analysis.SynonymFilterFactory</str>
              </lst>
              <lst>
                <lst name="args">
                  <str name="words">stopwords.txt</str>
                  <str name="ignoreCase">true</str>
                </lst>
                <str name="className">org.apache.solr.analysis.StopFilterFactory</str>
              </lst>
              <lst>
                <lst name="args">
                  <str name="generateNumberParts">0</str>
                  <str name="catenateWords">1</str>
                  <str name="generateWordParts">0</str>
                  <str name="catenateAll">0</str>
                  <str name="catenateNumbers">1</str>
                </lst>
                <str name="className">org.apache.solr.analysis.WordDelimiterFilterFactory</str>
              </lst>
              <lst>
                <lst name="args"/>
                <str name="className">org.apache.solr.analysis.LowerCaseFilterFactory</str>
              </lst>
              <lst>
                <lst name="args">
                  <str name="protected">protwords.txt</str>
                </lst>
                <str name="className">org.apache.solr.analysis.EnglishPorterFilterFactory</str>
              </lst>
              <lst>
                <lst name="args"/>
                <str name="className">org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory</str>
              </lst>
            </arr>
          </lst>
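For client authors wondering how to consume the nested analyzer output, here is a rough Python sketch that walks an `indexAnalyzer` element and collects the tokenizer and filter chain. It is not the SolrJ parsing code; the helper names (`child`, `short_name`, `analyzer_chain`) are made up for illustration, and the sample is a trimmed copy of the snippet above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the indexAnalyzer snippet from this comment.
SAMPLE = """
<lst name="indexAnalyzer">
  <str name="className">org.apache.solr.analysis.TokenizerChain</str>
  <lst name="tokenizer">
    <str name="className">org.apache.solr.analysis.WhitespaceTokenizerFactory</str>
    <lst name="args"/>
  </lst>
  <arr name="filters">
    <lst>
      <lst name="args">
        <str name="words">stopwords.txt</str>
        <str name="ignoreCase">true</str>
      </lst>
      <str name="className">org.apache.solr.analysis.StopFilterFactory</str>
    </lst>
    <lst>
      <lst name="args"/>
      <str name="className">org.apache.solr.analysis.LowerCaseFilterFactory</str>
    </lst>
  </arr>
</lst>
"""


def child(node, tag, name):
    """Return the first direct child with this tag and name attribute, or None."""
    for candidate in node:
        if candidate.tag == tag and candidate.get("name") == name:
            return candidate
    return None


def short_name(class_name):
    """Strip whitespace and the package prefix from a class name."""
    return class_name.strip().rsplit(".", 1)[-1]


def analyzer_chain(analyzer):
    """Collect the tokenizer name and an ordered list of (filter, args) pairs."""
    chain = {"tokenizer": None, "filters": []}
    tokenizer = child(analyzer, "lst", "tokenizer")
    if tokenizer is not None:
        chain["tokenizer"] = short_name(child(tokenizer, "str", "className").text)
    filters = child(analyzer, "arr", "filters")
    if filters is not None:
        for entry in filters:
            # className and args are looked up by name, so their order
            # inside each filter entry does not matter.
            args_node = child(entry, "lst", "args")
            args = {}
            if args_node is not None:
                args = {arg.get("name"): arg.text for arg in args_node}
            name = short_name(child(entry, "str", "className").text)
            chain["filters"].append((name, args))
    return chain


chain = analyzer_chain(ET.fromstring(SAMPLE.strip()))

if __name__ == "__main__":
    print(chain["tokenizer"], [name for name, _ in chain["filters"]])
```

Looking children up by their `name` attribute rather than by position keeps the walk tolerant of the `args` list appearing before `className`, as it does in the filter entries above.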

          I had not been looking at the solrj effects yet, but it appears the
          failure is in the way filters and analyzers are output in the
          show=schema view (or in how they are parsed in solrj). I will try to
          make some time to look at this tonight, but I would not be able to
          look at other client implementations.
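To make the cross-referencing additions above more concrete, the sketch below builds a "field type to fields" index from the per-type `fields` arrays, which is the kind of lookup the schema browser relies on. The enclosing `<lst name="types">` wrapper and the function names are assumptions made for illustration; only the `sdouble` entry and the `random*` pattern are taken from the examples in this comment, and the entries are trimmed.

```python
import xml.etree.ElementTree as ET

# The outer "types" wrapper is an assumption about the surrounding
# response; the entries are trimmed versions of the examples above.
SAMPLE = """
<lst name="types">
  <lst name="sdouble">
    <arr name="fields">
      <str>*_d</str>
    </arr>
    <bool name="tokenized">false</bool>
    <str name="className">org.apache.solr.schema.SortableDoubleField</str>
  </lst>
  <lst name="random">
    <arr name="fields">
      <str>random*</str>
    </arr>
  </lst>
</lst>
"""


def fields_by_type(types_node):
    """Map each field type name to the field names and patterns that use it."""
    index = {}
    for type_node in types_node.findall("lst"):
        names = [s.text for s in type_node.findall("arr[@name='fields']/str")]
        index[type_node.get("name")] = names
    return index


def is_dynamic(field_name):
    """Dynamic field patterns contain a '*' wildcard, e.g. '*_d' or 'random*'."""
    return "*" in field_name


index = fields_by_type(ET.fromstring(SAMPLE.strip()))

if __name__ == "__main__":
    for type_name, names in index.items():
        print(type_name, names, [is_dynamic(n) for n in names])
```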

          Erik Hatcher added a comment -

          Fixed issue that caused SolrJ tests to fail. Sorry, my bad for not running "ant test" before committing!

          Greg Ludington added a comment -

          I finally had occasion to look at this in a multicore setting. The extra core-identifying field caused problems for the schema browser and, more importantly, caused an Exception to be thrown in the LukeRequestHandler when trying to output schema information for that extra multicore field. The upcoming patch adds the necessary check to LukeRequestHandler, and adjusts the javascript in schema.jsp.

          Greg Ludington added a comment -

          In a multicore setting, these changes cause the LukeRequestHandler to throw an Exception on the core-identifying field, because there was no null check for sfield in the relevant new line of LukeRequestHandler. This patch adds that check and also updates the javascript.
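The actual fix is in the Java LukeRequestHandler and is contained in the attached patch, not reproduced here. Purely to illustrate the guard pattern, here is a Python sketch in which a field found in the index has no schema definition. `describe_field` and the dictionary layout are hypothetical and do not mirror the handler's real code.

```python
def describe_field(name, schema_fields):
    """Describe an index field, tolerating fields the schema does not define.

    schema_fields is a hypothetical dict of field name -> schema info; a
    missing entry plays the role of a null sfield.
    """
    sfield = schema_fields.get(name)
    info = {"name": name}
    if sfield is None:
        # No schema entry (e.g. an extra core-identifying field): skip every
        # lookup that would otherwise dereference sfield.
        info["type"] = None
        return info
    info["type"] = sfield["type"]
    info["positionIncrementGap"] = sfield.get("positionIncrementGap")
    return info


if __name__ == "__main__":
    schema = {"text": {"type": "text", "positionIncrementGap": 100}}
    print(describe_field("text", schema))
    print(describe_field("core_field_not_in_schema", schema))
```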

          Erik Hatcher added a comment -

          patch applied, thanks Greg!


            People

            • Assignee:
              Unassigned
              Reporter:
              Greg Ludington