Solr / SOLR-116

StructureRequestHandler - allowing client to discover all fields in the index

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: search
    • Labels:
      None

      Description

      This request handler returns all fields and their type. In Ruby format (&wt=ruby) the results, for the example index, look like this currently:

      {'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}

      A client wanting to introspect Solr could combine the actual fields and their types with parsing of schema.xml to glean a lot and dynamically configure based on what is inside an index. Should more information per field be returned, or is simply the type name sufficient? What else is desirable for this request handler?
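As a rough illustration of the kind of client-side introspection described above, the sketch below groups the reported field names by type. The 'fields' map is copied from the example response; the helper name fields_by_type is hypothetical and not part of Solr.

```ruby
# 'fields' map copied from the &wt=ruby example response in the description.
FIELDS = {
  'cat' => 'text_ws', 'includes' => 'text', 'id' => 'string', 'text' => 'text',
  'price' => 'sfloat', 'features' => 'text', 'manu_exact' => 'string',
  'manu' => 'text', 'name' => 'text', 'sku' => 'textTight',
  'inStock' => 'boolean', 'popularity' => 'sint', 'weight' => 'sfloat'
}.freeze

# Hypothetical helper: group field names by their type name.
# Returns a hash of type name => sorted array of field names.
def fields_by_type(fields)
  fields.group_by { |_name, type| type }
        .transform_values { |pairs| pairs.map(&:first).sort }
end

by_type = fields_by_type(FIELDS)
puts by_type['string'].inspect
puts by_type['sfloat'].inspect
```

A client could use such a grouping to decide, for example, which fields to offer for sorting and which for full-text search.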

        Activity

        Erik Hatcher added a comment -

        The initial example was from an older example index. From trunk, the response is this:

        {'responseHeader'=>{'status'=>0,'QTime'=>2},'fields'=>{'includes'=>'text','cat'=>'text_ws','alphaNameSort'=>'alphaOnlySort','id'=>'string','text'=>'text','manu_exact'=>'string','features'=>'text','price'=>'sfloat','incubationdate_dt'=>'date','timestamp'=>'date','sku'=>'textTight','name'=>'text','nameSort'=>'string','manu'=>'text','weight'=>'sfloat','inStock'=>'boolean','popularity'=>'sint'}}

        incubationdate_dt is a dynamic field, and thus could not be gleaned from simply reading schema.xml.
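One way a client might spot such fields is to compare the names the handler reports against the names declared explicitly in schema.xml. The sketch below assumes the client has already extracted the declared names; the DECLARED list here is hypothetical and only illustrative, and INDEX_FIELDS is a subset of the trunk response.

```ruby
# Subset of the field names reported by the handler in the trunk response.
INDEX_FIELDS = %w[id name price incubationdate_dt timestamp].freeze

# Hypothetical list of field names declared explicitly in schema.xml.
DECLARED = %w[id name price timestamp].freeze

# Hypothetical helper: names present in the index but not declared explicitly,
# which are therefore candidates for having come from a dynamic field rule.
def undeclared_fields(index_fields, declared)
  index_fields - declared
end

puts undeclared_fields(INDEX_FIELDS, DECLARED).inspect
```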

        Yonik Seeley added a comment -

        Looks good, I like the fieldnames as the keys. The only change I might make is to make it extensible by returning a map as the value.

        Instead of:
        'id'=>'string'
        It could be
        'id'=>{type=>'string'}

        And then other info could optionally go in there:
        'id'=>{type=>'string', multiValued=>'false', 'indexed'=>'true', 'stored'=>'true', 'defaultValue'=>'...'}

        Hmmm, and what are the aesthetics of the XML?

        <lst name="fields">
        <lst name="id"> <str name="type">string</str> </lst>
        <lst name="text">...

        Not bad...
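A client written against the flat form could be made tolerant of the proposed map form with a small normalization step. This is only a sketch: the helper names are hypothetical, and the 'type' key follows the proposal above.

```ruby
# Hypothetical helper: accept either the flat form ('id'=>'string') or the
# proposed map form ('id'=>{'type'=>'string', ...}) and return the map form.
def normalize_field_info(value)
  value.is_a?(Hash) ? value : { 'type' => value }
end

# Apply the normalization to every entry of a 'fields' map.
def normalize_fields(fields)
  fields.transform_values { |value| normalize_field_info(value) }
end

flat   = { 'id' => 'string', 'price' => 'sfloat' }
nested = { 'id' => { 'type' => 'string', 'stored' => 'true' } }

puts normalize_fields(flat).inspect
puts normalize_fields(nested).inspect
```

Returning a map per field means extra keys can be added later without breaking clients that only read 'type'.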

        Yonik Seeley added a comment -

        If you want to commit early and still mess around with the parameters and response formats,
        one could add a 'NOTICE'=>'This interface is experimental and will be changing'
        to the response.

        As this handler returns info about the index, is this where listing of terms and docfreqs should also go?

        Erik Hatcher added a comment -

        I had thought of the Map for the field name keyed value as well.

        Terms and document frequencies make more sense from a facet handler, it seems, which you can already do with &qt=standard&facet=true&facet.field=fieldname&q=[* TO *] I believe.

        I'll add the Map level in there, and the notice, and commit soon so we can tinker with it in Flare as a way to provide a dynamic UI based on the fields in the index.
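For reference, the facet query mentioned above needs URL encoding when sent from a client, since q=[* TO *] contains brackets and spaces. The sketch below builds the query string with Ruby's standard URI module; the parameter names and values are the ones quoted in the comment, while the helper name and the example field name 'cat' are illustrative.

```ruby
require 'uri'

# Hypothetical helper: build the query string for the facet request quoted
# above (qt=standard, facet=true, facet.field=<name>, q=[* TO *]).
def facet_query_string(field_name)
  URI.encode_www_form(
    'qt' => 'standard',
    'facet' => 'true',
    'facet.field' => field_name,
    'q' => '[* TO *]'
  )
end

puts facet_query_string('cat')
```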

        Yonik Seeley added a comment -

        Facets are slightly different from docfreqs... one is expensive, and one is very cheap since it's pre-calculated by Lucene.
        The disadvantage of the Lucene version is that the docfreq doesn't take deleted docs into account.

        If you want to page through or download all terms of a full-text field, the faceting code would take forever in comparison.

        other ideas for info:

        "index" : {
          "numDocs" : 10123,
          "maxDoc" : 12345,
          "age" : 2000,          # number of milliseconds the index has been open... sort of equivalent to index freshness, but not really.
          "version" : 123425235  # index version. Actually, I think this should be in responseHeader to aid in client-side caching
        }

        I think this stuff is useful, it's just a matter of preference if it goes in the same handler or not.
        If this does go in this handler, then perhaps it should be named "indexinfo" or something. I'd be fine with this handler being only about schema too though.
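As a small example of what a client could derive from such an "index" map: in Lucene, maxDoc still counts documents that have been deleted but not yet merged away, while numDocs does not, so their difference gives the number of deleted documents. The sketch below uses the sample numbers from the proposal; the helper name is hypothetical.

```ruby
# Sample values taken from the proposed "index" map above.
INDEX_INFO = { 'numDocs' => 10123, 'maxDoc' => 12345 }.freeze

# Hypothetical helper: number of deleted documents still present in the index.
def deleted_doc_count(index_info)
  index_info.fetch('maxDoc') - index_info.fetch('numDocs')
end

puts deleted_doc_count(INDEX_INFO)
```

A large difference hints that Lucene docfreq values, which ignore deletions, may overstate how many live documents contain a term.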

        Erik Hatcher added a comment -

        I've committed IndexInfoRequestHandler based on the feedback here. The field information is now returned as a map, with type being the only value currently. I also added in an "index" keyed map which contains numDocs, maxDoc, and Lucene index version. I wasn't sure how the "age" value should be computed, so I commented that out for now.

        I'm closing this issue, and tweaks to this handler can be discussed in solr-dev now.

        Thanks for the feedback.

        Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked ("Resolved" or "Closed") and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.2

        The Fix Version for all 39 issues found was set to 1.2, email notification
        was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this
        (hopefully) unique string: 20080415hossman2


          People

          • Assignee: Erik Hatcher
          • Reporter: Erik Hatcher
          • Votes: 0
          • Watchers: 0