SOLR-162

lucene index browser / admin helpers (Luke)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: web gui
    • Labels:
      None

      Description

      Luke (http://www.getopt.org/luke/) is a great tool to help learn / understand / debug lucene indexes.

      Solr already does a lot of what luke does... but it could do a bit more. Specifically:

      • browse top terms across all fields (similar to faceting)
      • browse lucene documents / properties directly
      1. SOLR-162-Admin-XML-luke.patch
        46 kB
        Ryan McKinley
      2. SOLR-162-Admin-XML-luke.patch
        77 kB
        Ryan McKinley
      3. SOLR-162-Admin-XML-luke.patch
        95 kB
        Ryan McKinley
      4. SOLR-162-Admin-XML-luke.patch
        68 kB
        Ryan McKinley
      5. SOLR-162-Admin-XML-luke.patch
        68 kB
        Ryan McKinley
      6. SOLR-162-Admin-XML-luke.patch
        62 kB
        Ryan McKinley

          Activity

          Ryan McKinley added a comment -

          This patch moves a lot of the /admin jsp pages to SolrRequestHandlers and adds a LukeRequestHandler to let you browse a lucene index (even if it does not match the solr schema).

          I don't know XSLT well enough to make anything look good yet, but with an ok XSLT file, we could replace many of the .jsp files.

          Pages of interest:

          • http://localhost:8983/solr/admin/
          • http://localhost:8983/solr/admin/file
          • http://localhost:8983/solr/admin/file?file=solrconfig.xml
          • http://localhost:8983/solr/admin/threads
          • http://localhost:8983/solr/admin/registry
          • http://localhost:8983/solr/admin/registry?wt=json&indent=true
          • http://localhost:8983/solr/admin/stats
          • http://localhost:8983/solr/admin/ping
          • http://localhost:8983/solr/admin/properties
          • http://localhost:8983/solr/admin/properties?name=java.home
          • http://localhost:8983/solr/admin/logging
          • http://localhost:8983/solr/admin/logging?set=FINE
          • http://localhost:8983/solr/admin/luke (field info + top fields)
          • http://localhost:8983/solr/admin/luke?field=cat (like faceting)
          • http://localhost:8983/solr/admin/luke?docID=10 (lucene doc + solr doc)
          • http://localhost:8983/solr/admin/luke?id=10 (lucene doc + solr doc)
          • http://localhost:8983/solr/admin/luke?id=MA147LL/A

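          The /admin/luke URLs listed above differ only in their query parameter (field, docID, or id). A minimal sketch of building them from Java follows; the class and method names are hypothetical, the base URL is the example install from the list, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the patch.
class LukeUrls {
    // Base URL taken from the list of pages above; adjust host and port as needed.
    static final String BASE = "http://localhost:8983/solr/admin/luke";

    // Appends a single query parameter, percent-encoding the value.
    static String withParam(String name, String value) {
        return BASE + "?" + name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(withParam("field", "cat"));    // terms for one field
        System.out.println(withParam("docID", "10"));     // by lucene document number
        System.out.println(withParam("id", "MA147LL/A")); // by unique key; "/" becomes %2F
    }
}
```

          Encoding the value is a defensive choice for ids such as MA147LL/A; the thread itself uses the unencoded form.
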
          Yonik Seeley added a comment -

          Wow, what can I say... you continue to outpace us committers, Ryan.
          I'll try and look at at least one of the open issues this weekend.

          Erik Hatcher added a comment -

          This is truly great stuff, Ryan! Whew, I can't keep up.

          I did however become really intrigued by this particular patch and tried it out. I like all these details and Flare will certainly leverage this stuff deeply.

          One comment: for this request, http://localhost:8888/solr/admin/ping?wt=ruby&indent=on, all that was logged was this:

          Feb 16, 2007 8:59:22 PM org.apache.solr.core.SolrCore execute
          INFO: wt=ruby&indent=on 0 8

          I realize this is tangentially related to this issue, and nothing introduced with this patch, but I'd like to see the path (admin/ping, in this case) in the log as well so that requests could be recreated easily. I'm used to the solr/select? stuff and tacking on what I get in the log file, but with the newly revamped mega flexible paths, it'd be handy to see the path here.

          Ryan McKinley added a comment -

          Check SOLR-149. It adds the path to the request and prints it in the log.

          Erik Hatcher added a comment -

          Ryan - I would like to see Maps used instead of NamedLists for things that truly don't need to be lists. For example, /admin/file?wt=ruby returns this:

          'files'=>[
          'admin-extra.html',[
          'size',1094,
          'modified','2006-12-05T02:30:56Z'],...

          'size' and 'modified' should be keys in a hash instead of in a list. Should be no big deal to switch over things where order doesn't matter to maps though. Likewise for /admin/threads and /admin/registry, and maybe others.
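
          As a rough illustration of why the shape matters to clients, the sketch below uses plain Java collections as stand-ins for the two response shapes (it does not use Solr's NamedList, and the class and method names are hypothetical). With the flat name/value list a client has to scan pairs; with a map it can look a value up by key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain collections standing in for the two response shapes.
class FileInfoShapes {

    // Flat shape, as in the Ruby output above: [name, value, name, value, ...].
    // The client walks the list two entries at a time to find a name.
    static Object lookupInFlatList(List<Object> pairs, String name) {
        for (int i = 0; i + 1 < pairs.size(); i += 2) {
            if (name.equals(pairs.get(i))) {
                return pairs.get(i + 1);
            }
        }
        return null; // name not present
    }

    // Map shape: the value is reachable directly by key.
    static Object lookupInMap(Map<String, Object> info, String name) {
        return info.get(name);
    }

    public static void main(String[] args) {
        List<Object> flat = Arrays.asList("size", 1094, "modified", "2006-12-05T02:30:56Z");

        Map<String, Object> map = new LinkedHashMap<>();
        map.put("size", 1094);
        map.put("modified", "2006-12-05T02:30:56Z");

        System.out.println(lookupInFlatList(flat, "modified")); // 2006-12-05T02:30:56Z
        System.out.println(lookupInMap(map, "modified"));       // 2006-12-05T02:30:56Z
    }
}
```
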

          Bertrand Delacretaz added a comment -

          I briefly tested this, it looks very useful, and the different RequestHandlers make the code very modular, way to go!

          One nitpick: I'd use "system.properties" instead of "properties", it's more precise.

          And two suggestions related to XSLT transformations for presentation:

          1) It'd be good to systematically include in the output XML the class name of the SolrRequestHandler used. XSLT transforms can then use this info to adapt themselves to the information being output.

          2) It'd be good to name <lst> elements, as much as possible, also to allow XSLT transforms to adapt themselves to the content.

          For example, using a NamedList instead of an ArrayList in the "Now show all the threads" loop in ThreadDumpRequestHandler:

          NamedList<NamedList<Object>> lst = new NamedList<NamedList<Object>>();
          for (ThreadInfo ti : tinfos) {
            lst.add( "thread", getThreadInfo( ti ) );
          }

          Outputs this:

          <lst name="thread">
          <long name="id">35</long>
          <str name="name">P1-19</str>
          <str name="state">RUNNABLE</str>...

          where the name="thread" attribute can be used to decide how to present the contents of the <lst> element.

          Thinking about it, we might want to add a "datatype" attribute to these lists, to use when presenting them?

          <lst datatype="java.lang.Thread">

          would help present all Thread info in a consistent way, no matter where it comes from.

          Ryan McKinley added a comment -

          Thanks for your feedback, here is an updated version that:

          1. replaces NamedList<> with Map<String,> wherever possible. This makes the direct XML output look funny (the stack trace is displayed before the thread name), but it is probably a good idea so clients can easily access stuff by name.

          2. I added a parameter "echoHandler" that behaves just like "echoParams" - it writes the handler name to the responseHeader.

          3. I added the default params echoHandler=true and echoParams=explicit to all the /admin/* handlers. This gets a bit verbose and will be helped by

          4. I moved the responseHeader writing from SolrCore to RequestHandlerBase. This is good because RequestHandler authors can control the header more explicitly if necessary.

          5. added a name to each thread in the thread list. I don't see any other lists without names, but I could be missing something.

          6. changed the output in PropertiesRequestHandler from "properties" to "system.properties"

          Ryan McKinley added a comment -

          3. ... will be helped by SOLR-112

          Yonik Seeley added a comment -

          > 1. replaces NamedList<> with Map<String,> wherever possible. This makes the direct XML output look funny (the stack trace is displayed before the thread name), but it is probably a good idea so clients can easily access stuff by name.

          If you want map output semantics (important distinction for other formats like JSON), but the ability to control order w/o the overhead of LinkedHashMap, see SimpleOrderedMap.
          It subclasses NamedList, so it's easy to convert code that previously used NamedList.
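
          The idea can be sketched with hypothetical stand-in classes (these are not Solr's NamedList or SimpleOrderedMap, and the writer below is a toy that does no string escaping): the subclass keeps the same ordered storage, and the response writer uses the type only to decide between list syntax and map syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not Solr's actual classes.
class OrderedOutputSketch {

    // Ordered name/value pairs, kept in insertion order.
    static class PairList {
        final List<String> names = new ArrayList<>();
        final List<Object> values = new ArrayList<>();

        PairList add(String name, Object value) {
            names.add(name);
            values.add(value);
            return this;
        }
    }

    // Same storage and ordering; the type only signals "write me as a map".
    static class OrderedMapList extends PairList {
        @Override
        OrderedMapList add(String name, Object value) {
            super.add(name, value);
            return this;
        }
    }

    // Numbers are written bare, everything else as a quoted string (no escaping).
    static String quote(Object v) {
        return (v instanceof Number) ? v.toString() : "\"" + v + "\"";
    }

    // Toy JSON writer: array syntax for PairList, object syntax for OrderedMapList.
    static String toJson(PairList list) {
        boolean asMap = list instanceof OrderedMapList;
        StringBuilder sb = new StringBuilder(asMap ? "{" : "[");
        for (int i = 0; i < list.names.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(list.names.get(i)).append('"');
            sb.append(asMap ? ':' : ',');
            sb.append(quote(list.values.get(i)));
        }
        return sb.append(asMap ? '}' : ']').toString();
    }

    public static void main(String[] args) {
        PairList asList = new PairList().add("id", 35).add("name", "P1-19");
        PairList asMap = new OrderedMapList().add("id", 35).add("name", "P1-19");
        System.out.println(toJson(asList)); // ["id",35,"name","P1-19"]
        System.out.println(toJson(asMap));  // {"id":35,"name":"P1-19"}
    }
}
```

          In both cases the entries come out in insertion order, which is the property that keeps the XML output readable.
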

          Ryan McKinley added a comment -

          using SimpleOrderedMap - this keeps nice ordering for XML and uses map syntax for JSON/ruby.

          thanks Yonik!

          Hoss Man added a comment -

          i barely started to scratch the surface looking at this patch while checking something for SOLR-142, but before i forget i wanted to raise one red flag...

          this patch removes the call to setResponseHeaderValues from SolrCore and moves it to RequestHandlerBase with the assumption that RequestHandlers can take care of it ... this is a bad idea since there is no requirement that RequestHandlers extend that class – any Solr 1.1 users who have written their own request handlers will be screwed.

          Ryan McKinley added a comment -

          yes, that is a problem!

          I'll wait till SOLR-142 gets checked in, then post an update that puts the header stuff back in SolrCore

          Ryan McKinley added a comment -

          I put setResponseHeaderValues back in SolrCore.

          Uses the automatic configuration and utility classes from SOLR-85

          <admin>
          <registerStandardHandlers>/admin</registerStandardHandlers>
          ...
          </admin>

          (perhaps this and SOLR-85 should be combined?)

          Ryan McKinley added a comment -

          oops, had the link the wrong way

          Ryan McKinley added a comment -

          Updated to:

          • use SOLR-182 rather than implement its own dynamic loading at startup
          • apply without conflicts
          • use Luke 0.7 style to represent more field properties:

          "key":{
            "I":"Indexed",
            "T":"Tokenized",
            "S":"Stored",
            "M":"Multivalued",
            "V":"TermVector Stored",
            "o":"Store Offset With TermVector",
            "p":"Store Position With TermVector",
            "O":"Omit Norms",
            "L":"Lazy",
            "B":"Binary",
            "C":"Compressed",
            "f":"Sort Missing First",
            "l":"Sort Missing Last"},
          "fields":{
            "id":{
              "type":"string",
              "schema":"I-S---O---l",
              "flags":"I-S---O----",
              ...
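
          A client could expand such a flag string back into readable names with the "key" table. The sketch below is a hypothetical helper, not code from the patch; it looks up each non-dash character by symbol rather than by position, since the column layout of the string is not spelled out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper for illustration; not part of the patch.
class LukeFlagDecoder {

    // Symbol-to-description table copied from the "key" section of the response above.
    static final Map<Character, String> KEY = new LinkedHashMap<>();
    static {
        KEY.put('I', "Indexed");
        KEY.put('T', "Tokenized");
        KEY.put('S', "Stored");
        KEY.put('M', "Multivalued");
        KEY.put('V', "TermVector Stored");
        KEY.put('o', "Store Offset With TermVector");
        KEY.put('p', "Store Position With TermVector");
        KEY.put('O', "Omit Norms");
        KEY.put('L', "Lazy");
        KEY.put('B', "Binary");
        KEY.put('C', "Compressed");
        KEY.put('f', "Sort Missing First");
        KEY.put('l', "Sort Missing Last");
    }

    // Returns the description of every flag set in the string; '-' means "not set".
    static List<String> decode(String flags) {
        List<String> out = new ArrayList<>();
        for (char c : flags.toCharArray()) {
            if (c == '-') {
                continue;
            }
            String label = KEY.get(c);
            out.add(label != null ? label : "Unknown(" + c + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode("I-S---O---l")); // [Indexed, Stored, Omit Norms, Sort Missing Last]
        System.out.println(decode("I-S---O----")); // [Indexed, Stored, Omit Norms]
    }
}
```
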

          Ryan McKinley added a comment -

          Major updates to the 'luke' part. It is getting really good!

          I removed everything that complicated integration - rather than trying to replace the existing /admin/xxx.jsp pages, this will sit next to them until someone wants to make a nice XSLT thing so we can remove the jsp/jdk requirement.

          This includes request handlers for:

          <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
          <requestHandler name="/admin/system" class="org.apache.solr.handler.admin.SystemInfoHandler" />
          <requestHandler name="/admin/plugins" class="org.apache.solr.handler.admin.PluginInfoHandler" />
          <requestHandler name="/admin/threads" class="org.apache.solr.handler.admin.ThreadDumpHandler" />
          <requestHandler name="/admin/properties" class="org.apache.solr.handler.admin.PropertiesRequestHandler" />

          The only two I really care about are:
          LukeRequestHandler and SystemInfoHandler
          If removing the others makes anyone happier, it's fine with me.

          I also started a wiki page for documentation:
          http://wiki.apache.org/solr/LukeRequestHandler

          I think this is almost ready to commit.

          Ryan McKinley added a comment -

          committed

          Hoss Man added a comment -

          This bug was modified as part of a bulk update using the criteria...

          • Marked ("Resolved" or "Closed") and "Fixed"
          • Had no "Fix Version" versions
          • Was listed in the CHANGES.txt for 1.2

          The Fix Version for all 39 issues found was set to 1.2, email notification
          was suppressed to prevent excessive email.

          For a list of all the issues modified, search jira comments for this
          (hopefully) unique string: 20080415hossman2

            People

            • Assignee:
              Ryan McKinley
              Reporter:
              Ryan McKinley
            • Votes:
              2
              Watchers:
              0
