Solr / SOLR-3094

The statistics entry on the new admin UI is very slow

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels:
      None
    • Environment:

      trunk only, all environments

      Description

      Prompted by Robert Reynolds (SOLR-2667): the entry point in the new Admin UI core drill-down (e.g. clicking "singlecore") takes a long time, 28-46 minutes on a 13M-23M doc set.

      On an example Wikipedia index (11M docs), I see 21 seconds, compared to less than 2 seconds in the old admin UI (I'm using the old admin UI linked to from the new UI page on trunk). I have a very simple index layout compared to a commercial site. Clearly something is not right. I suspect that all the terms are being walked.

      This is particularly an issue because this behavior happens when I click "singlecore", so getting to the really neat parts of the new UI is hard.

      Robert reports on a separate thread that the same behavior happens when just hitting admin/luke in the URL, which is also slow in the 3.x world; that hints at where the problem lies.

      My guess is that the terms are being walked, and that we can use the tricks from SOLR-1931 to deal with admin/luke taking a long time, changing only the call made at the entry point ("singlecore") for this issue.

      Robert: Thanks for pointing this out!

      1. SOLR-3094.patch
        0.6 kB
        Erick Erickson
      2. SOLR-3904-Faster-NewUI-Statspage.patch
        1 kB
        Greg Bowyer

          Activity

          Erick Erickson added a comment -

          See SOLR-3121
          Greg Bowyer added a comment -

          For the initial stats page, the attached patch avoids enumerating the terms for the fields. It is still a little redundant in that it brings back the field names for the initial core stats page, but it does not incur the large costs others have seen.

          I have tested this locally: on my local index it takes about 9 seconds to bring back the core dashboard initially, and 93ms with the attached patch.

          I will take a look to see what can be done about the schema browser.

          Erick Erickson added a comment -

          OK, anyone with good javascript skills, this would be a good time to chime in...

          This is a variant of SOLR-1931. The new UI calls Luke at the top level in such a way that it enumerates all the terms in all the fields to gather the histogram data, which takes a long time. Note, this is what the old admin UI/Luke handler did when you clicked "schema browser" link.

          Once that data is accumulated, then clicking on the individual fields and showing that data is very fast since the data is local. But this data is accumulated before any field is selected from the "schema browser" drop-down and stored away.

          I think this design is too costly, especially the "get all the data for all the fields up-front" bit. The users pay a penalty (many minutes demonstrated) even when they may only care about one field. So here's what I propose.

          1> Tweak the LukeRequestHandler so it requires the fieldName parameter to gather the histogram data. That fixes the initial display of the stats issue that sparked this JIRA. I can do that in a few minutes, patch attached (do not commit yet, though). The problem is that there is then no way at all to get the stats data.

          2> Tweak the javascript to call the luke request handler to collect the data for individual fields only when the user selects them from the drop-down, stowing them away at that point so they can be revisited if desired. Here's where I could use some help, my javascript skills are rudimentary at best. If anyone could work the javascript I'd be happy to field test. Or even just put some comments in the code pointing me to them. Any trunk code from after 6-Jan will have the right Luke handler in it (see SOLR-1931).
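
          The fetch-on-selection idea in 2> could look roughly like the sketch below. It is written in Python purely to show the caching logic; the real change would live in the admin UI's JavaScript. `FieldStatsCache` and the `fetch_field` callable are hypothetical names used for illustration, not part of Solr.

```python
class FieldStatsCache:
    """Fetch per-field Luke data on first selection and reuse it afterwards."""

    def __init__(self, fetch_field):
        # fetch_field(name) is a caller-supplied function that performs the
        # actual request for one field (for example, a Luke call limited to it).
        self._fetch_field = fetch_field
        self._cache = {}

    def select(self, field_name):
        # Only hit the server the first time a field is selected.
        if field_name not in self._cache:
            self._cache[field_name] = self._fetch_field(field_name)
        return self._cache[field_name]


# Usage with a stand-in fetcher that records how often it is called.
calls = []


def fake_fetch(name):
    calls.append(name)
    return {"field": name, "histogram": []}


cache = FieldStatsCache(fake_fetch)
cache.select("title")
cache.select("title")  # served from the cache, no second fetch
cache.select("body")
print(calls)  # ['title', 'body']
```

          With this shape, a user who only cares about one field pays for one field's worth of term enumeration, and revisiting a field costs nothing.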

          There's also something wrong with the display of the histogram, the "bucket" and count in each bucket are mashed together on the bottom. With non-trivial indexes, this is largely unreadable since they're side-by-side...

          Anyway, the attached patch makes it so you can get into the admin page without paying the above penalties, but you never get histogram data when you go into "schema browser". If someone applies this to work on the admin UI bit, attaching "&fl=field1 field2" to the luke URL will cause the histogram data to be returned for the field(s) specified.
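
          As a rough illustration of the per-field request mentioned above, the following Python sketch builds a Luke URL restricted to named fields. Only `admin/luke` and the `fl` parameter come from this issue; the host, port, core name, the use of `wt=json`, and the `luke_url` helper itself are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL for a core named "singlecore"; adjust for a real install.
BASE_URL = "http://localhost:8983/solr/singlecore"


def luke_url(fields=(), base_url=BASE_URL):
    """Build an admin/luke URL, optionally restricted to the given fields."""
    params = {"wt": "json"}
    if fields:
        # Multiple field names are space-separated, as in "&fl=field1 field2".
        params["fl"] = " ".join(fields)
    return f"{base_url}/admin/luke?{urlencode(params)}"


print(luke_url())
print(luke_url(["field1", "field2"]))
```

          Note that `urlencode` writes the space between field names as `+`, which is the usual query-string encoding of a space.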

          If anyone has some spare cycles to help out here it would be outstanding.

          I think something similar could be done for the old admin UI as well in terms of only getting the fields when requested, otherwise the histogram data won't be returned either...


            People

            • Assignee:
              Erick Erickson
            • Reporter:
              Erick Erickson
            • Votes:
              1
            • Watchers:
              2
