Solr / SOLR-7406

Support DV implementation in range faceting

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Interval faceting has a different implementation than range faceting, one based on the DocValues API. It is sometimes faster and doesn't rely on filters or the filter cache.
      I'm planning to add a "method" parameter that would let users choose between the current implementation ("filter"?) and the DV-based implementation ("dv"?). Both methods should return the same results, but performance may vary.
      The default should continue to be "filter".
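
As a rough illustration of how a client might choose between the two methods, the sketch below builds a range-facet query string in Python. The parameter name `facet.range.method` and the values `filter` and `dv` are the ones named in the commits on this issue; the helper `range_facet_params`, the field name `price`, and the range bounds are hypothetical and only serve the example.

```python
from urllib.parse import urlencode

# Values taken from this issue's commit message; nothing else is assumed valid.
ALLOWED_METHODS = ("filter", "dv")


def range_facet_params(field, start, end, gap, method="filter"):
    """Build range-facet request parameters (hypothetical helper).

    `method` defaults to "filter", matching the default proposed in the issue.
    """
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported facet.range.method: {method!r}")
    return {
        "q": "*:*",
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
        "facet.range.method": method,
    }


# "price" and the bounds are made-up values for illustration.
query_string = urlencode(range_facet_params("price", 0, 100, 10, method="dv"))
print(query_string)
```

Leaving `method` out keeps the existing behaviour, so only callers that opt in would see the DV-based code path.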

      1. SOLR-7406.patch
        64 kB
        Tomás Fernández Löbbe
      2. SOLR-7406.patch
        64 kB
        Tomás Fernández Löbbe
      3. SOLR-7406.patch
        57 kB
        Tomás Fernández Löbbe
      4. SOLR-7406.patch
        44 kB
        Tomás Fernández Löbbe

        Activity

        Tomás Fernández Löbbe added a comment -

        Here is the initial patch. Still TODO:

        • javadocs
        • distributed test.
        • validate requests with grouping
        • Replace hardcoded strings with constants
        • Test hard end setting

        It has a randomized test that compares both methods and makes sure they return the same thing, although it accepts the results in a different order.
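
The idea behind such a test can be sketched as follows. This is not the patch's test code: both counting functions are simplified stand-ins with hypothetical names, and they work on a plain list of integers rather than a Solr index. One function counts each range separately, the other visits every value once and finds its bucket, and the comparison ignores ordering.

```python
import random


def counts_per_range(values, start, end, gap):
    """Stand-in for a per-range approach: one scan of the values per bucket."""
    result = []
    low = start
    while low < end:
        high = min(low + gap, end)
        result.append((low, sum(1 for v in values if low <= v < high)))
        low += gap
    return result


def counts_single_pass(values, start, end, gap):
    """Stand-in for a single-pass approach: each value is mapped to its bucket."""
    counts = {}
    low = start
    while low < end:
        counts[low] = 0
        low += gap
    for v in values:
        if start <= v < end:
            counts[start + ((v - start) // gap) * gap] += 1
    # Returned in reverse order on purpose, to show why order is ignored below.
    return list(counts.items())[::-1]


def same_counts(a, b):
    """Compare two result lists while ignoring the order of the buckets."""
    return sorted(a) == sorted(b)


rng = random.Random(7406)  # fixed seed so the run is reproducible
for _ in range(100):
    values = [rng.randint(-50, 250) for _ in range(rng.randint(0, 200))]
    gap = rng.randint(1, 20)
    assert same_counts(
        counts_per_range(values, 0, 200, gap),
        counts_single_pass(values, 0, 200, gap),
    )
```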

        Tomás Fernández Löbbe added a comment -

        I ran the following benchmark on my Mac:
        Geonames dataset (added 4 times making a total of 33.3M docs)
        Based on Solr's basic configset, just added the following fields:

           <field name="name" type="text_general"/>
           <field name="alternatenames" type="text_general" multiValued="true"/>
           <field name="latitude" type="double" docValues="true"/>
           <field name="longitude" type="double" docValues="true"/>
           <field name="feature_class" type="string"/>
           <field name="feature_code" type="string"/>
           <field name="country_code" type="string"/>
           <field name="cc2" type="string"/>
           <field name="admin1_code" type="string"/>
           <field name="admin2_code" type="string"/>
           <field name="admin3_code" type="string"/>
           <field name="admin4_code" type="string"/>
           <field name="population" type="long" docValues="true"/>
           <field name="elevation" type="int" docValues="true"/>
           <field name="dem" type="int" docValues="true"/>
           <field name="timezone" type="string"/>
           <field name="modification_date" type="string"/>
        

        AutoSoftCommit every second.
        AutoCommit every 15 seconds with openSearcher=false
        Updating one doc per second.
        Using Solr start script without modification to start.ini.sh
        Both "dem" and "population" have docValues=true.
        All times are in milliseconds
        Single thread doing almost 5k different boolean queries
        On "dem" field:

        facet=true
        facet.range=dem
        facet.range.start=0
        facet.range.end=200
        facet.range.gap=1
        facet.range.method=filter/dv
        
        Method  Min   Max  Average   p10   p50   p90   p99
        Filter   77  3514   1141.5  1040  1128  1263  1374
        DV       47  1988    166.0    88   151   262   368

        On "population" field:

        facet=true
        facet.range=population
        facet.range.start=0
        facet.range.end=2000
        facet.range.gap=5
        facet.range.method=filter/dv
        
        Method  Min   Max  Average   p10   p50   p90   p99
        Filter    3  2055    321.1    47    70   891   955
        DV       10   972     67.7    35    60   102   150
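
To make the comparison easier to read, the ratio between the two methods can be computed directly from the numbers reported above. The snippet below only restates the two result tables and divides them; it introduces no new measurements, and the names `results` and `ratios` are just for this example.

```python
COLUMNS = ["min", "max", "avg", "p10", "p50", "p90", "p99"]

# Times in milliseconds, copied from the two tables in this comment.
results = {
    "dem": {
        "filter": [77, 3514, 1141.5, 1040, 1128, 1263, 1374],
        "dv": [47, 1988, 166.0, 88, 151, 262, 368],
    },
    "population": {
        "filter": [3, 2055, 321.1, 47, 70, 891, 955],
        "dv": [10, 972, 67.7, 35, 60, 102, 150],
    },
}


def ratios(field):
    """Return filter time divided by dv time for every column (above 1 means dv was faster)."""
    filter_times = results[field]["filter"]
    dv_times = results[field]["dv"]
    return {
        column: round(f / d, 1)
        for column, f, d in zip(COLUMNS, filter_times, dv_times)
    }


for field in results:
    print(field, ratios(field))
```

On average the "dv" method comes out roughly 6.9 times faster on "dem" and 4.7 times faster on "population", while the minimum time on "population" is lower for "filter" (3 ms against 10 ms).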
        Tomás Fernández Löbbe added a comment -

        New patch

        • Added javadocs.
        • Replaced hardcoded strings with constants and enums.
        • Added a distributed test.
        • Added tests for facet.range.hardend setting (for new and old method).
        • For group faceting requests and for DateRangeField fields, always use the "filter" method, even if users specify facet.range.method=dv. Log a warning if this happens.

        I think the patch is pretty much done. I'll add a couple of tests with bad requests and commit after that, unless someone has any concerns.
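
The fallback rule from the last bullet can be sketched as below. This is a simplified Python illustration, not the Java code in the patch: the function `effective_method`, its arguments, and the logger name are hypothetical, and the field type is passed as a plain string.

```python
import logging

log = logging.getLogger("range_faceting_sketch")  # hypothetical logger name


def effective_method(requested, is_group_facet, field_type):
    """Return the method that would actually run for a range-facet request.

    "dv" falls back to "filter" for group faceting requests and for
    DateRangeField fields, and a warning is logged when that happens.
    """
    if requested == "dv" and (is_group_facet or field_type == "DateRangeField"):
        log.warning(
            "facet.range.method=dv is not used for this request; "
            "falling back to 'filter'"
        )
        return "filter"
    return requested


print(effective_method("dv", False, "int"))             # dv is honoured
print(effective_method("dv", True, "int"))              # group faceting: filter
print(effective_method("dv", False, "DateRangeField"))  # DateRangeField: filter
```

Falling back with a warning, rather than rejecting the request, means a client that always sends facet.range.method=dv still gets results for these cases.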

        Tomás Fernández Löbbe added a comment -

        New patch with some more tests with multiValued=true/false. Added tests for bad requests.
        Shalin Shekhar Mangar, I think this patch conflicts with your work in SOLR-4212. Would you mind taking a quick look? I think this patch is mostly ready.

        Shalin Shekhar Mangar added a comment -

        Hi Tomás Fernández Löbbe, I wasn't able to look at the patch today but I'll do that tomorrow morning (India time).

        Shalin Shekhar Mangar added a comment -

        Hi Tomás Fernández Löbbe, I skimmed your patch and it looks fine to me. I will have to resolve some conflicts in SOLR-4212 to use the new ParsedParams class and to pull the new getFacetRangeCountsDocValues method into the RangeFacetProcessor class introduced in SOLR-4212 but that's okay. Given that it's ready and that I am waiting for Hoss to review SOLR-4212, I won't stop you from committing it in its current form.

        David Smiley added a comment -

        Nice, Tomás Fernández Löbbe!

        Tomás Fernández Löbbe added a comment -

        Thanks, Shalin Shekhar Mangar and David Smiley. New patch, updated to trunk.

        ASF subversion and git services added a comment -

        Commit 1675706 from Tomás Fernández Löbbe in branch 'dev/trunk'
        [ https://svn.apache.org/r1675706 ]

        SOLR-7406: Add facet.range.method parameter with options 'filter' and 'dv' for range faceting

        ASF subversion and git services added a comment -

        Commit 1675711 from Tomás Fernández Löbbe in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1675711 ]

        SOLR-7406: Add facet.range.method parameter with options 'filter' and 'dv' for range faceting

        Gregg Donovan added a comment -

        One minor bit: is the import of org.apache.log4j.Logger in SimpleFacets, rather than SLF4J, intended? I noticed this when running faceting within our unit tests, which don't have log4j on the classpath.

        Tomás Fernández Löbbe added a comment -

        Reopening to change the import to slf4j.

        ASF subversion and git services added a comment -

        Commit 1682675 from Tomás Fernández Löbbe in branch 'dev/trunk'
        [ https://svn.apache.org/r1682675 ]

        SOLR-7406: Use slf4j logger instead of log4j in SimpleFacets

        ASF subversion and git services added a comment -

        Commit 1682676 from Tomás Fernández Löbbe in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1682676 ]

        SOLR-7406: Use slf4j logger instead of log4j in SimpleFacets

        ASF subversion and git services added a comment -

        Commit 1682677 from Tomás Fernández Löbbe in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1682677 ]

        SOLR-7406: Use slf4j logger instead of log4j in SimpleFacets

        Tomás Fernández Löbbe added a comment -

        Thanks for catching this issue, Gregg Donovan.

        Gregg Donovan added a comment -

        Thanks, Tomás!

        Anshum Gupta added a comment -

        Bulk close for 5.2.0.


          People

          • Assignee:
            Tomás Fernández Löbbe
            Reporter:
            Tomás Fernández Löbbe
          • Votes:
            0
            Watchers:
            6
