Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, Trunk
    • Component/s: update
    • Labels:

      Description

      A processor that reads a string field and writes a float field holding a boost value when the input string matches one or more regular expressions.
      The processor reads a separate file containing one regular expression per line, each with an associated boost value.
      We used it to (de)boost web pages based on URL patterns, but it could serve many other use cases as well.

      Kindly donated by Oslo University

      1. SOLR-2827.patch
        16 kB
        Jan Høydahl
      2. SOLR-2827.patch
        17 kB
        Jan Høydahl

        Activity

        Uwe Schindler added a comment -

        Closed after release.

        Commit Tag Bot added a comment -

        [branch_4x commit] Jan Høydahl
        http://svn.apache.org/viewvc?view=revision&revision=1440944

        SOLR-2827: RegexpBoost Update Processor

        Commit Tag Bot added a comment -

        [trunk commit] Jan Høydahl
        http://svn.apache.org/viewvc?view=revision&revision=1440940

        SOLR-2827: RegexpBoost Update Processor

        Jan Høydahl added a comment -

        New version of patch which passes tests and precommit.

        Robert Muir added a comment -

        moving all 4.0 issues not touched in a month to 4.1

        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Erik Hatcher added a comment -

        How about we make this one an example of using the new script update processor rather than baked in Java classes?

        Hoss Man added a comment -

        Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

        Jan Høydahl added a comment -

        Example usage:

        <processor class="org.apache.solr.update.processor.RegexpBoostProcessorFactory">
          <bool name="enabled">true</bool>
          <str name="inputField">url</str>
          <str name="boostField">urlboost</str>
          <str name="boostFilename">${solr.solr.home}/conf/rank/urlboosts.txt</str>
        </processor>
        

        Sample urlboosts.txt file:

        # Sample config file for RegexBoostProcessor
        # This example applies boost on the "url" field to boost or deboost certain urls
        # All rules are evaluated, and if several of them match, the boosts are multiplied.
        # If for example one rule with boost 2.0 and one rule with boost 0.1 match, the resulting urlboost=0.2
        
        https?://[^/]+/old/.* 0.1		#Comments are removed
        https?://[^/]+/.*index\([0-9]\).html$	0.5
        
        # Prioritize certain sites over others
        https?://www.mydomain.no/.*	1.5
        

        The output boost field can then be used at query time to tune relevance.
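
        The rule semantics described in the sample file (every rule is evaluated, and the boosts of all matching rules are multiplied) can be sketched in plain Java as follows. This is an illustration only, not the code from SOLR-2827.patch: the class name RegexpBoostSketch and the methods parseRules and boostFor are hypothetical, and three details are assumptions rather than facts from this issue: a rule must match the whole input string, everything from the first '#' on a line is treated as a comment, and an input that matches no rule gets a neutral boost of 1.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the implementation attached to this issue. */
class RegexpBoostSketch {

    /** One parsed rule: a compiled pattern and its boost factor. */
    static final class Rule {
        final Pattern pattern;
        final float boost;

        Rule(Pattern pattern, float boost) {
            this.pattern = pattern;
            this.boost = boost;
        }
    }

    /** Parses lines of the form "regex, whitespace, boost, optional #comment". */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            // Assumption: everything from the first '#' onward is a comment,
            // so a pattern in this sketch cannot itself contain '#'.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue; // blank line or comment-only line
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                continue; // skip malformed lines in this sketch
            }
            rules.add(new Rule(Pattern.compile(parts[0]), Float.parseFloat(parts[1])));
        }
        return rules;
    }

    /** Multiplies the boosts of all rules that match the given value. */
    static float boostFor(String value, List<Rule> rules) {
        float boost = 1.0f; // assumption: neutral boost when no rule matches
        for (Rule rule : rules) {
            // Assumption: a rule must match the whole string, not a substring.
            if (rule.pattern.matcher(value).matches()) {
                boost *= rule.boost;
            }
        }
        return boost;
    }
}
```

        With the sample file above, a URL under www.mydomain.no whose path starts with /old/ would match both the 0.1 rule and the 1.5 rule in this sketch, giving a combined boost of about 0.15.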

        Jan Høydahl added a comment -

        Here's the patch. This has been running in production for a few months.


          People

          • Assignee:
            Jan Høydahl
            Reporter:
            Jan Høydahl
          • Votes:
            0
            Watchers:
            2
