Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels: None

      Description

      User should be able to boost a query by a function of other fields
      Some background: http://www.nabble.com/boosting-a-query-by-a-function-of-other-fields-tf4387856.html#a12510092

      Attachments

      1. FunctionQuery.patch (58 kB, Yonik Seeley)
      2. FunctionQuery.patch (26 kB, Yonik Seeley)
      3. linear_combination.patch (7 kB, Yonik Seeley)

        Activity

        Yonik Seeley added a comment -

        This issue takes a top-down approach: it is less concerned with the underlying implementation and more with making this possible via HTTP GET.

        • We need ways to add multiple fields together, multiply fields together, etc.
        • We need a scaling function... for example, the range of all the components may not be known, but one may want the final value to be between 1 and 2. For something like this to be done efficiently, it seems like a min and max would be needed on ValueSource? Barring that, this value source would need to iterate over all documents and scale appropriately. Perhaps do the latter for the first implementation and we could always optimize later?
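A minimal sketch of the brute-force scaling idea from the second bullet, written in Python rather than Solr's Java. It assumes the source values for all documents are available as a plain list; the name `scale_values` and the sample data are hypothetical and not part of any patch.

```python
def scale_values(values, target_min, target_max):
    """Linearly rescale values so they fall between target_min and target_max."""
    # First pass over every document's value to find the observed range.
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to stretch (assumed behavior).
        return [float(target_min) for _ in values]
    # Second pass: map lo -> target_min and hi -> target_max.
    return [target_min + (v - lo) / (hi - lo) * (target_max - target_min)
            for v in values]


popularity = [0.0, 5.0, 10.0]          # hypothetical per-document field values
print(scale_values(popularity, 1, 2))  # [1.0, 1.5, 2.0]
```

Tracking a min and max on the value source itself, as suggested above, would avoid the first pass.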
        Yonik Seeley added a comment -

        This patch extends the linear function to accept additional value sources, so instead of
        linear(a,2,3) // a*2+3
        you can write
        linear(a,2,3,b,4,c,5) // a*2+3+b*4+c*5

        Everyone OK with extending LinearFloatFunction this way? It seemed like the simplest way to get this capability.
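To make the proposed argument layout concrete, here is a small Python sketch of how the extended call could be evaluated for one document. It only illustrates the arithmetic in the comments above; it is not the LinearFloatFunction code from the patch, and the `doc` dictionary is a hypothetical stand-in for field values.

```python
def linear(doc, field, slope, intercept, *extra):
    """Evaluate field*slope + intercept, plus field*slope for each extra (field, slope) pair."""
    if len(extra) % 2 != 0:
        raise ValueError("extra arguments must come in (field, slope) pairs")
    total = doc[field] * slope + intercept
    # Walk the trailing arguments two at a time: (b, 4), (c, 5), ...
    for name, factor in zip(extra[0::2], extra[1::2]):
        total += doc[name] * factor
    return total


doc = {"a": 1.0, "b": 2.0, "c": 3.0}           # hypothetical field values for one document
print(linear(doc, "a", 2, 3))                  # 1*2 + 3 = 5.0
print(linear(doc, "a", 2, 3, "b", 4, "c", 5))  # 5 + 2*4 + 3*5 = 28.0
```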

        Yonik Seeley added a comment -

        And yes, the syntax is less than ideal.... it would be nice to extend the parser to accept infix, but that's a different issue for another day.

        Hoss Man added a comment -

        The idea that funcname(a,b,c,d,e,f,g,...) maps to a*b + 1*c + d*e + f*g + ... seems a little weird, and it is even weirder that funcname is "linear", which in theory has only 3 inputs.

        Wouldn't a new "sum(...)" function make more sense?

        sum(linear(a,2,3), linear(b,4,0), linear(c,5,0))

        (The key being that "sum" takes in a list of other ValueSources, while linear still takes only 1 and some constants.)

        We'd also need a "mult" function, right?
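The composition suggested here can be sketched with closures standing in for ValueSources. This is a Python illustration under that assumption, not Solr code; `field`, `linear`, and `fn_sum` are hypothetical helper names.

```python
def field(name):
    """A source that reads one field from a document."""
    return lambda doc: doc[name]


def linear(source, slope, intercept):
    """Takes exactly one source plus two constants: source*slope + intercept."""
    return lambda doc: source(doc) * slope + intercept


def fn_sum(*sources):
    """Takes any number of sources and adds their values."""
    return lambda doc: sum(s(doc) for s in sources)


# Equivalent of sum(linear(a,2,3), linear(b,4,0), linear(c,5,0))
combined = fn_sum(
    linear(field("a"), 2, 3),
    linear(field("b"), 4, 0),
    linear(field("c"), 5, 0),
)

doc = {"a": 1.0, "b": 2.0, "c": 3.0}  # hypothetical field values
print(combined(doc))                  # (1*2+3) + 2*4 + 3*5 = 28.0
```

Each function keeps a fixed, small signature, and the nesting carries the structure.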

        Yonik Seeley added a comment -

        Yes, I suppose that sum() makes more sense... and ValueSources are so lightweight that it shouldn't really cause a performance difference. I think a constant should be allowed in sum(), though; otherwise people will try the very reasonable sum(a,1) and it won't work. Of course we could make a ConstantValueSource, so constants and value sources would be interchangeable...

        For "mult", I was thinking "product".

        Also perhaps a map() to pseudo-handle default values.
        product(x, map(y,0,1)) // map all zeros to 1 before multiplying
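A rough Python sketch of both ideas in this comment: letting a bare constant stand in for a value source, and mapping zeros to 1 before multiplying. It assumes a missing field reads as 0, which is the case the "default values" remark seems to target; all names here are hypothetical.

```python
def as_source(arg):
    """Treat a bare number as a constant source; pass existing sources through."""
    if callable(arg):
        return arg
    value = float(arg)
    return lambda doc: value


def field(name):
    # Assumption: a document without the field reads as 0.
    return lambda doc: doc.get(name, 0.0)


def fn_sum(*args):
    sources = [as_source(a) for a in args]
    return lambda doc: sum(s(doc) for s in sources)


def product(*args):
    sources = [as_source(a) for a in args]

    def evaluate(doc):
        result = 1.0
        for s in sources:
            result *= s(doc)
        return result

    return evaluate


def map_value(source, match, target):
    """Replace values equal to match with target; leave everything else alone."""
    source = as_source(source)

    def evaluate(doc):
        value = source(doc)
        return float(target) if value == match else value

    return evaluate


doc = {"a": 2.0, "x": 3.0}  # hypothetical document with no "y" field
print(fn_sum(field("a"), 1)(doc))                             # 3.0
print(product(field("x"), field("y"))(doc))                   # 0.0, the missing y zeroes the product
print(product(field("x"), map_value(field("y"), 0, 1))(doc))  # 3.0
```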

        Yonik Seeley added a comment -

        Attaching functionquery.patch... adding many new functions:

        log, sqrt, abs, sum, product, const
        scale(source, min, max) // scale values from source to fall between min and max
        map(source,min,max,target) // change any values that fall between min and max to target
        query(lucene_query, defaultVal) // use relevancy score, w/ defaultVal when doc doesn't match
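The one-line descriptions of map and query can be read as the following Python sketch. It assumes the map bounds are inclusive and models the query's matches as a dictionary of document id to score; both are assumptions made for illustration, and the helper names are hypothetical.

```python
def map_range(value, lo, hi, target):
    """Return target when lo <= value <= hi (inclusive, assumed), else the value unchanged."""
    return target if lo <= value <= hi else value


def query_value(scores, doc_id, default_val):
    """Return the relevancy score for doc_id, or default_val when the doc did not match."""
    return scores.get(doc_id, default_val)


scores = {7: 0.83}  # hypothetical: only doc 7 matched the nested query

print(map_range(0.0, 0, 0, 1))      # 1, zero falls inside [0, 0]
print(map_range(4.2, 0, 0, 1))      # 4.2, outside the range so unchanged
print(query_value(scores, 7, 0.0))  # 0.83
print(query_value(scores, 8, 0.0))  # 0.0, doc 8 did not match
```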

        I also changed the function parser to use a hash lookup on the function name, since the list was getting long. The parser now also accepts a float constant as a value source, allowing things like sum(a,2)
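As an illustration of the two parser changes described here (a hash lookup on function name, and float constants accepted wherever a value source is expected), here is a small Python sketch. It is not the parser from the patch; the grammar, the `FUNCTIONS` table, and the error handling are simplified assumptions.

```python
import math
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|-?\d+(?:\.\d+)?|[(),])")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

# Hash lookup: function name -> factory that combines already-parsed argument sources.
FUNCTIONS = {
    "sum": lambda *s: lambda doc: sum(x(doc) for x in s),
    "product": lambda *s: lambda doc: math.prod(x(doc) for x in s),
    "sqrt": lambda s: lambda doc: math.sqrt(s(doc)),
    "abs": lambda s: lambda doc: abs(s(doc)),
    "map": lambda s, lo, hi, target: lambda doc: (
        target(doc) if lo(doc) <= s(doc) <= hi(doc) else s(doc)),
}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_source(tokens, i):
    """Parse one source starting at tokens[i]; return (source, next_index)."""
    tok = tokens[i]
    if tok in ("(", ")", ","):
        raise ValueError(f"unexpected token: {tok}")
    if NUMBER.fullmatch(tok):
        value = float(tok)  # a bare float becomes a constant source
        return (lambda doc: value), i + 1
    if i + 1 < len(tokens) and tokens[i + 1] == "(":
        factory = FUNCTIONS.get(tok)  # dictionary lookup instead of an if/else chain
        if factory is None:
            raise ValueError(f"unknown function: {tok}")
        args, i = [], i + 2
        while tokens[i] != ")":
            arg, i = parse_source(tokens, i)
            args.append(arg)
            if tokens[i] == ",":
                i += 1
        return factory(*args), i + 1
    name = tok  # anything else is treated as a field name
    return (lambda doc: doc[name]), i + 1


def parse(text):
    tokens = tokenize(text)
    source, end = parse_source(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after expression")
    return source


print(parse("sum(a,2)")({"a": 3.0}))                             # 5.0
print(parse("product(x, map(y,0,0,1))")({"x": 3.0, "y": 0.0}))  # 3.0
```

Adding a function then means adding one entry to the table rather than another branch in the parser.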

        This is just a snapshot for feedback... no tests yet.

        Yonik Seeley added a comment -

        OK, my next step is to write tests for all these new functions, provided no one sees an issue with the general approach.
        I chose to stick with ValueSource as the basis for new functions rather than CustomScoreQuery because of the greater complexity with weights, scorers, explanations, etc., and performance issues with respect to scorers (skipTo and next needing to know about deleted docs).

        Yonik Seeley added a comment -

        Attaching a new version of the patch with tests.

        Tom Hill added a comment -

        It looks like you are removing the hard-coded parsing of the functions in QueryParsing.java.

        How hard would it be to allow user-configured functions? We use several of our own functions, and currently have to hack up QueryParsing.java.

        I'd much prefer to be able to add the classname to the config file, and have it just be picked up and added to vsParsers.
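The kind of config-driven registration being asked for might look like the following Python sketch. It is purely illustrative: the `vs_parsers` dictionary only mirrors the role of vsParsers mentioned above, the config format is invented, and `math.hypot` stands in for a user-supplied function.

```python
import importlib

vs_parsers = {}  # function name -> callable, filled in from configuration


def register_from_config(config):
    """Resolve each dotted name in the config and register it under its function name."""
    for func_name, dotted_path in config.items():
        module_name, _, attr = dotted_path.rpartition(".")
        module = importlib.import_module(module_name)
        vs_parsers[func_name] = getattr(module, attr)


# Hypothetical config entry: expose the standard library's math.hypot as "hypot".
register_from_config({"hypot": "math.hypot"})
print(vs_parsers["hypot"](3.0, 4.0))  # 5.0
```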

        Yonik Seeley added a comment -

        Being able to add new functions to the parser is probably useful to people... I opened SOLR-356 for this.

        Yonik Seeley added a comment -

        Added pow(a,b) and div(a,b) and committed.
        Still open for discussion+changes of course.

        Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked "Resolved" and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.3 as of today 2008-03-15

        The Fix Version for all 29 issues found was set to 1.3; email notification was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this (hopefully) unique string: batch20070315hossman1


          People

          • Assignee: Unassigned
          • Reporter: Yonik Seeley
          • Votes: 0
          • Watchers: 0
