Solr
SOLR-2444

Update fl syntax to support: pseudo fields, AS, transformers, and wildcards

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      The ReturnFields parsing needs to be improved. It should also support wildcards

      1. SOLR-2444-fl-parsing.patch
        45 kB
        Ryan McKinley
      2. SOLR-2444-fl-parsing.patch
        42 kB
        Ryan McKinley

        Issue Links

          Activity

          Ryan McKinley added a comment -

          This patch takes Yonik's patch from SOLR-1566 and updates it to trunk.

          I don't really understand the proposed syntax, so I will need some help.

          Ryan McKinley added a comment -

          adding missing file

          Ryan McKinley added a comment -

          In SOLR-1566, we added a first draft of parameter parsing. As is, we have a few things happening:

          Mapping Field Names

          with fl=id,score we get:

          <doc>
              <str name="id">GB18030TEST</str>
              <float name="score">1.0</float>
          </doc>

          with &fl=xxx=id,score
          we get:

          <doc>
            <float name="score">1.0</float>
            <str name="xxx">GB18030TEST</str>
          </doc>
          

          id has been mapped to xxx

          DocTransformers

          Added support to select transformers in the fl param. See http://wiki.apache.org/solr/DocTransformers for more info (or help fill it in!)

          &fl=id,_explain_ will give:

          <doc>
              <str name="id">GB18030TEST</str>
              <str name="_explain_">1.0 = (MATCH) MatchAllDocsQuery, product of:
            1.0 = queryNorm
          
          </str></doc>
          

          Passing argument to transformer

          we can change the format with: &fl=id,_explain:nl_

          <doc>
              <str name="id">GB18030TEST</str>
              <lst name="_explain:nl_">
          
                <bool name="match">true</bool>
                <float name="value">1.0</float>
                <str name="description">MatchAllDocsQuery, product of:</str>
                <arr name="details">
                  <lst>
                    <bool name="match">true</bool>
                    <float name="value">1.0</float>
          
                    <str name="description">queryNorm</str>
                  </lst>
                </arr>
              </lst></doc>
          

          Similarly, we can use the _value_ transformer to add a constant value to the output

          &fl=id,_value:hello_

          <doc>
              <str name="id">GB18030TEST</str>
              <str name="_value:hello_">hello</str>
          </doc>
          

          or specify a type:

          &fl=id,_value:int:10_

          <doc>
              <str name="id">GB18030TEST</str>
              <int name="_value:int:10_">10</int>
          </doc>
          

          Aliasing also works with transformers:

          &fl=id,ten=_value:int:10_

          <doc>
              <str name="id">GB18030TEST</str>
              <int name="ten">10</int>
          </doc>
          

          Wildcard parsing

          The parser accepts wildcards – we are not doing anything with them yet

          Multiple fl parameters

          &fl=id,score

          is equivalent to:

          &fl=id&fl=score
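The alias mapping and multiple-fl merging described above can be illustrated with a small sketch (Python purely for illustration; this is not Solr's ReturnFields implementation, and the function name is invented):

```python
# Toy illustration of fl alias parsing: "xxx=id" maps source field
# "id" to display name "xxx"; plain entries map to themselves.
# Multiple fl parameters are simply concatenated.
def parse_fl(*fl_params):
    fields = []
    for fl in fl_params:
        for token in fl.split(","):
            token = token.strip()
            if "=" in token:
                display, source = token.split("=", 1)
            else:
                display = source = token
            fields.append((display, source))
    return fields

print(parse_fl("xxx=id,score"))  # [('xxx', 'id'), ('score', 'score')]
print(parse_fl("id", "score") == parse_fl("id,score"))  # True
```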

          Ryan McKinley added a comment -

          I am not wild about the field mapping syntax. To display the field 'id' as 'xxx', we use: fl=xxx=id,score

          What about using SQL style syntax?

          &fl=id AS xxx,score
          

          I think this reads better and is less confusing (for me) since there are not so many = signs.

          Ryan McKinley added a comment -

          I debated different ways to pass parameters to transformers, and now think I like the simple short method:

          &fl=field,_transformer:args_,...
          

          Another option would be to reuse the LocalParams syntax... something like:

          &fl=field,{!transformer name=explain style=html},score
          

          but that feels a bit ridiculous – most transformers don't need arguments, and the few that do just take simple ones.

          thoughts?

          Yonik Seeley added a comment -

          It's not clear to me why invoking a transformer would even be part of "fl".

          "fl=name,docid" makes sense because the user is asking for the docid field back - the fact that it's a transformer is an implementation detail.

          If one wants to add a transformer that can do anything to a document, it feels like that should be specified elsewhere, and not in the field list?

          Ryan McKinley added a comment -

          I think it makes sense because it is the place you select what goes in the output.

          If I add fl=name,_stuff_from_my_db_, it is reasonable that the field will contain stuff from my db (whatever the fields are called).

          If that is specified elsewhere, it seems odd to have to keep them in sync.

          Yonik Seeley added a comment -

          In the case where some fields may come from a DB, all of the clients definitely shouldn't be exposed to that mapping. The goal should be to have those fields look like any other fields, with the clients isolated from any mapping changes.

          Ryan McKinley added a comment -

          That seems fine – if you don't want people to add a transformer in the fl parameter, don't register the factory in solrconfig.xml; instead, add it to ReturnFields in a Component/Handler/whatever

          The Component/Handler/Whatever could be configured in solrconfig.xml with whatever it needs to make/edit the Transformer

          Hoss Man added a comment -

          I think it makes sense because it is the place you select what goes in the output.

          part of the complexity here is that in a lot of cases you want the client to specify the "target" of the transformation, w/o knowing the source.

          in your previous example: clients may be in a situation where they know they want the "xxx" and "score" fields w/o knowing that "xxx" is the result of a transformation from the concrete "id" field.

          In an ideal world, a solr admin named Bob should be able to tell his client Carl that the "price" field is the one Carl wants to use. Carl could then query solr with "...&fl=price" or "...&sort=price desc" w/o ever necessarily knowing that price is really the result of a function query that takes into account the current exchange rate (or some other factors driven by a transformer configured in solrconfig.xml for the handler Carl is querying)

          Carl is selecting what data goes in the output, but that doesn't mean Carl should (have to) know where that data comes from.

          Ryan McKinley added a comment -

          This makes sense – I can see letting a component set up some list of pseudo fields you could ask for, and the FL parser picking them out of a map and making the right transformer/whatever

          I'd like to make sure we have a simple way to configure basic things inline – SQL SELECT 1 is remarkably useful!

          Hoss Man added a comment -

          I'd like to make sure we have a simple way to configure basic things inline – SQL SELECT 1 is remarkably useful!

          I agree completely; my point is just that in picking a syntax/api we should prioritize making the syntax for "select price" (where price is something dynamically generated by a transformer behind the scenes) simpler than the syntax for saying "select foo() AS price" (where the client knows the nitty-gritty details of how things work under the covers)

          clients asking for "select price" should be more common than clients asking for "select foo()" or "select foo() as price"

          Ryan McKinley added a comment -

          I just started a new branch and implemented some of the things we have suggested. Check:
          https://svn.apache.org/repos/asf/lucene/dev/branches/pseudo/

          This implements:

          SQL style AS

          ?fl=id,field AS display
          

          will display 'field' with the name 'display'

          Pseudo Fields

          You can define pseudo fields with ?fl.pseudo=key:value

          Any key that matches something in the fl param gets replaced with value. For example:

          ?fl=id,price&fl.pseudo=price:real_price_field
          

          is the same as

          ?fl=id,real_price_field AS price
          

          Transformer Syntax [name]

          The previous underscore syntax is replaced with brackets.

          ?fl=id,[value:10] AS 10
          

          Hopefully this will make it more clear that it is calling a function.
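A rough illustration of how such entries might be tokenized (an illustrative Python sketch, not the code in the pseudo branch; the regex and function name are assumptions):

```python
import re

# Toy tokenizer for the branch syntax above: an entry is either a plain
# field name or a bracketed "[transformer:args]" call, optionally
# followed by an "AS alias" suffix. Illustrative only.
ENTRY = re.compile(
    r"^(?P<expr>\[[^\]]*\]|[^ ]+)(?:\s+AS\s+(?P<alias>\S+))?$",
    re.IGNORECASE,
)

def parse_entry(entry):
    m = ENTRY.match(entry.strip())
    if not m:
        raise ValueError("bad fl entry: " + entry)
    expr = m.group("expr")
    alias = m.group("alias") or expr
    return alias, expr

print(parse_entry("field AS display"))   # ('display', 'field')
print(parse_entry("[value:10] AS ten"))  # ('ten', '[value:10]')
print(parse_entry("id"))                 # ('id', 'id')
```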

          Ryan McKinley added a comment -

          Just committed the changes – Yonik, I replaced your fancy parsing with something I can understand (StringTokenizer and indexOf)

          I figure we should agree on a syntax first, and then optimize the fl parsing (out of my league)

          Hoss Man added a comment -

          I'm not particularly a fan of the specific syntax "x AS y", but I'm not opposed to it – although your specific example (with a number on the right side) confuses me ... I think you must have actually meant something like this, correct? ...

          ?fl=id,[value:10] as hard_coded_price
          

          The one change I'd like to argue in favor of is ensuring we have some way to deal with field names that are esoteric, i.e. containing whitespace or special characters

          I don't think we need to stress out about a lot of quoting/escaping in the common case – the rules used in the FunctionQParser to identify "simple" field names should be good enough for most people, and help keep the syntax clean for the majority of users who have straightforward field names. But it would definitely be nice if there was some way for people to refer to complex field names, either as "input" (referring to esoteric field names in their documents) or as "output" (generating esoteric field names as the result of an alias/transformation)

          I think the two changes to what Ryan has already described that might make this totally feasible would be...

          1) a quote character discovered where params like "fl" expect a field name should trigger quote-terminated string parsing (where whitespace and punctuation are considered part of the string, and backslash can be used to escape quotes), and the resulting string will be used as the field name.

          2) the pseudo field param mapping Ryan described should move the field name to the "key" so there is no ambiguity if the separator appears twice, ie...

          ?fl.pseudo.xxx%3Ayyy=zzz
             ..instead of...
          ?fl.pseudo=xxx%3Ayyy%3Azzz
          

          If we did those two things, then these would all be equivalent...

          ?fl=id,price&fl.pseudo.price=real_price_field
          ?fl=id,real_price_field+AS+price
          ?fl=id,"real_price_field"+AS+"price"
          

          ...but it would also be possible to have either of these...

          ?fl=id,"external+price+alias"&fl.pseudo.external+price+alias=internal+price+field
          ?fl=id,"internal+price+field"+AS+"external+price+alias"
          

          ...this shouldn't cause a problem with the syntax for echoing back literal values, since that would already require a transformer...

          ?fl=id,[value:"No+it+is+Not"]+AS+"Is+Product+In+Stock"
          (all products temporarily out of stock)
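The quote-terminated reading in point 1 could look roughly like this (an illustrative Python sketch; Solr's StrParser would be the real mechanism, and this helper name is invented):

```python
# Minimal sketch of quote-terminated field-name reading: a leading quote
# starts a string in which whitespace and punctuation are literal, and
# backslash escapes the quote character. Not Solr's StrParser.
def read_name(s, pos=0):
    if pos >= len(s) or s[pos] != '"':
        raise ValueError("expected opening quote")
    out, i = [], pos + 1
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s) and s[i + 1] == '"':
            out.append('"'); i += 2          # escaped quote
        elif ch == '"':
            return "".join(out), i + 1       # name, index past close quote
        else:
            out.append(ch); i += 1
    raise ValueError("unterminated quoted name")

name, end = read_name('"external price alias" AS x')
print(name)  # external price alias
```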
          
          Ryan McKinley added a comment -

          I'm not particularly a fan of the specific syntax "x AS y", but i'm not opposed to it

          Other ideas? Yonik's patch uses:

          ?fl=id,hard_coded_price=[value:10]
          

          My problem (though not a strong one) is that = is used for both the parameter and the name mapping.

          other options?

          I think you must have actually meant something like this, correct? ?fl=id,[value:10] as hard_coded_price

          yes, sorry poor example

          a quote character...

          +1

          should move the fieldname to the "key"

          I like the syntax you suggest. The reason I suggested fl.pseudo=key:value is more to do with the implementation than the syntax. With fl.pseudo.key=value we have to iterate and compare all parameters to parse the pseudo fields rather than just call getParams( "fl.pseudo" ).

          I am happy with either syntax, but I like the implementation simplicity of fl.pseudo=key:value

          Yonik Seeley added a comment -

          i replaced your fancy parsing with something i can understand (StringTokenizer and indexof)

          heh - good luck with that! Not using the qparser framework is pretty much doomed to failure (due to the need to exactly replicate that parsing logic).

          Ryan McKinley added a comment -

          I'm not suggesting the StringTokenizer approach is the right way to go – I just have no way to futz with the qparser stuff since it is so far out of my league.

          The stuff i did (in the branch), still uses the qparser – but only when it thinks something is a function. Is it required elsewhere?

          The plan would be to figure out what syntax we want then optimize the fl parsing from there (i think)

          Yonik Seeley added a comment -

          The stuff i did (in the branch), still uses the qparser – but only when it thinks something is a function.

          I should have explained further... doing indexOf("any random delimiter") and then feeding that to the qparser is what is pretty much doomed to failure. Whatever delimiter you're trying to use could easily be contained within the query/function syntax itself. This is why the qparser framework must be used to find the end of the query/function. The first crack at sort-by-function had the same problems (although it was trying to be more clever and parse parens, params, etc, to find the end of the function) and needed to be rewritten.

          Ryan McKinley added a comment -

          got it – yes, this first splits on comma – so if a field has a comma it would be busted.

          There is also no way to implement the quoted field names that Hoss suggests.
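The failure mode of splitting first on comma can be demonstrated concretely (an illustrative Python sketch, not the qparser-based approach Yonik recommends):

```python
# Naive comma split breaks entries whose syntax itself contains commas:
fl = "id,termfreq(text,solr)"
print(fl.split(","))  # ['id', 'termfreq(text', 'solr)'] -- busted

# A depth-aware split handles parentheses/brackets, though as noted in
# the discussion, only the real parser can know where a function ends.
def split_fl(fl):
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(fl):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(fl[start:i])
            start = i + 1
    parts.append(fl[start:])
    return parts

print(split_fl(fl))  # ['id', 'termfreq(text,solr)']
```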

          Yonik Seeley added a comment -

          There is also no way to implement the quoted field names that hoss suggests

          Actually it should be trivial since it's a single value and StrParser already supports it. Anyway, as you say, we should concentrate first on the desired syntax.

          I brought up "add(a,b) AS foo" in SOLR-1298 as one option, but no one was thrilled with it.

          OK, so for the basic syntax of how to name pseudofields, I think these are the top 3 options we have?

          fl=name,title,dist=geodist(),nterms=termfreq(text,solr)
          fl=name,title,dist:geodist(),nterms:termfreq(text,solr)
          fl=name,title,geodist() AS dist,termfreq(text,solr) AS nterms
          

          I think I'm fine with any of these, but perhaps we should get feedback from more people? Once we decide, we should pretty much stick with it forever IMO.

          Ryan McKinley added a comment -

          but perhaps we should get feedback from more people?

          +1 I'd be fine with any of these options, but lean towards AS and shy away from =

          Hoss Man added a comment -

          I like the syntax you suggest. The reason I suggested fl.pseudo=key:value is more to do with the implementation than the syntax. With fl.pseudo.key=value we have to iterate and compare all parameters to parse the pseudo fields rather than just call getParams( "fl.pseudo" ).

          Yeah ... one possibility is to use the same approach we use for field overrides in other params...

          fl.pseudo=external+price+alias
          fl.pseudo=external+popularity+alias
          fl.pseudo.external+price+alias=internal+price+field
          fl.pseudo.external+popularity+alias=internal+popularity+field
          

          ...it's a little verbose, but since the main use of this is likely to be "default" params anyway (because people specifying it at request time could just include the aliasing directly in the param value) it might not be that bad.

          OK, so for the basic syntax of how to name pseudofields, I think these are the top 3 options we have?

          I vote for the colon. "=" is evil since it's easy to confuse as a key=val delimiter in the URL (and requires extra escaping in most docs to explain it correctly). "x AS y" just seems unnecessarily verbose.

          Erik Hatcher added a comment -

          I'm kinda liking having fl stay unadorned, such that we have fl=name,title,dist,nterms and fl.dist=geodist() and fl.nterms=termfreq(text,solr) (or fl.pseudo prefix). This allows for indirection on what dist and nterms really maps to, keeping that out of the main fl.

          Ryan McKinley added a comment -

          Erik – so you are suggesting that the fl list is always the display value, and it may map to a pseudo field with a different parameter

          fl=name,dist,nice_looking_field_name
          fl.pseudo.dist=geodist()
          fl.pseudo.nice_looking_field_name=crazy_field_name

          In this case, each 'fl' value would be checked to see if it actually maps to a pseudo field. As for supporting crazy field names we could either:
          1. support quoting in the fl param so that fields with ',' aren't split
          2. if you index a field with ',' in the name, you can get it but it needs to be mapped as a pseudo field.

          I like this suggestion. It avoids the ':' vs '=' vs 'AS' issue and makes the parsing rules easy to explain.

          Yonik Seeley added a comment -

          From the high level user perspective, a field and a function of fields both yield values.
          I like that simplicity - it's the same type of simplicity we have in programming languages.

          I can call foo(a,b), but instead of a variable(think fieldname) I can simply substitute another function:
          foo(a,bar(c,d))

          I think this best matches people's expectations - directly specify what they want returned in "fl", as they do today. It's also less syntax to remember. It's also consistent with how we enable sort-by-function... anywhere a fieldname can be, a function can be substituted. It's also cleaner by default since a name is not required and the label that will be used is the function itself.

          i vote for the colon

          +1, I think that's the best option.

          I think this is an independent issue from setting up transparent user selectable aliases/pseudofields such that when a user puts "foo" in the field, they get the value but have no idea it came from a source other than stored fields.

          Ryan McKinley added a comment -

          Thinking about the pseudo field mapping – I like the idea that each parsed 'fl' element gets checked and possibly replaced with something from the pseudo field list.

          As a crazy example, this would mean:

          fl=name,dist:geodist()&
          fl.pseudo.geodist()=distance_field

          would use the lucene field 'distance_field' and put it in a solr document with the name 'dist'

          Does that match your expectation?

          i vote for colon

          +1 ok me too

          Yonik Seeley added a comment -

          Ok, so it seems like we're agreeing on two different usecases.

          For user directly asking for functions, we'll support (as we already do today)
          fl=a,b,foo(c),d

          And if the user wants to change the name, they can use "mykey:foo(c)"

          And for separately setting up fields that a user could add to fl w/o needing to know where they come from,
          we can support
          field.mykey=foo(c)

          It seems like the latter should almost be a separate issue - IMO, it's not as immediately important, and there are other issues to figure out (like how to specify if it's implicit and included with *, or with globs, or if it's only added if explicitly listed).
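
          The optional "mykey:" alias could be peeled off each fl entry with something like the following sketch (Python, purely illustrative — split_alias is a hypothetical helper, not Solr code, and a real parser would also need to handle quoted names):

```python
import re

# Hypothetical helper (not Solr's actual parser): peel an optional
# "key:" alias off an fl entry. Colons inside a bracketed transformer
# such as [value:10] must not be treated as the alias separator.
ALIAS = re.compile(r"^(\w+):(?!$)")

def split_alias(token):
    if token.startswith("["):      # bare transformer, no alias
        return None, token
    m = ALIAS.match(token)
    if m:
        return m.group(1), token[m.end():]
    return None, token
```

          With this, "mykey:foo(c)" yields the key "mykey" and the value "foo(c)", while "price:[value:10]" keeps the bracketed transformer intact as the value.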

          Ryan McKinley added a comment -

          For user directly asking for functions, we'll support (as we already do today)

          fl=a,b,foo(c),d

          And if the user wants to change the name, they can use "mykey:foo(c)"

          With the addition of the inline transformer syntax

          fl=a,b,[explain]

          fl=a,b,price:[value:10]

          I don't think we can treat the pseudo fields as a totally different issue since the field list parsing depends on the pseudo fields. That is, if you have fl=id,foo(c)&fl.pseudo.foo(c)=field_name we never actually parse foo(c) as a function – it needs to get mapped to the field 'field_name'

          I think the fl parser needs a few passes:
          1. split fl into tokens (or whatever we call them)
          2. each token may get replaced with the value from fl.pseudo.token
          3. check if the token is a function, transformer, or wildcard

          but yes, it is a conceptually different issue.
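
          The three passes described above could be sketched roughly like this (Python, purely illustrative — parse_fl and the fl.pseudo.* lookup are assumptions from this thread, not Solr's actual implementation, and a real parser would need to respect quotes and commas inside function arguments instead of a naive split):

```python
# Illustrative sketch (not Solr code) of the three passes described above:
# 1) split fl into tokens, 2) replace a token with its fl.pseudo.<token>
# value on an exact match, 3) classify each token as a transformer,
# function, wildcard, or plain field.

def parse_fl(fl, params):
    results = []
    for key in (t.strip() for t in fl.split(",") if t.strip()):
        # Pass 2: exact-match pseudo-field substitution; the original
        # token stays as the display key.
        token = params.get("fl.pseudo." + key, key)
        # Pass 3: decide what kind of entry this is.
        if token.startswith("[") and token.endswith("]"):
            kind = "transformer"
        elif "(" in token and token.endswith(")"):
            kind = "function"
        elif "*" in token or "?" in token:
            kind = "wildcard"
        else:
            kind = "field"
        results.append((key, token, kind))
    return results
```

          Note how fl=id,foo(c) with fl.pseudo.foo(c)=field_name means foo(c) is never parsed as a function: it is mapped to the plain field field_name, matching the behavior discussed above.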

          Yonik Seeley added a comment -

          I don't think we can treat the pseudo fields as a totally different issue since the field list parsing depends on the pseudo fields.

          Well sure, more development wherever it is. I meant more that it's a different issue as far as features go.
          Being able to return function values finally completes basic geosearch.

          if you have fl=id,foo(c)&fl.pseudo.foo(c)=field_name we never actually parse foo(c) as a function

          That seems like more complexity than it's worth, and would really only work with parameter-less functions (if implemented as a hashmap lookup) since different arguments to the function would cause the match to fail.

          "transformer syntax" also seems like a somewhat orthogonal issue. Has anyone commented on the proposed syntax (what is the full proposed syntax anyway? I need to see some examples with more than one parameter). Is it important to allow these transformer parameters inline in "fl"? etc.

          Yonik Seeley added a comment - - edited

          Hmmm, I've tried changing '=' to ':' for the key... but things now fail because of tests in trunk w/ syntax like
          explain:nl

          Also, why is testAugmentFields in SolrExampleTests (the stuff that tests the example)?
          edit: Oh, I think I see - it's an easy way to see that both binary and non-binary response writers are tested, right?

          Ryan McKinley added a comment -

          That seems like more complexity than it's worth, and would really only work with parameter-less functions (if implemented as a hashmap lookup) since different arguments to the function would cause the match to fail.

          this is why i bring it up, and why I think it is the same issue. We need to agree on what it means, and i'm pretty sure that has consequences for how we implement the basic parsing.

          As you say, i would expect that different arguments to the function should not match a pseudo field.
          fl=id,foo(a)
          would not use the pseudo field defined in
          fl.pseudo.foo(a)=something

          I think we only need to say that exact matches would get replaced. For example
          fl=id,foo( a )
          does not need to match
          fl.pseudo.foo(a)=something

          We can say that functions/transformers are not supported by pseudo fields – i'm fine with that, but think we need to be explicit. One argument to support it is so that you could swap the meaning of some function w/out updating clients.

          "transformer syntax" also seems like an somewhat orthoginal issue

          Not really, it is about how we parse the fl.

          Has anyone commented on the proposed syntax

          nope – other than hoss agreeing that SQL SELECT 1 is very useful and we should make it simple

          (what is the full proposed syntax anyway? I need to see some examples with more than one parameter)

          The proposed syntax is:

          [name] and [name:argument]

          For the key use cases I can think of, having a single inline parameter is very useful ([value:10]); for more complex args, the transformer can use SolrQueryRequest

          Is it important to allow these transformer parameters inline in "fl"? etc.

          For me they are equally important to inline functions. I plan to use them for things that do not map cleanly to functions. A simple example: if you have a geohash point that encodes X and Y in a single field, i want to return that with X and Y as separate, well-typed values. With a transformer, i can return

          {x:10, y:20}

          rather then just

          {point:'10 20'}

          and make the client figure out if I mean x y or lat lon.

          The other key place I see them getting used is with returning highlighted fields inline

          ?fl=id,name,[hl:name]

          would return the raw name field and the highlighted name field. All the other highlight parameters would be fetched from getParams()

          Yonik Seeley added a comment -

          [name] and [name:argument]

          It just seems both strange and limiting to say that an augmenter may only have one argument.
          But I suppose if that argument is always just a string, the augmenter could always parse it into multiple arguments. What is the syntax of "argument"? Is it backslash escaped so the value can contain "]"?

          Yonik Seeley added a comment -

          For the key use cases I can think of having a single inline parameter is very useful [value:10],

          Still too complex for my tastes. I think that should be fl=myvalue:10
          rather than fl=myvalue:[value:10]

          But it doesn't hurt anything if we keep the "value" transformer around anyway

          Ryan McKinley added a comment -

          but things now fail because of tests in trunk w/ syntax like

          ya, i tried messing with that too – also tried changing the transformer syntax from _ to [] but could not understand how the parser works. This is why i made the branch to see how the rest feels.

          I just updated the branch to use a space as the transformer args delimiter. I also refactored to support the pseudo field mapping I think we agree on (though yonik thinks we should do it as a separate issue)

          This adds a parameter fl.pseudo=true/false – if that is on, it will check if each field has an alternative in fl.pseudo.key=value

          This just uses fl.split( "," ) but that should really be a fancy parser that knows about quotes.

          it's an easy way to see that both binary and non-binary response writers are tested, right?

          Yes, this is the high level place that hits XML and binary response writers – it used to use JSON too, but looks like that has changed. It is also tested there because I want to make sure solrj works correctly with complex structures like [explain nl]

          Ryan McKinley added a comment -

          It just seems both strange and limiting to say that an augmenter may only have one argument.

          But I suppose if that argument is always just a string, the augmentor could always parse it into multiple arguments.

          Ya, the value augmenter actually does this – you can specify a type [value int 10] vs [value 10]

          What is the syntax of "argument"? Is it backslash escaped so the value can contain "]"?

          I guess that is a good idea – if it makes things complicated, i'm not too worried about it. You could use another parameter if there is a need for something complex.

          Still too complex for my tastes. I think that should be fl=myvalue:10

          would this mean that any unknown string becomes a literal value? I would rather have an error than shorten the SELECT 10 case. See SOLR-2441

          Yonik Seeley added a comment - - edited

          > What is the syntax of "argument"? Is it backslash escaped so the value can contain "]"?

          I guess that is a good idea – if it makes things complicated, i'm not too worried about it.

          I'm not concerned about the complexity of implementation at all - I'm just trying to figure out what the proposal actually is.

          You could use another parameter if there is a need for something complex.

          Another parameter for the augmenter? That's essentially what I was asking about. Or do you mean a different query parameter?
          edit: oops... just saw your previous message "I just updated the branch to use a space as the transformer args deliminator." I guess that's what you meant.

          > Still too complex for my tastes. I think that should be fl=myvalue:10

          would this mean that any unknown string becomes a literal value?

          Nope. By default, the function parser treats an unquoted string literal as a field name, and an error will be thrown if it isn't. If you want a string literal, you quote it.

          Yonik Seeley added a comment -

          This adds a parameter fl.pseudo=true/false – if that is on, it will check if each field has an alternative in fl.pseudo.key=value

          I think I like the shorter form fl.x=y better (rather than fl.pseudo.x=y)? Anyone else?

          A pseudofields=false parameter is a good idea to aid in debugging though.

          Yonik Seeley added a comment -

          FYI: just so we don't overlap effort, I'm busy adding objectVal(doc) to DocValues so that we can support all types of function queries (which is important beyond pseudo-fields too)

          Ryan McKinley added a comment -

          you have seen SOLR-2443.... right?

          Yonik Seeley added a comment -

          you have seen SOLR-2443.... right?

          Heh - no I had not.
          I lose track (and in this case had never even seen the issue). We should try to link all of these related issues in one place - it's hard to keep track of otherwise.

          Hoss Man added a comment -

          We should try to link all of these related issues in one place - it's hard to keep track of otherwise.

          our current version of jira makes it possible to convert an issue into a "sub task" of another issue ... it's a little more visible that way than just using the dependency/related issue linking.

          Yonik Seeley added a comment -

          Random thought about augmenter parameter syntax (when it's actually needed)... what about reusing most of localParams syntax, but change

          {!stuff}

          to [stuff]? This would give us named parameters, param dereferencing, and allow passing something like SolrParams to an augmenter, which is probably easier for most people to deal with than parsing themselves?

          So instead of this:

          titlehl:[_hl_:title]
          

          We could have this:

          titlehl:[_hl_ f=title]
          

          Which would then easily allow multiple params like this:

          titlehl:[_hl_ f=title snippets=3 fragsize=800]
          

          Just a note, I'm making good progress on supporting all common object types in function queries, so basic literals won't need any transformer syntax and you should be able to just do stuff like

           fl=mystr:'hello',myint:10, myfloat:25.5
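
          Parsing the bracketed localParams-style augmenter spec into a name plus a SolrParams-like map might look like the following (Python, a naive illustration — parse_augmenter is a made-up name, and there is no quoting or param dereferencing here):

```python
# Naive illustration of parsing the bracketed augmenter spec proposed
# above into a name plus a parameter map. No quoting or parameter
# dereferencing is handled; values stay as strings.

def parse_augmenter(spec):
    body = spec.strip()[1:-1]          # drop the surrounding [ ]
    parts = body.split()
    name = parts[0]
    params = dict(p.split("=", 1) for p in parts[1:])
    return name, params
```

          So "[_hl_ f=title snippets=3 fragsize=800]" would yield the name "_hl_" and a map of the three named parameters, while a bare "[explain]" yields an empty map.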
          
          Ryan McKinley added a comment -

          no need for the '_' with transformers – i would hope that the brackets tell you that it is a transformer.

          ?fl=id,[explain]
          not
          ?fl=id,[_explain_]

          we could have titlehl:[_hl_ f=title]

          ya, i think that would be fine.

          so basic literals won't need any transformer syntax

          I still don't get this. How do you know that the literal is not referring to a field (or invalid field)? How do you know it is an int vs float vs double vs string? Seems like too much magic to me.

          Yonik Seeley added a comment -

          no need for the '_' with transformers – i would hope that the brackets tell you that it is a transformer.

          Right, but it seems like the name of the transformer should match the field that it adds to the document by default? That's just a convention of course... for example, fl=docid adds the docid field to the documents. It seems natural to refer to the transformer that does that as the docid transformer?

          I still don't get this. How do you know that the literal is not referring to a field (or invalid field)? How do you know it is an int vs float vs double vs string? Seems like too much magic to me.

          It's magic people expect, and easy to understand, because most of their programming languages work that way. It's unlikely that float vs double would matter for returning a constant anyway - and things like JSON don't even distinguish. A string would be quoted, and an int would lack characteristics of a float/double.

          We could even add float() int() double() long() functions in the future if we really need them.
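
          The typing convention described here (quoted → string literal, numeric → int/float constant, anything else → field name) can be sketched as (Python, illustrative only — classify_value is a hypothetical helper, not Solr's function parser):

```python
# Sketch of the convention described above (hypothetical helper, not
# Solr's function parser): quoted -> string literal, numeric -> int or
# float constant, anything else -> treated as a field name.

def classify_value(token):
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return ("str", token[1:-1])
    try:
        return ("int", int(token))
    except ValueError:
        pass
    try:
        return ("float", float(token))
    except ValueError:
        return ("field", token)
```

          Under this convention, fl=mystr:'hello',myint:10,myfloat:25.5 resolves to a string, an int, and a float, while an unquoted name like popularity stays a field reference.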

          Ryan McKinley added a comment -

          Right, but it seems like the name of the transformer should match the field that it adds to the document by default?

          Since changing from the '_' syntax to the bracket syntax, i would now expect the brackets in the name for the return field.

          ?fl=id,[explain]
          

          would return the document:

          <str name="[explain]">...</str>
          

          It's magic people expect,

          How do you know it is a literal and not just a missing field name? See SOLR-2441

          What about a literal that matches a field name? quotes? Didn't hoss suggest that we should use quotes to wrap crazy field names?

          is:

          ?fl=id,avalue:'some crazy field name',score
          

          referring to a field or a literal? In the fl parameter, i would expect everything to be a field name unless you explicitly say it is a literal.
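
The naming convention Ryan expects (an explicit alias wins, and a bare transformer like [explain] keeps its brackets as the response key) can be sketched roughly as follows. `output_key` is a hypothetical helper invented for illustration, not Solr code:

```python
def output_key(fl_entry):
    # Hypothetical sketch of the convention discussed above: an
    # explicit "alias:" prefix determines the response key; otherwise
    # the fl entry itself is used verbatim, so "[explain]" keeps its
    # brackets in the response.
    if ":" in fl_entry and not fl_entry.startswith("["):
        alias, _ = fl_entry.split(":", 1)
        return alias
    return fl_entry
```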

          Yonik Seeley added a comment -

          What about a literal that matches a field name? quotes? Didn't hoss suggest that we should use quotes to wrap crazy field names?

          'foo bar' would be a string literal
          field('foo bar') would be the whacky field name with a space in it
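
Yonik's convention can be modeled with a tiny classifier. This is a hypothetical sketch (`classify_fl_entry` is not a real Solr API), assuming single-quoted strings are literals and `field(...)` escapes whacky field names:

```python
import re

def classify_fl_entry(entry):
    # 'foo bar'        -> a string literal
    # field('foo bar') -> a field whose name contains a space
    # anything else    -> a plain field name
    m = re.fullmatch(r"'(.*)'", entry)
    if m:
        return ("literal", m.group(1))
    m = re.fullmatch(r"field\('(.*)'\)", entry)
    if m:
        return ("field", m.group(1))
    return ("field", entry)
```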

          Ryan McKinley added a comment -

          Aaa – my instinct would be the reverse. If something is listed, it is most likely a field name, and then only if you explicitly make it a value would it be a value.

          Do other people have opinions?

          What about a field named 10? Does that need special escaping just because it is also a number? How would this handle ?fl=id,foo when foo is not a real field name? Is foo a literal or a field name that does not exist?

          Koji Sekiguchi added a comment -

          Does this issue cover wildcard syntax like fl=*_s ? Because SOLR-2503 has been committed, I want the wildcard syntax for fl.

          &fl=*_s

          <doc>
            <str name="PERSON_S">Barack Obama</str>
            <str name="TITLE_S">the President</str>
          </doc>
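
A rough model of what the requested glob support would do, using Python's `fnmatch` for the pattern matching (an assumption for illustration; Solr's actual implementation differs):

```python
from fnmatch import fnmatchcase

def select_fields(doc, patterns):
    # Return the subset of doc whose field names match any of the
    # glob patterns -- a toy model of fl=*_s style wildcards.
    return {k: v for k, v in doc.items()
            if any(fnmatchcase(k, p) for p in patterns)}
```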
          
          Ryan McKinley added a comment -

          In #1133505, I updated Transformers to take a Map<String,String> that is parsed using the LocalParams syntax.

          In trunk, things now look like:

          ?fl=id,[shard],[value v=10]
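
The bracketed transformer-with-params form can be approximated with a small parser. `parse_transformer` is a hypothetical sketch; real Solr runs these through its LocalParams parser, which also handles quoting and $param substitution:

```python
import re

def parse_transformer(spec):
    # Parse "[name k=v k=v ...]" into (name, params). A loose sketch
    # only: no quoting, no $-substitution.
    m = re.fullmatch(r"\[(\S+)((?:\s+\S+=\S+)*)\]", spec)
    if m is None:
        raise ValueError("not a transformer spec: %r" % spec)
    name = m.group(1)
    params = dict(kv.split("=", 1) for kv in m.group(2).split())
    return name, params
```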
          
          Yonik Seeley added a comment -

          Yep, we get the full power/familiarity of local params, including param substitution (e.g. myvar=$other_request_param)

          I updated Transformers to take a Map<String,String> that is parsed using the LocalParams syntax.

          In the template parsing code I committed first, I had used SolrParams... one reason being that for some time I've thought that we might want multi-valued parameters in localParams. If back compat of transformers isn't a big deal, we can change Map<String,String> to Map<String,String[]> later... but it seems like the additional parsing logic of SolrParams might add enough value to use that instead of a bare Map anyway?
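
The trade-off being discussed (single-valued Map access vs multi-valued SolrParams-style access) can be illustrated with a toy container; the class and method names here are invented for the sketch:

```python
class Params:
    # Toy model of the SolrParams idea: store every key as a list of
    # values, but expose a single-value get() as well, so callers
    # written against Map<String,String> semantics keep working.
    def __init__(self, pairs):
        self._m = {}
        for k, v in pairs:
            self._m.setdefault(k, []).append(v)

    def get(self, key, default=None):
        vs = self._m.get(key)
        return vs[0] if vs else default

    def get_params(self, key):
        return self._m.get(key, [])
```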

          Ryan McKinley added a comment -

          I used Map<String,String> because I figured most Transformers won't use the params anyway, so it is less "work" – I don't feel strongly either way.

          I'll change it to SolrParams

          Ryan McKinley added a comment -

          changed in r1133534

          Ryan McKinley added a comment -

          I added some quick docs to:
          http://wiki.apache.org/solr/CommonQueryParameters#glob

          We should make sure that is accurate and flesh out better examples.

          Hoss Man added a comment -

          Since the majority of this has already been committed to trunk, I'm marking this for 4.0 – if there is any outstanding work to consider this issue "finished", it either needs to be spun off into a new issue, or wrapped up before 4.0 is released.

          Jan Rasehorn added a comment - - edited

          It does not seem to be working in Solr 4 Trunk from 16th Dec 2011.
          I added a transformer with name "testtrans" as a copy of the existing examples to solrconfig.xml and tried to incorporate it into the fl parameter.

          Solr returns an error message saying:

          undefined field: [testtrans]

          If I add a pseudo field "constval:sum(1,2)" to the fl parameter, Solr also returns an error message:

          undefined field: constval:sum(1

          Am I missing some steps to enable it?

          Jan Rasehorn added a comment -

          Found the reason:

          I had the term vector component enabled. As described in SOLR-2352, TVC causes an "undefined field" error for "*" and "score" and, it seems, for all pseudo fields, transformers and functions used in the fl parameter.

          I disabled TVC in solrconfig.xml and now it is working.

          Luca Cavanna added a comment -

          Hi all,
          are there any plans to support an exclusion syntax? I'm not sure the fl (field list) is the right place, but I would like to exclude some fields (by default) from the output. This would be better than specifying a (long) list of fields that I want, which would need to be updated every time I add a field to my schema.
          Wouldn't this be a useful feature?

          Luca Cavanna added a comment -

          When I wrote the last comment I didn't realize this was already committed! I think the fact that the issue is still open is a bit misleading, can't we close it?

          Ryan McKinley added a comment -

          Let's discuss any problems in new issues.


            People

            • Assignee:
              Ryan McKinley
              Reporter:
              Ryan McKinley
            • Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development