Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I think it would be useful to add a way to exclude field from the Solr response. If I have for example 100 stored fields and I want to return all of them but one, it would be handy to list just the field I want to exclude instead of the 99 fields for inclusion through fl.

      1. SOLR-3191.patch
        56 kB
        Scott Stults
      2. SOLR-3191.patch
        55 kB
        Kuntal Ganguly
      3. SOLR-3191.patch
        67 kB
        Andrea Gazzarini
      4. SOLR-3191.patch
        67 kB
        Andrea Gazzarini

        Activity

        Luca Cavanna created issue -
        Hide
        Luca Cavanna added a comment -

        I previously left a comment on the SOLR-2444 asking if we can add an exclusion syntax there, but I'm honestly afraid of that now, given also the SOLR-2719 regression.
        I was thinking to keep it simple, for example using a new parameter like fl.ex or something, may I know what are your thoughts?

        Show
        Luca Cavanna added a comment - I previously left a comment on the SOLR-2444 asking if we can add an exclusion syntax there, but I'm honestly afraid of that now, given also the SOLR-2719 regression. I was thinking to keep it simple, for example using a new parameter like fl.ex or something, may I know what are your thoughts?
        Hide
        Jan Høydahl added a comment -

        I think by definition a field name cannot start with "-" since even the Lucene query syntax must support the alternative NOT syntax -inStock:true. I therefore think we should borrow this syntax over here, just as we are doing for dismax uf in SOLR-3026. It should be easy enough to parse.

        Examples:

        q=foo&fl=-title           # All fields except title (include * is implicit)
        q=foo&fl=-title,-body     # All fields except title and body
        q=foo&fl=-*_s             # All fields except those matching *_s pattern
        q=foo&fl=foo,-bar         # Would be the same as saying only &fl=foo, since an explicit field resets the list
        q=foo&fl=-bar,foo         # same as above. Order not important?
        q=foo&fl=-foo,foo         # same as above. Probably all the exclude fields should be parsed last and override positive?
        q=foo&fl=*,-title         # same as only -title as * is implicit include when only exclude fields listed?
        q=foo&fl=-*               # Hmm, makes no sense, if you're not returning anything you should ask for rows=0 ??
        q=foo&fl=*z,-*xyz         # Include all fields ending in "z" except those ending in "xyz", negative rule applied last
        
        Show
        Jan Høydahl added a comment - I think by definition a field name cannot start with "-" since even the Lucene query syntax must support the alternative NOT syntax -inStock:true . I therefore think we should borrow this syntax over here, just as we are doing for dismax uf in SOLR-3026 . It should be easy enough to parse. Examples: q=foo&fl=-title # All fields except title (include * is implicit) q=foo&fl=-title,-body # All fields except title and body q=foo&fl=-*_s # All fields except those matching *_s pattern q=foo&fl=foo,-bar # Would be the same as saying only &fl=foo, since an explicit field resets the list q=foo&fl=-bar,foo # same as above. Order not important? q=foo&fl=-foo,foo # same as above. Probably all the exclude fields should be parsed last and override positive? q=foo&fl=*,-title # same as only -title as * is implicit include when only exclude fields listed? q=foo&fl=-* # Hmm, makes no sense, if you're not returning anything you should ask for rows=0 ?? q=foo&fl=*z,-*xyz # Include all fields ending in "z" except those ending in "xyz", negative rule applied last
        Hide
        Yonik Seeley added a comment -

        +1, looks great Jan!

        Show
        Yonik Seeley added a comment - +1, looks great Jan!
        Hide
        Luca Cavanna added a comment -

        Thanks for your feedbacks!
        I had thought about the same solution too...I was just afraid of adding complexity to the new fl syntax, but if you think that's the way to go, great, and I'd like to contribute on this!
        My concern is just about SOLR-2719. Don't you think we should correct it and add some kind of consistency before adding new features? What do you think about adding some kind of validation for field names? Shall we open a new issue for that?

        Show
        Luca Cavanna added a comment - Thanks for your feedbacks! I had thought about the same solution too...I was just afraid of adding complexity to the new fl syntax, but if you think that's the way to go, great, and I'd like to contribute on this! My concern is just about SOLR-2719 . Don't you think we should correct it and add some kind of consistency before adding new features? What do you think about adding some kind of validation for field names? Shall we open a new issue for that?
        Hide
        Jan Høydahl added a comment -

        Luca Cavanna, now that SOLR-2719 is fixed, I think it should be green lights for this, if you'd like to attempt a patch. I don't know if the code from UserFields class may be helpful at all..

        Show
        Jan Høydahl added a comment - Luca Cavanna , now that SOLR-2719 is fixed, I think it should be green lights for this, if you'd like to attempt a patch. I don't know if the code from UserFields class may be helpful at all..
        Hide
        Bill Bell added a comment -

        This is a very useful feature.... I have 80 fields and want to just eclude one large field...

        Show
        Bill Bell added a comment - This is a very useful feature.... I have 80 fields and want to just eclude one large field...
        Hide
        Andrea Gazzarini added a comment - - edited

        Hi all, I'm going to complete the implementation of the field exclusion (using trunk code) but in order to complete I have some questions (so probably I'll have to change a little bit something on my code):

        • New ReturnFields implementor or SolrReturnFields change?
          At the moment I created another class so the old implementation (SolrReturnFields) is still there. There's no code duplication (or just a little bit that can be remove if that is ok by changing the SolrReturnFields too) because the fl parsing uses a different logic (basically no QueryParsing.StrParser).
        • SolrReturnFields:83 support for fl=' ' => *,score
          There's a comment below this line about an old feature that could be removed. On the wiki there's no mention about that so can I remove that?
        • About glob
          What would be the right behaviour (see below)? A full name expansion support or just *aaa bbb*? Does SOLR need that complexity? I started playing with SOLR in 2009 and honestly I never used globs in fl so I have no concrete experience for taking a decision.
          • The Wiki talks about just one example (with trailing wildcard)
          • org.apache.solr.search.ReturnFieldsTest.testWilcards() considers only beginning and trailing wildcard cases (e.g. *aaa, bbb*)
          • But the code supports a wide range of globs (e.g. a*a, a?a, n*m*,n*m?)
        • fl expressions (probably I will attach some other "ambiguous" case later)
          What would be the expected behaviour in these cases?
          • * -name test
          • -name test *
          • -name * test
          • -* I understand that should be wrong but what would be the correct behaviour? SyntaxError?
          • name name:manu (at the moment it seems the "aliased" field always wins)
          • pippo:name pippo:manu ( at the moment it seems the last alias wins)

        For latest three I believe a SyntaxError would be more appropriate

        Show
        Andrea Gazzarini added a comment - - edited Hi all, I'm going to complete the implementation of the field exclusion (using trunk code) but in order to complete I have some questions (so probably I'll have to change a little bit something on my code): New ReturnFields implementor or SolrReturnFields change? At the moment I created another class so the old implementation (SolrReturnFields) is still there. There's no code duplication (or just a little bit that can be remove if that is ok by changing the SolrReturnFields too) because the fl parsing uses a different logic (basically no QueryParsing.StrParser). SolrReturnFields:83 support for fl=' ' => *,score There's a comment below this line about an old feature that could be removed. On the wiki there's no mention about that so can I remove that? About glob What would be the right behaviour (see below)? A full name expansion support or just *aaa bbb*? Does SOLR need that complexity? I started playing with SOLR in 2009 and honestly I never used globs in fl so I have no concrete experience for taking a decision. The Wiki talks about just one example (with trailing wildcard) org.apache.solr.search.ReturnFieldsTest.testWilcards() considers only beginning and trailing wildcard cases (e.g. *aaa, bbb*) But the code supports a wide range of globs (e.g. a*a, a?a, n*m*,n*m?) fl expressions (probably I will attach some other "ambiguous" case later) What would be the expected behaviour in these cases? * -name test -name test * -name * test -* I understand that should be wrong but what would be the correct behaviour? SyntaxError? name name:manu (at the moment it seems the "aliased" field always wins) pippo:name pippo:manu ( at the moment it seems the last alias wins) For latest three I believe a SyntaxError would be more appropriate
        Hide
        David Morana added a comment -

        This is exactly one of the features I was looking for.
        If possible can you make a patch of this for Solr v4.0 Final?
        Thanks,

        Show
        David Morana added a comment - This is exactly one of the features I was looking for. If possible can you make a patch of this for Solr v4.0 Final? Thanks,
        Hide
        Erick Erickson added a comment -

        No body has put up any kind of patch for this, it's just a suggestion at this point.

        If you'd like to supply one that would be great!

        Show
        Erick Erickson added a comment - No body has put up any kind of patch for this, it's just a suggestion at this point. If you'd like to supply one that would be great!
        Hide
        Andrea Gazzarini added a comment -

        Hi, Some days ago I built a working implementation of ReturnFields some days ago. It's not yet ready because (in my free time) I'm trying to enumerate all scenarios in ReturnFieldsTest.
        Hope to submit a patch soon.

        Show
        Andrea Gazzarini added a comment - Hi, Some days ago I built a working implementation of ReturnFields some days ago. It's not yet ready because (in my free time) I'm trying to enumerate all scenarios in ReturnFieldsTest. Hope to submit a patch soon.
        Hide
        Andrea Gazzarini added a comment -

        Hi all, I attached a patch for the fl exclusion feature.
        Sorry for the very long post and for delay. I will try to be as short as possible, there are a lot of things that need to be considered but in general I would say that the whole stuff is easier to try than to explain.

        Token types

        Lets start with token types. The following tokens are supported (just a short list, each type is subsequently explained):

        • literal inclusion: name, id, title, subject (can be aliased like alias:fieldname);
        • inclusion glob: na*, n?m*, *me (cannot be aliased because the expression could match more than one fields);
        • *: as before, that means "all real fields": internally is managed as a special case of inclusion glob;
        • literal exclusion: -name, -id, -title (cannot be aliased, doesn't make sense)
        • exclusion glob: -na*, -n?m*, -*me (cannot be aliased, doesn't make sense); as special case -* is ignored.
        • transformers: [explain], [docid], [shard] (can be aliased like alias:[docid])
        • functions: sum(1,1), "literal value" (can be aliased like alias:sum(1,1), myalias:"literal value")
        Literal inclusion

        A literal inclusion declares a (real) field that must be returned in response. Supports aliasing (myalias:fieldname)
        Examples:

        • name
        • this.is.a.valid.field (see ReturnFieldsTest.testDotInFieldName)
        • this-is-a-valid-field (see ReturnFieldsTest.testHyphenInFieldName)
        • this$is$a$valid$field (see ReturnFieldsTest.testDollarInFieldName)
        • #foo_s,fl (see ReturnFieldsTest.testFunkyFieldNames)
        Inclusion glob

        An inclusion glob declares an expression using * and / or ?. Fields matching that expression will be returned in response.
        Examples (see ReturnFieldsTest.testWildCards / testReturnAllFields / testReturnOnlyName)

        • na*
        • n?m?
        • n*
        • *me
        • *m?

        Aliases cannot be applied to inclusion globs because an expression could match more than one fields. The expression wont' be discarded, just the alias part. So for example

        fl=alias:na* will collect the na* inclusion glob

        Literal exclusion

        A literal exclusion declares a field that will be excluded in response. Basically it is a literal inclusion prefixed by -
        Examples (see ReturnFieldsTest.testReturnOnlyName / testReturnOnlyNameAndTitle / testReturnAllRealFieldsExceptName):

        • -name
        • -this.is.a.valid.field (see ReturnFieldsTest.testDotInFieldName)
        • -this-is-a-valid-field (see ReturnFieldsTest.testHyphenInFieldName)
        • -this$is$a$valid$field (see ReturnFieldsTest.testDollarInFieldName)
        • -#foo_s,-fl (see ReturnFieldsTest.testFunkyFieldNames)

        Aliasing is not supported so

        fl=-alias:fieldname

        is silently discarded. Note that in this case or in this one:

        fl=-a1:a -a2:b -a3:c

        all tokens are invalid so the expressions will be resolved in

        fl=

        and therefore standard procedure applies (see last testcase on ReturnFieldsTest.testReturnAllRealFields)

        exclusion glob

        An exclusion glob is an expression that uses * and / or ?. Fields matching that expression will be excluded from response.
        Examples (see ReturnFieldsTest.testReturnOnlyName / testReturnOnlyNameAndTitle / testReturnAllRealFieldsExceptName):

        • -na*
        • -n?m?
        • -n*
        • -*me
        • -*m?

        As literal exclusions, you cannot use aliases in these expressions, they will be silently discarded (see last testcase on ReturnFieldsTest.testReturnAllRealFields)

        transformers

        Just one important thing: invalid transformers are ignored.
        Examples (you can see all the assertions below in ReturnsFieldsTest.testTransformers):

        • fl=[explain] (one transformer)
        • fl=[docid] [explain] (two transformers)
        • fl=[shard] id (a transformer and a real field)
        • fl=[xxxxxx] (an invalid transformer. That will have the same effect of having an empty fl)
        • fl=[xxxxxx] [yyyyy] (two invalid transformers. That will have the same effect of having an empty fl)
        • fl=[docid] [shard] [xxxx] (the invalid transformer will be ignored so only [docid] and [shard] will be evaluated
        • fl=myalias:[docid] (aliased transformer)
        • fl=alias:[yyyyyy] (invalid aliased transformer - will be ignored, see last test on ReturnFieldsTest.testReturnAllRealFields)
        Functions (and literals)

        All functions described here http://wiki.apache.org/solr/FunctionQuery#Available_Functions
        can be declared in "fl" parameter. They can be aliased too.
        Examples (see ReturnFieldsTest.testFunctions)

        • fl=sum(1,1)
        • fl="this is a literal", 'this is another literal", 1.2
        • fl=pippo:"this is an aliased literal", pluto:'this is another aliased literal", paperino:1.2
        • fl= {!func}add($v1,$v2)&v1=10&v2=13.0
          - fl=alias:{!func}

          add($v1,$v2)&v1=10&v2=13.0

        WARNING: differently from transformers, in case of invalid function an exception will be thrown.

        Aliases

        Aliases are supported on

        • literal field names (e.g. alias:fieldname)
        • transformers (e.g. alias:[docid])
        • functions (e.g. alias:max(1,2))

        and are not supported on

        • inclusion globs (because expression could match multiple fields and alias in response must be uniquely associated to a field. The alias part of the token expression will be ignored, the glob expression collected)
        • literal exclusion (doesn't make sense, will be silently ignored)
        • exclusion globs (doesn't make sense, will be silently ignored)

        General rules

        • an inclusion (literal or glob) is ignored if in case of *.
          • fl=* name na* (return all fields, name and na* fields are already included)
          • fl=name * (same as before)
          • any exclusion token will be ignored if an inclusion has been defined before
          • fl=name -id (returns all fields except id so "name", with other real fields, is implicitly included)
        • an inclusion token will clear all exclusion
          • fl=-id name (means "returns only name", no matter what exclusions are defined before)
          • fl=-id *
          • the * expression is ignored in case of an exclusion (literal or glob)
          • * -name (the * doesn't make sense, or in other words is implicit: returns all fields except name)
          • -name * (same as before)
        • score, transformers and functions doesn't change the behaviour described above, they are just added to response
          • fl=score, *, name
          • fl=*, [docid], name, sum(1,1), myfunctionalias:"literal value"
          • fl=name -id [shard], exchange:1.34
        • an empty or null fl will execute as "*" (all real fields). Note that we can be in this situation if we pass an empty fl, no fl or if all tokens in fl are discarded because invalid.
        • some other rules like multiplicity on fl parameter (see ReturnFieldsTest.testManyParameters), token separator (comma or whitespaces - see ReturnFieldsTest.testWhitespace and ReturnFieldsTest.testSeparators) remain the same.

        Some (technical) consideration

        1. I'm not sure about the test methods granularity level: some test methods, although grouping the same concept, contain a lot of sub-cases / scenarios and sometimes is hard to capture the functional behaviour (see for example ReturnFieldsTest.testTransformers or ReturnFieldsTest.testAliases). Maybe doing a test method for each sub-case should be better? I modified the ReturnFieldsTest for adding additional test cases but I followed the existing approach...let me know what you think about a much fine level of granularity..in case I can try to make some refactoring in that direction

        2. There are two (at least it seems to me) classes that are testing ReturnFields behaviour : ReturnFieldsTest and TestPseudoReturnFields...maybe all those test methods could be part of one test case? I don't know the reason behind the choice of having two classes so I'm just asking, the TestPseudoReturnFields is still there and all tests are green

        Any feedback is warmly welcome.
        Andrea

        Show
        Andrea Gazzarini added a comment - Hi all, I attached a patch for the fl exclusion feature. Sorry for the very long post and for delay. I will try to be as short as possible, there are a lot of things that need to be considered but in general I would say that the whole stuff is easier to try than to explain. Token types Lets start with token types. The following tokens are supported (just a short list, each type is subsequently explained): literal inclusion: name, id, title, subject (can be aliased like alias:fieldname); inclusion glob: na*, n?m*, *me (cannot be aliased because the expression could match more than one fields); *: as before, that means "all real fields": internally is managed as a special case of inclusion glob; literal exclusion: -name, -id, -title (cannot be aliased, doesn't make sense) exclusion glob: -na*, -n?m*, -*me (cannot be aliased, doesn't make sense); as special case -* is ignored. transformers: [explain], [docid], [shard] (can be aliased like alias:[docid]) functions: sum(1,1), "literal value" (can be aliased like alias:sum(1,1), myalias:"literal value") Literal inclusion A literal inclusion declares a (real) field that must be returned in response. Supports aliasing (myalias:fieldname) Examples: name this.is.a.valid.field (see ReturnFieldsTest.testDotInFieldName) this-is-a-valid-field (see ReturnFieldsTest.testHyphenInFieldName) this$is$a$valid$field (see ReturnFieldsTest.testDollarInFieldName) #foo_s,fl (see ReturnFieldsTest.testFunkyFieldNames) Inclusion glob An inclusion glob declares an expression using * and / or ?. Fields matching that expression will be returned in response. Examples (see ReturnFieldsTest.testWildCards / testReturnAllFields / testReturnOnlyName) na* n?m? n* *me *m? Aliases cannot be applied to inclusion globs because an expression could match more than one fields. The expression wont' be discarded, just the alias part. So for example fl=alias:na* will collect the na* inclusion glob Literal exclusion A literal exclusion declares a field that will be excluded in response. Basically it is a literal inclusion prefixed by - Examples (see ReturnFieldsTest.testReturnOnlyName / testReturnOnlyNameAndTitle / testReturnAllRealFieldsExceptName): -name -this.is.a.valid.field (see ReturnFieldsTest.testDotInFieldName) -this-is-a-valid-field (see ReturnFieldsTest.testHyphenInFieldName) -this$is$a$valid$field (see ReturnFieldsTest.testDollarInFieldName) -#foo_s,-fl (see ReturnFieldsTest.testFunkyFieldNames) Aliasing is not supported so fl=-alias:fieldname is silently discarded. Note that in this case or in this one: fl=-a1:a -a2:b -a3:c all tokens are invalid so the expressions will be resolved in fl= and therefore standard procedure applies (see last testcase on ReturnFieldsTest.testReturnAllRealFields) exclusion glob An exclusion glob is an expression that uses * and / or ?. Fields matching that expression will be excluded from response. Examples (see ReturnFieldsTest.testReturnOnlyName / testReturnOnlyNameAndTitle / testReturnAllRealFieldsExceptName): -na* -n?m? -n* -*me -*m? As literal exclusions, you cannot use aliases in these expressions, they will be silently discarded (see last testcase on ReturnFieldsTest.testReturnAllRealFields) transformers Just one important thing: invalid transformers are ignored. Examples (you can see all the assertions below in ReturnsFieldsTest.testTransformers): fl=[explain] (one transformer) fl=[docid] [explain] (two transformers) fl=[shard] id (a transformer and a real field) fl=[xxxxxx] (an invalid transformer. That will have the same effect of having an empty fl) fl=[xxxxxx] [yyyyy] (two invalid transformers. That will have the same effect of having an empty fl) fl=[docid] [shard] [xxxx] (the invalid transformer will be ignored so only [docid] and [shard] will be evaluated fl=myalias:[docid] (aliased transformer) fl=alias:[yyyyyy] (invalid aliased transformer - will be ignored, see last test on ReturnFieldsTest.testReturnAllRealFields) Functions (and literals) All functions described here http://wiki.apache.org/solr/FunctionQuery#Available_Functions can be declared in "fl" parameter. They can be aliased too. Examples (see ReturnFieldsTest.testFunctions) fl=sum(1,1) fl="this is a literal", 'this is another literal", 1.2 fl=pippo:"this is an aliased literal", pluto:'this is another aliased literal", paperino:1.2 fl= {!func}add($v1,$v2)&v1=10&v2=13.0 - fl=alias:{!func} add($v1,$v2)&v1=10&v2=13.0 WARNING: differently from transformers, in case of invalid function an exception will be thrown. Aliases Aliases are supported on literal field names (e.g. alias:fieldname) transformers (e.g. alias:[docid]) functions (e.g. alias:max(1,2)) and are not supported on inclusion globs (because expression could match multiple fields and alias in response must be uniquely associated to a field. The alias part of the token expression will be ignored, the glob expression collected) literal exclusion (doesn't make sense, will be silently ignored) exclusion globs (doesn't make sense, will be silently ignored) General rules an inclusion (literal or glob) is ignored if in case of *. fl=* name na* (return all fields, name and na* fields are already included) fl=name * (same as before) any exclusion token will be ignored if an inclusion has been defined before fl=name -id (returns all fields except id so "name", with other real fields, is implicitly included) an inclusion token will clear all exclusion fl=-id name (means "returns only name", no matter what exclusions are defined before) fl=-id * the * expression is ignored in case of an exclusion (literal or glob) * -name (the * doesn't make sense, or in other words is implicit: returns all fields except name) -name * (same as before) score, transformers and functions doesn't change the behaviour described above, they are just added to response fl=score, *, name fl=*, [docid], name, sum(1,1), myfunctionalias:"literal value" fl=name -id [shard], exchange:1.34 an empty or null fl will execute as "*" (all real fields). Note that we can be in this situation if we pass an empty fl, no fl or if all tokens in fl are discarded because invalid. some other rules like multiplicity on fl parameter (see ReturnFieldsTest.testManyParameters), token separator (comma or whitespaces - see ReturnFieldsTest.testWhitespace and ReturnFieldsTest.testSeparators) remain the same. Some (technical) consideration 1. I'm not sure about the test methods granularity level: some test methods, although grouping the same concept, contain a lot of sub-cases / scenarios and sometimes is hard to capture the functional behaviour (see for example ReturnFieldsTest.testTransformers or ReturnFieldsTest.testAliases). Maybe doing a test method for each sub-case should be better? I modified the ReturnFieldsTest for adding additional test cases but I followed the existing approach...let me know what you think about a much fine level of granularity..in case I can try to make some refactoring in that direction 2. There are two (at least it seems to me) classes that are testing ReturnFields behaviour : ReturnFieldsTest and TestPseudoReturnFields...maybe all those test methods could be part of one test case? I don't know the reason behind the choice of having two classes so I'm just asking, the TestPseudoReturnFields is still there and all tests are green Any feedback is warmly welcome. Andrea
        Hide
        Andrea Gazzarini added a comment -

        Patch for field exclusion in fl

        Show
        Andrea Gazzarini added a comment - Patch for field exclusion in fl
        Andrea Gazzarini made changes -
        Field Original Value New Value
        Attachment SOLR-3191.patch [ 12615801 ]
        Shalin Shekhar Mangar made changes -
        Assignee Shalin Shekhar Mangar [ shalinmangar ]
        Hide
        Robert Muir added a comment -

        the abstract ReturnFields should not have constants. its abstract, it has no syntax. that stuff needs to be moved to an implementation class.

        Separately, SolrReturnFields.java i think has already reached the limit where its totally un-understandable. The original discussion on this issue talked about having a separate implementation, but it seems instead it was all piled into ReturnFields.

        What to do about fields that actually start with "-" ? Remember, not everyone uses the lucene query parser. The assumptions listed on the beginning of this issue about that are totally wrong.

        I think we should make the ReturnFields implementation something that can be plugged into the solrconfig without writing java code and provide alternatives (for example: one that is simple and where the code can be understood)

        Show
        Robert Muir added a comment - the abstract ReturnFields should not have constants. its abstract, it has no syntax. that stuff needs to be moved to an implementation class. Separately, SolrReturnFields.java i think has already reached the limit where its totally un-understandable. The original discussion on this issue talked about having a separate implementation, but it seems instead it was all piled into ReturnFields. What to do about fields that actually start with "-" ? Remember, not everyone uses the lucene query parser. The assumptions listed on the beginning of this issue about that are totally wrong. I think we should make the ReturnFields implementation something that can be plugged into the solrconfig without writing java code and provide alternatives (for example: one that is simple and where the code can be understood)
        Hide
        Andrea Gazzarini added a comment -

        > the abstract ReturnFields should not have constants. its abstract, it has no syntax. that stuff needs to be moved to an implementation class.

        No problem at all, I can reattach the patch

        > Separately, SolrReturnFields.java i think has already reached the limit where its totally un-understandable. The original discussion on this issue talked about having a separate implementation, but it seems instead it was all piled into ReturnFields.

        Sorry I didn't find those assumptions in this issue. It's all into (Solr)ReturnFields class because all the described logic is there. I can separate the parser (which from "some" perspectives I could agree with you, inner classes complicates the reading) and the builder logic if you think is better.

        > I think we should make the ReturnFields implementation something that can be plugged into the solrconfig without writing java code and provide alternatives (for example: one that is simple and where the code can be understood)

        If things need to be done in this way I can try to do that. Again, I didn't because in this issue there's no trace about that "pluggabilty". Also please clarify me about what you consider understandable so I can follow that direction.

        Show
        Andrea Gazzarini added a comment - > the abstract ReturnFields should not have constants. its abstract, it has no syntax. that stuff needs to be moved to an implementation class. No problem at all, I can reattach the patch > Separately, SolrReturnFields.java i think has already reached the limit where its totally un-understandable. The original discussion on this issue talked about having a separate implementation, but it seems instead it was all piled into ReturnFields. Sorry I didn't find those assumptions in this issue. It's all into (Solr)ReturnFields class because all the described logic is there. I can separate the parser (which from "some" perspectives I could agree with you, inner classes complicates the reading) and the builder logic if you think is better. > I think we should make the ReturnFields implementation something that can be plugged into the solrconfig without writing java code and provide alternatives (for example: one that is simple and where the code can be understood) If things need to be done in this way I can try to do that. Again, I didn't because in this issue there's no trace about that "pluggabilty". Also please clarify me about what you consider understandable so I can follow that direction.
        Hide
        Yonik Seeley added a comment -

        I like the direction you're going Adrea!
        Frankly, I don't understand why there is an abstract class at all, or why we need plugabillity here at all. This should really be about interface and making the simple things simple. The edge cases like fields starting with a "-" can be handled as edge cases (via field() function or whatever) and should not dominate the design here.

        Show
        Yonik Seeley added a comment - I like the direction you're going Adrea! Frankly, I don't understand why there is an abstract class at all, or why we need plugabillity here at all. This should really be about interface and making the simple things simple. The edge cases like fields starting with a "-" can be handled as edge cases (via field() function or whatever) and should not dominate the design here.
        Hide
        Andrea Gazzarini added a comment -

        Thank you very much Yonik, very happy to hear that.

        I agree with you regarding the abstract class: ReturnFields should be an interface; I left it as a class because I don't know the past behind the code so my concern was to respect and preserve as much as possible what others did (not for SolrReturnFields but here there were a lot if things that required changes)

        So, do you want me to update the patch with

        a) ReturnFields as interface
        b) Constants removed and put in SolrReturnFields as Robert suggested?

        Show
        Andrea Gazzarini added a comment - Thank you very much Yonik, very happy to hear that. I agree with you regarding the abstract class: ReturnFields should be an interface; I left it as a class because I don't know the past behind the code so my concern was to respect and preserve as much as possible what others did (not for SolrReturnFields but here there were a lot if things that required changes) So, do you want me to update the patch with a) ReturnFields as interface b) Constants removed and put in SolrReturnFields as Robert suggested?
        Hide
        Ryan Ernst added a comment -

        a) ReturnFields as interface

        ReturnFields is an abstract class so that changes to the interface can not break concrete implementations. It was originally added in SOLR-4226.

        b) Constants removed and put in SolrReturnFields as Robert suggested?

        Yes Robert Muir is right that there shouldn't be constants in the ReturnFields. Leave that up to the concrete implementations. I also think making it pluggable would be very nice, but should be a separate issue.

        Show
        Ryan Ernst added a comment - a) ReturnFields as interface ReturnFields is an abstract class so that changes to the interface can not break concrete implementations. It was originally added in SOLR-4226 . b) Constants removed and put in SolrReturnFields as Robert suggested? Yes Robert Muir is right that there shouldn't be constants in the ReturnFields. Leave that up to the concrete implementations. I also think making it pluggable would be very nice, but should be a separate issue.
        Hide
        Yonik Seeley added a comment -

        I agree with you regarding the abstract class: ReturnFields should be an interface;

        Heh - I see my use of "interface" was ambiguous. I actually meant the higher level text (http, etc) interface to Solr is important, not the java level. It's that interface that really counts... being able to pass fl=foo, _important, -bar, -_big
        So I was questioning why there were even two classes, rather than just a single one. If a super-expert user wants to make their own customized version, they can copy-n-paste the code, but it's so far into the outliers we shouldn't be designing around it.

        Show
        Yonik Seeley added a comment - I agree with you regarding the abstract class: ReturnFields should be an interface; Heh - I see my use of "interface" was ambiguous. I actually meant the higher level text (http, etc) interface to Solr is important, not the java level. It's that interface that really counts... being able to pass fl=foo, _important, -bar, - _big So I was questioning why there were even two classes, rather than just a single one. If a super-expert user wants to make their own customized version, they can copy-n-paste the code, but it's so far into the outliers we shouldn't be designing around it.
        Hide
        Andrea Gazzarini added a comment -

        Heh - I see my use of "interface" was ambiguous.

        Heheh...and in addition there's my wonderful english.

        I understand what you mean and I agree. The only problem I see if we want to remove ReturnFields is that it is referenced 63 times (moreless) in source code so the patch would have a more delicate impact (at the moment the patch changed just 3 isolated files).

        Otherwise, from an OO perspective, what Ryan Ernst said is right so we could just remove the constants in the abstract class.

        Show
        Andrea Gazzarini added a comment - Heh - I see my use of "interface" was ambiguous. Heheh...and in addition there's my wonderful english. I understand what you mean and I agree. The only problem I see if we want to remove ReturnFields is that it is referenced 63 times (moreless) in source code so the patch would have a more delicate impact (at the moment the patch changed just 3 isolated files). Otherwise, from an OO perspective, what Ryan Ernst said is right so we could just remove the constants in the abstract class.
        Hide
        Andrea Gazzarini added a comment -

        New version of the patch – ReturnFields without constants.

        Show
        Andrea Gazzarini added a comment - New version of the patch – ReturnFields without constants.
        Andrea Gazzarini made changes -
        Attachment SOLR-3191.patch [ 12617644 ]
        Hide
        Shalin Shekhar Mangar added a comment -

        I don't have time right now to review this. I assigned it to myself because there was a lot of public interest but no assignee. However it looks like a couple of other committers have interest in this issue as well. I can only look at this after a few weeks so if no one takes it up, then I will.

        Show
        Shalin Shekhar Mangar added a comment - I don't have time right now to review this. I assigned it to myself because there was a lot of public interest but no assignee. However it looks like a couple of other committers have interest in this issue as well. I can only look at this after a few weeks so if no one takes it up, then I will.
        Shalin Shekhar Mangar made changes -
        Assignee Shalin Shekhar Mangar [ shalinmangar ]
        Hide
        Kuntal Ganguly added a comment -

        I have tried this patch on Solr 4.2.1 and 4.9, but this patch is not working with custom doc transformer.

        I have a custom transformer that does the following:

        The Custom transformer for joining between two entities and merging fields from second entity into the response of 1st entity is deployed in both local and staging boxes.

        Usecase:

        Say Entity-1 type=ProjectDocument and Entity-2 type=Projects.Now we want to search on certain fields of Entity-1 (document_createbyname),
        then join with Entity-2 based on common field/fields between two entities (pinprojectid)and finally fetch and merge field/fields of Entity-2(projecttype) into the response of Entity-1.

        Sample Query:

        Local::
        http://10.10.87.91:4983/solr/collection1/select?q=type:ProjectDocument AND document_createbyname:"Sudipta Sarkar"
        &fl=*,[mytrans5 jf=pinprojectid if=type:Projects mf=projecttype]

        Staging::
        http://10.1.1.230:6983/solr/collection1/select?q=type:ProjectDocument AND document_documenttitle:testing&&fl=*,[mytrans5 jf=pinprojectid if=type:Projects mf=pwaccountid,projectaddress2,projecttype,projectenddate,project_city]

        Details of transformer

        Transformer name->mytrans5
        It has three parameter:

        1. jf (Joining Field- Mandatory)= can be single or multi-value separated by comma. This should contain common fields between two entities.

        2. mf (Merge Field- Mandatory)= can be single or multi-value separated by comma. This fields will be fetch from 2nd entity and merge with 1st entity response.

        3. if (Identity Field- Optional)= single value. This fields specify the second entity uniquely from 1st entity (say type:Projects). If this fields is not provided during query,this will then read default value from configuration file. If default value is also missing,then it will take the first value from merge field parameter.

        [mytrans5 jf=pinprojectid,pwaccountid if=type:Projects mf=projecttype,projectenddate,project_city]

        Can you suggest me what to do? Because the transformer is working fine without this patch properly as expected.

        Show
        Kuntal Ganguly added a comment - I have tried this patch on Solr 4.2.1 and 4.9, but this patch is not working with custom doc transformer. I have a custom transformer that does the following: The Custom transformer for joining between two entities and merging fields from second entity into the response of 1st entity is deployed in both local and staging boxes. Usecase: Say Entity-1 type=ProjectDocument and Entity-2 type=Projects.Now we want to search on certain fields of Entity-1 (document_createbyname), then join with Entity-2 based on common field/fields between two entities (pinprojectid)and finally fetch and merge field/fields of Entity-2(projecttype) into the response of Entity-1. Sample Query: Local:: http://10.10.87.91:4983/solr/collection1/select?q=type:ProjectDocument AND document_createbyname:"Sudipta Sarkar" &fl=*, [mytrans5 jf=pinprojectid if=type:Projects mf=projecttype] Staging:: http://10.1.1.230:6983/solr/collection1/select?q=type:ProjectDocument AND document_documenttitle:testing&&fl=*, [mytrans5 jf=pinprojectid if=type:Projects mf=pwaccountid,projectaddress2,projecttype,projectenddate,project_city] Details of transformer Transformer name->mytrans5 It has three parameter: 1. jf (Joining Field- Mandatory)= can be single or multi-value separated by comma. This should contain common fields between two entities. 2. mf (Merge Field- Mandatory)= can be single or multi-value separated by comma. This fields will be fetch from 2nd entity and merge with 1st entity response. 3. if (Identity Field- Optional)= single value. This fields specify the second entity uniquely from 1st entity (say type:Projects). If this fields is not provided during query,this will then read default value from configuration file. If default value is also missing,then it will take the first value from merge field parameter. [mytrans5 jf=pinprojectid,pwaccountid if=type:Projects mf=projecttype,projectenddate,project_city] Can you suggest me what to do? Because the transformer is working fine without this patch properly as expected.
        Hide
        Kuntal Ganguly added a comment -

        I modify the code to support inbuilt alias transformer as well as custom transformer:

        Modify the method:

        /**

        • Logic to handle Custom as well as inbuilt Doc Transformer.
        • */

        void onTransformer(final StringBuilder expressionBuffer, final SolrQueryRequest request, final DocTransformers augmenters)
        throws SyntaxError {
        if(expressionBuffer!=null){
        String fl_Content=expressionBuffer.toString();
        if(fl_Content.contains("[") && fl_Content.contains("]")){

        if(fl_Content.indexOf("[")==0){
        //Logic to Parse Custom Transformers

        final Map<String,String> augmenterCustomArgs = new HashMap<String,String>();
        QueryParsing.parseLocalParams(expressionBuffer.toString(), 0, augmenterCustomArgs, request.getParams(), "[", CLOSE_BRACKET);
        final String augmenterCustomName = augmenterCustomArgs.remove("type");
        final String customDisp='['+augmenterCustomName+']';
        final TransformerFactory customFactory = request.getCore().getTransformerFactory(augmenterCustomName);
        if( customFactory != null )

        { MapSolrParams augmenterCustomParams = new MapSolrParams( augmenterCustomArgs ); augmenters.addTransformer( customFactory.create(customDisp, augmenterCustomParams, request) ); }

        }else if(fl_Content.indexOf("[")>0 && fl_Content.contains(":[")){
        //Logic to Parse In_Built Transformers

        final String alias = getExpressionAlias(expressionBuffer);
        final String transfomerExpression = getExpressionValue(expressionBuffer);
        final Map<String,String> augmenterArgs = new HashMap<String,String>();
        QueryParsing.parseLocalParams(transfomerExpression, 0, augmenterArgs, request.getParams(), "[", CLOSE_BRACKET);
        final String augmenterName = augmenterArgs.remove("type");
        final String aliasThatWillBeUsed = (alias != null) ? alias : OPEN_BRACKET + augmenterName + CLOSE_BRACKET;
        final TransformerFactory factory = request.getCore().getTransformerFactory(augmenterName);
        if (factory != null)

        { augmenters.addTransformer(factory.create(aliasThatWillBeUsed, new MapSolrParams(augmenterArgs), request)); onInclusionLiteralExpression(expressionBuffer, augmenters, false, false); }

        }

        }

        }
        }

        Show
        Kuntal Ganguly added a comment - I modify the code to support inbuilt alias transformer as well as custom transformer: Modify the method: /** Logic to handle Custom as well as inbuilt Doc Transformer. */ void onTransformer(final StringBuilder expressionBuffer, final SolrQueryRequest request, final DocTransformers augmenters) throws SyntaxError { if(expressionBuffer!=null){ String fl_Content=expressionBuffer.toString(); if(fl_Content.contains(" [") && fl_Content.contains("] ")){ if(fl_Content.indexOf("[")==0){ //Logic to Parse Custom Transformers final Map<String,String> augmenterCustomArgs = new HashMap<String,String>(); QueryParsing.parseLocalParams(expressionBuffer.toString(), 0, augmenterCustomArgs, request.getParams(), "[", CLOSE_BRACKET); final String augmenterCustomName = augmenterCustomArgs.remove("type"); final String customDisp=' ['+augmenterCustomName+'] '; final TransformerFactory customFactory = request.getCore().getTransformerFactory(augmenterCustomName); if( customFactory != null ) { MapSolrParams augmenterCustomParams = new MapSolrParams( augmenterCustomArgs ); augmenters.addTransformer( customFactory.create(customDisp, augmenterCustomParams, request) ); } }else if(fl_Content.indexOf("[")>0 && fl_Content.contains(":[")){ //Logic to Parse In_Built Transformers final String alias = getExpressionAlias(expressionBuffer); final String transfomerExpression = getExpressionValue(expressionBuffer); final Map<String,String> augmenterArgs = new HashMap<String,String>(); QueryParsing.parseLocalParams(transfomerExpression, 0, augmenterArgs, request.getParams(), "[", CLOSE_BRACKET); final String augmenterName = augmenterArgs.remove("type"); final String aliasThatWillBeUsed = (alias != null) ? alias : OPEN_BRACKET + augmenterName + CLOSE_BRACKET; final TransformerFactory factory = request.getCore().getTransformerFactory(augmenterName); if (factory != null) { augmenters.addTransformer(factory.create(aliasThatWillBeUsed, new MapSolrParams(augmenterArgs), request)); onInclusionLiteralExpression(expressionBuffer, augmenters, false, false); } } } } }
        Hide
        Erik Hatcher added a comment -

        Kuntal Ganguly - could you attach a full patch against Solr trunk (or branch_4x) so that the latest success working code is available easily? Thanks! Test cases with a custom transformer would be great if you could add those as well. I have interest in this issue, but it looks a bit hairy to review, so best if the latest is consolidated.

        Show
        Erik Hatcher added a comment - Kuntal Ganguly - could you attach a full patch against Solr trunk (or branch_4x) so that the latest success working code is available easily? Thanks! Test cases with a custom transformer would be great if you could add those as well. I have interest in this issue, but it looks a bit hairy to review, so best if the latest is consolidated.
        Hide
        Andrea Gazzarini added a comment -

        Hi guys, I created that patch some time ago, if you need something I will
        be happy to give a help.

        In my comments above you can find all tests involved with the fl
        exclusion, I think additional tests should go there.

        Andrea

        Show
        Andrea Gazzarini added a comment - Hi guys, I created that patch some time ago, if you need something I will be happy to give a help. In my comments above you can find all tests involved with the fl exclusion, I think additional tests should go there. Andrea
        Hide
        Andrea Gazzarini added a comment -

        I meant "additional test methods"

        Show
        Andrea Gazzarini added a comment - I meant "additional test methods"
        Hide
        Kuntal Ganguly added a comment -

        Erik Hatcher Yeah i will upload the patch along with the test cases for transformer fix tomorrow.

        Show
        Kuntal Ganguly added a comment - Erik Hatcher Yeah i will upload the patch along with the test cases for transformer fix tomorrow.
        Hide
        Kuntal Ganguly added a comment - - edited

        Updated the existing patch for working with custom transformer like - [mytrans5 jf=pinprojectid if=type:Projects mf=project_accessibilitytype] , that was working properly without this patch. Updated the onTransformer method in-order to make transformer( inbuilt, alias or custom) work properly. Use-case for the above transformer is mentioned above in my previous comment section.
        Also added two test cases for it.

        Show
        Kuntal Ganguly added a comment - - edited Updated the existing patch for working with custom transformer like - [mytrans5 jf=pinprojectid if=type:Projects mf=project_accessibilitytype] , that was working properly without this patch. Updated the onTransformer method in-order to make transformer( inbuilt, alias or custom) work properly. Use-case for the above transformer is mentioned above in my previous comment section. Also added two test cases for it.
        Kuntal Ganguly made changes -
        Attachment SOLR-3191.patch [ 12664348 ]
        Hide
        Jan Høydahl added a comment -

        Are there known issues with this patch? I see a few TODO's

        Show
        Jan Høydahl added a comment - Are there known issues with this patch? I see a few TODO's
        Hide
        Andrea Gazzarini added a comment -

        Hi Jan,
        if you refer to my long comment above, the two last points are questions
        about the granularity of unit tests, so nothing important, from a
        functional point of view.

        Show
        Andrea Gazzarini added a comment - Hi Jan, if you refer to my long comment above, the two last points are questions about the granularity of unit tests, so nothing important, from a functional point of view.
        Hide
        Roman Kliewer added a comment -

        I think this feature should be higher prioritized since the atomic updates do require all fields stored.

        In my case there are millions of documents and i do need to frequently update the ACL field, so storing of the default search field is required. This causes of course a much slower search because the default search field is returned every time since it can not be excluded and almost all other fields are dynamic.

        IMHO the absence of this feature renders atomic update feature completely unusable.

        Show
        Roman Kliewer added a comment - I think this feature should be higher prioritized since the atomic updates do require all fields stored . In my case there are millions of documents and i do need to frequently update the ACL field, so storing of the default search field is required. This causes of course a much slower search because the default search field is returned every time since it can not be excluded and almost all other fields are dynamic. IMHO the absence of this feature renders atomic update feature completely unusable.
        Hide
        Jan Høydahl added a comment -

        Anyone have been using the patch in production?

        Show
        Jan Høydahl added a comment - Anyone have been using the patch in production?
        Hide
        Andrea Gazzarini added a comment -

        Hi Jan,
        Yes but with an old version of Solr (4.4.0 if I remember well) and I don't
        use all of the stuff offered by the patch (e.g. exclusion globs, exclusion
        with aliasing) just a simple fl=-huge_field which was exactly the
        reason why at that time I started working on this issue

        Show
        Andrea Gazzarini added a comment - Hi Jan, Yes but with an old version of Solr (4.4.0 if I remember well) and I don't use all of the stuff offered by the patch (e.g. exclusion globs, exclusion with aliasing) just a simple fl=-huge_field which was exactly the reason why at that time I started working on this issue
        Hide
        Jan Høydahl added a comment -

        Kuntal Ganguly, there is a problem with your last patch from august, it does not include the Expectation interface.

        Show
        Jan Høydahl added a comment - Kuntal Ganguly , there is a problem with your last patch from august, it does not include the Expectation interface.
        Hide
        Scott Stults added a comment -

        I updated this patch to work on trunk. A couple of things that didn't seem to work with the last patch were:

        • When a field is requested both in its bare form as well as with an alias (e.g. "fl=id,myid:id") the bare form was discarded
        • When fl only contains invalid transformers and aliased exclusion globs, the behavior was expected to be as if * were requested.

        I was able to get the first case to work by adding the requested aliases to a separate list and then applying the RenameFieldTransformer copy argument selectively.

        The second case seems really fringe so the best course might be to open it as a separate Jira later if necessary.

        Show
        Scott Stults added a comment - I updated this patch to work on trunk. A couple of things that didn't seem to work with the last patch were: When a field is requested both in its bare form as well as with an alias (e.g. "fl=id,myid:id") the bare form was discarded When fl only contains invalid transformers and aliased exclusion globs, the behavior was expected to be as if * were requested. I was able to get the first case to work by adding the requested aliases to a separate list and then applying the RenameFieldTransformer copy argument selectively. The second case seems really fringe so the best course might be to open it as a separate Jira later if necessary.
        Scott Stults made changes -
        Attachment SOLR-3191.patch [ 12768412 ]
        Hide
        Demian Katz added a comment - - edited

        I just took a look at test coverage for this patch, and it appears that there are a couple of areas where more tests are needed:

        1.) I'm not sure what public static String getFieldName(QueryParsing.StrParser sp) is for, but it's not covered at all.

        2.) None of the logic related to collectingLiteral seems to be getting tested. I tried adjusting the test class to include a field list containing a function and a literal:

          @Test
          public void testFunctionAndLiteralInFieldList() {
        	ReturnFields rf = new SolrReturnFields(req("fl", "ordOfId:{!func}ord(id),literal:1234"));
        	assertTrue(rf.wantsField("ordOfId"));
        	assertTrue(rf.wantsField("literal"));
        	assertFalse(rf.wantsField("xxx"));
        	assertFalse(rf.wantsAllFields());
          }
        

        Unfortunately, while this test passes, it does not appear to affect the coverage as I had expected.

        3.) For a little low-hanging-fruit coverage improvement, I added this to testFunkyFieldNames, since the method was not covered at all:

        assertEquals("[#foo_s, id]", rf.getRequestedFieldNames().toString());
        

        I'm not sure if that's actually valuable.

        Apologies if any of these comments seem ill-informed; this is my first foray into the Solr code. However, we'd really like to see this patch committed to master, and I'm told that good patch test coverage helps things move along, so I thought I'd take a stab at it. I hope this is of some use! Please let me know if I can be of further assistance.

        Show
        Demian Katz added a comment - - edited I just took a look at test coverage for this patch, and it appears that there are a couple of areas where more tests are needed: 1.) I'm not sure what public static String getFieldName(QueryParsing.StrParser sp) is for, but it's not covered at all. 2.) None of the logic related to collectingLiteral seems to be getting tested. I tried adjusting the test class to include a field list containing a function and a literal: @Test public void testFunctionAndLiteralInFieldList() { ReturnFields rf = new SolrReturnFields(req("fl", "ordOfId:{!func}ord(id),literal:1234")); assertTrue(rf.wantsField("ordOfId")); assertTrue(rf.wantsField("literal")); assertFalse(rf.wantsField("xxx")); assertFalse(rf.wantsAllFields()); } Unfortunately, while this test passes, it does not appear to affect the coverage as I had expected. 3.) For a little low-hanging-fruit coverage improvement, I added this to testFunkyFieldNames, since the method was not covered at all: assertEquals("[#foo_s, id]", rf.getRequestedFieldNames().toString()); I'm not sure if that's actually valuable. Apologies if any of these comments seem ill-informed; this is my first foray into the Solr code. However, we'd really like to see this patch committed to master, and I'm told that good patch test coverage helps things move along, so I thought I'd take a stab at it. I hope this is of some use! Please let me know if I can be of further assistance.
        Hide
        Erik Hatcher added a comment -

        Commenting on the latest patch - why is there so much work to test and allow "funky" field names? Shall we not start enforcing "Field names should consist of alphanumeric or underscore characters only and not start with a digit. " - from https://cwiki.apache.org/confluence/display/solr/Defining+Fields

        Show
        Erik Hatcher added a comment - Commenting on the latest patch - why is there so much work to test and allow "funky" field names? Shall we not start enforcing "Field names should consist of alphanumeric or underscore characters only and not start with a digit. " - from https://cwiki.apache.org/confluence/display/solr/Defining+Fields
        Hide
        Erik Hatcher added a comment -

        Demian Katz thanks for the review; it got me looking at this patch deeper myself. To address your questions, as I look into it myself:

        1) getFieldName is used by some unrelated (LegacyFacet) code only, so seems to be untouched/unused here. Good eye, but no worries here.

        2) Another good eye (how are you testing coverage?) - collectingLiteral is only used when there are single or double quotes being parsed in an fl, which none of the tests do. A different ParserStage is used for regular field names.

        3) I don't think we need to test the .toString of the Set returned, but point taken about testing getReturnFieldNames() contents.

        Still reviewing...

        Show
        Erik Hatcher added a comment - Demian Katz thanks for the review; it got me looking at this patch deeper myself. To address your questions, as I look into it myself: 1) getFieldName is used by some unrelated (LegacyFacet) code only, so seems to be untouched/unused here. Good eye, but no worries here. 2) Another good eye (how are you testing coverage?) - collectingLiteral is only used when there are single or double quotes being parsed in an fl, which none of the tests do. A different ParserStage is used for regular field names. 3) I don't think we need to test the .toString of the Set returned, but point taken about testing getReturnFieldNames() contents. Still reviewing...
        Hide
        Demian Katz added a comment -

        Erik,

        I'm using the jacoco ant task to test coverage – specifically:

        ant jacoco -Dtests.class=org.apache.solr.search.ReturnFieldsTest

        This generates lots of files under build/jacoco, and if you navigate to the one for the SolrReturnFields class you can see very clearly where the weak coverage areas lie.

        • Demian
        Show
        Demian Katz added a comment - Erik, I'm using the jacoco ant task to test coverage – specifically: ant jacoco -Dtests.class=org.apache.solr.search.ReturnFieldsTest This generates lots of files under build/jacoco, and if you navigate to the one for the SolrReturnFields class you can see very clearly where the weak coverage areas lie. Demian
        Hide
        Erick Erickson added a comment -

        bq: Field names should consist of alphanumeric or underscore characters only and not start with a digit.

        Oh my yes!!! +Integer.MAX_VALUE I really hate saying "Well, some other patterns may work, but we don't guarantee it".

        Show
        Erick Erickson added a comment - bq: Field names should consist of alphanumeric or underscore characters only and not start with a digit. Oh my yes!!! +Integer.MAX_VALUE I really hate saying "Well, some other patterns may work, but we don't guarantee it".
        Hide
        Erik Hatcher added a comment -

        Ok, so Scott Stults - wanna take a stab at adjusting the code and tests to only allow allowed field names? Will we get into any trouble for a 5.4 fix version for that or will there be some field names that folks use that won't pass through? I'm ok with the implementation working for these funky field names, but that we don't "officially" support anything funky, so maybe it's ok as-is for now? But it'd be good to be strict about this stuff at some point.

        Show
        Erik Hatcher added a comment - Ok, so Scott Stults - wanna take a stab at adjusting the code and tests to only allow allowed field names? Will we get into any trouble for a 5.4 fix version for that or will there be some field names that folks use that won't pass through? I'm ok with the implementation working for these funky field names, but that we don't "officially" support anything funky, so maybe it's ok as-is for now? But it'd be good to be strict about this stuff at some point.
        Hide
        Scott Stults added a comment -

        I'd definitely be up for trimming out funky field name checks in the tests. Applying field name restrictions just in fl might lead to the situation where data is searchable but not returnable, so should the enforcement portion go to a more central location? Also, should field aliases be restricted in exactly the same way as real field names?

        Show
        Scott Stults added a comment - I'd definitely be up for trimming out funky field name checks in the tests. Applying field name restrictions just in fl might lead to the situation where data is searchable but not returnable, so should the enforcement portion go to a more central location? Also, should field aliases be restricted in exactly the same way as real field names?
        Hide
        Erik Hatcher added a comment -

        That's a good question about field aliases - a number literal, for example, wouldn't be a valid Solr field name by these rules. So I think field aliases are fair game to be funky.

        I don't necessarily propose we tackle all of these field name rules or maybe none of them at all, as part of the main gist of this issue though. Maybe we just get the fl field exclusion aspect of this committed, and deal with field name rules and parsing in other contexts to a separate effort?

        Show
        Erik Hatcher added a comment - That's a good question about field aliases - a number literal, for example, wouldn't be a valid Solr field name by these rules. So I think field aliases are fair game to be funky. I don't necessarily propose we tackle all of these field name rules or maybe none of them at all, as part of the main gist of this issue though. Maybe we just get the fl field exclusion aspect of this committed, and deal with field name rules and parsing in other contexts to a separate effort?
        Hide
        Yonik Seeley added a comment -

        I think new features should be free to ignore funky field name... no back compat requirements.
        Changing existing APIs... I don't know. I guess it depends on how many cases there are out there of funky names, how many of them will upgrade, and what the cost is to us to support.

        That's a good question about field aliases - a number literal, for example, wouldn't be a valid Solr field name by these rules. So I think field aliases are fair game to be funky.

        My guess is that Scott was referring to just the alias name... as in <alias_name> : <alias_value>, where alias_value can be a function query, literal, or transformer, as well as another field.

        Show
        Yonik Seeley added a comment - I think new features should be free to ignore funky field name... no back compat requirements. Changing existing APIs... I don't know. I guess it depends on how many cases there are out there of funky names, how many of them will upgrade, and what the cost is to us to support. That's a good question about field aliases - a number literal, for example, wouldn't be a valid Solr field name by these rules. So I think field aliases are fair game to be funky. My guess is that Scott was referring to just the alias name... as in <alias_name> : <alias_value>, where alias_value can be a function query, literal, or transformer, as well as another field.
        Hide
        Scott Stults added a comment -

        Right. The reason I ask is, aliases would be a decent alternative for folks who need field names like "1_product" should we start enforcing recommended field name restrictions in the schema. But there are other places in the response where we need to render a field name, like in facets and highlighting (I don't know the details of how that's done), so allowing funky aliases could have side effects.

        There are other tickets specifically about leading digits in field names (SOLR-7070, SOLR-3407), so I like Erik Hatcher's suggestion of keeping this focused on the exclusion list aspect and addressing field name enforcement/warning elsewhere.

        Show
        Scott Stults added a comment - Right. The reason I ask is, aliases would be a decent alternative for folks who need field names like "1_product" should we start enforcing recommended field name restrictions in the schema. But there are other places in the response where we need to render a field name, like in facets and highlighting (I don't know the details of how that's done), so allowing funky aliases could have side effects. There are other tickets specifically about leading digits in field names ( SOLR-7070 , SOLR-3407 ), so I like Erik Hatcher 's suggestion of keeping this focused on the exclusion list aspect and addressing field name enforcement/warning elsewhere.
        Hide
        Demian Katz added a comment -

        I just took another look at this and discovered that the patch no longer applies cleanly against the latest code. Looks like it was broken by r1714994. Any chance somebody could generate an updated patch? I don't think there are any major conflicts here – just enough to break the patch.

        Show
        Demian Katz added a comment - I just took another look at this and discovered that the patch no longer applies cleanly against the latest code. Looks like it was broken by r1714994. Any chance somebody could generate an updated patch? I don't think there are any major conflicts here – just enough to break the patch.
        Hide
        Jan Høydahl added a comment -

        I saw there were a lot of changes. I can help facilitate this for inclusion, if someone would provide a patch which applies to trunk or 5x. And also +1 to skip funky field name enforcing in this issue, can be introduced later.

        Show
        Jan Høydahl added a comment - I saw there were a lot of changes. I can help facilitate this for inclusion, if someone would provide a patch which applies to trunk or 5x. And also +1 to skip funky field name enforcing in this issue, can be introduced later.

          People

          • Assignee:
            Unassigned
            Reporter:
            Luca Cavanna
          • Votes:
            31 Vote for this issue
            Watchers:
            34 Start watching this issue

            Dates

            • Created:
              Updated:

              Development