Solr
  1. Solr
  2. SOLR-3916

fl parsing is sensitive to newlines at the end of field names

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-BETA
    • Fix Version/s: 4.0, 4.1, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      As reported by giovanni.bricconi on the user list, there is a bug in "fl" parsing that causes solr to get confused when a field name is followed by a newline character – eg: in a requestHandler default like...

      <!-- newlines showing using "$" -->$
      <str name="fl">$
         sku,store_slug$  
      </str>$
      

      ...this results in solr assuming it should use function parsing to evaluate the field name, which can cause missleading errors if the field name can't be used in a function (eg: "can not use FieldCache on multivalued field: store_slug")

        Activity

        Hide
        Yonik Seeley added a comment -

        A lot of things could be confused by this - I don't consider this as much a bug in "fl" parsing as I do a bug in default handling configured via XML.

        Show
        Yonik Seeley added a comment - A lot of things could be confused by this - I don't consider this as much a bug in "fl" parsing as I do a bug in default handling configured via XML.
        Hide
        Hoss Man added a comment -

        blaming this on xml parsing doesn't really seem productive, espeicially since you can trivially reproduce the problem from a remote client (or even with the solr admin UI) if you accidentally cut/paste tab characters or newlines mixed in your list of fields.

        The fix is trivial: in some places the fl parser was naivly only checking character equality with ' ' instead of using Character.isWhitespace().

        patch with tests attached, running all tests and precommit on my computer now for further vetting.

        Show
        Hoss Man added a comment - blaming this on xml parsing doesn't really seem productive, espeicially since you can trivially reproduce the problem from a remote client (or even with the solr admin UI) if you accidentally cut/paste tab characters or newlines mixed in your list of fields. The fix is trivial: in some places the fl parser was naivly only checking character equality with ' ' instead of using Character.isWhitespace(). patch with tests attached, running all tests and precommit on my computer now for further vetting.
        Hide
        Yonik Seeley added a comment -

        blaming this on xml parsing doesn't really seem productive

        ?

        I'm suggesting this could be a much wider problem than just "fl", which would seem to be a very productive line of conversation.

        In the context of grabbing defaults from XML, we should consider trimming whitespace in the process of generating those defaults (rather than later).
        The only downside is if certain parameters could reasonably expect to want leading/trailing whitespace. If we want to support that, we could introduce a new type called <verbatim> or something.

        Show
        Yonik Seeley added a comment - blaming this on xml parsing doesn't really seem productive ? I'm suggesting this could be a much wider problem than just "fl", which would seem to be a very productive line of conversation. In the context of grabbing defaults from XML, we should consider trimming whitespace in the process of generating those defaults (rather than later). The only downside is if certain parameters could reasonably expect to want leading/trailing whitespace. If we want to support that, we could introduce a new type called <verbatim> or something.
        Hide
        Hoss Man added a comment -

        In the context of grabbing defaults from XML, we should consider trimming whitespace in the process of generating those defaults (rather than later).

        ...which can be pursued in another issue if you wish to do so.

        as i've already said: this current bug is not specific to getting defaults from the xml file, nor is it specific to leading/trailing whitespace that could be easily trimmed in a generic way. Having a tab or newline or any whitespace character other then a simple ' ' between the names of fields in an "fl" string causes this annoying little bug – regardless of whether it comes from xml config, or a remote client.

        If you look at the patch, you can see my point quite easily: when parsing the fl, ReturnFields is naively only treating the ' ' character as whitespace and not recognizing any other whitespace characters that might exist between field names.

        Show
        Hoss Man added a comment - In the context of grabbing defaults from XML, we should consider trimming whitespace in the process of generating those defaults (rather than later). ...which can be pursued in another issue if you wish to do so. as i've already said: this current bug is not specific to getting defaults from the xml file, nor is it specific to leading/trailing whitespace that could be easily trimmed in a generic way. Having a tab or newline or any whitespace character other then a simple ' ' between the names of fields in an "fl" string causes this annoying little bug – regardless of whether it comes from xml config, or a remote client. If you look at the patch, you can see my point quite easily: when parsing the fl, ReturnFields is naively only treating the ' ' character as whitespace and not recognizing any other whitespace characters that might exist between field names.
        Hide
        Hoss Man added a comment -

        Committed revision 1394836. - trunk
        Committed revision 1394843. - 4x
        Committed revision 1394844. - 4.0

        Show
        Hoss Man added a comment - Committed revision 1394836. - trunk Committed revision 1394843. - 4x Committed revision 1394844. - 4.0
        Hide
        Yonik Seeley added a comment -

        If you look at the patch, you can see my point quite easily: when parsing the fl, ReturnFields is naively only treating the ' ' character as whitespace and not recognizing any other whitespace characters that might exist between field names.

        I had looked at the patch, and still didn't consider not checking for other types of whitespace between fieldnames a bug since we never promised to support that. If you look at the code that was used before ReturnFields, it also used a pattern that only split on comma or space. The previous code did handle leading/trailing whitespace via using String.trim() first though.

        Show
        Yonik Seeley added a comment - If you look at the patch, you can see my point quite easily: when parsing the fl, ReturnFields is naively only treating the ' ' character as whitespace and not recognizing any other whitespace characters that might exist between field names. I had looked at the patch, and still didn't consider not checking for other types of whitespace between fieldnames a bug since we never promised to support that. If you look at the code that was used before ReturnFields, it also used a pattern that only split on comma or space. The previous code did handle leading/trailing whitespace via using String.trim() first though.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1394843

        SOLR-3916: Fixed whitespace bug in parsing the fl param (merge r1394836)

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1394843 SOLR-3916 : Fixed whitespace bug in parsing the fl param (merge r1394836)
        Hide
        Scott Stults added a comment -

        Should there be a @Test annotation on testWhitespace()? I think some test runners might do introspection, but all of the other tests in that class use the annotation and was wondering if that was just an oversight or had specific meaning.

        Show
        Scott Stults added a comment - Should there be a @Test annotation on testWhitespace() ? I think some test runners might do introspection, but all of the other tests in that class use the annotation and was wondering if that was just an oversight or had specific meaning.
        Hide
        Erick Erickson added a comment -

        Scott:

        Whether or not the @Test annotation is used is pretty arbitrary, JUnit picks up any method that starts with 'test' so this test gets run. Personally I prefer the annotation, but there's no standard....

        Show
        Erick Erickson added a comment - Scott: Whether or not the @Test annotation is used is pretty arbitrary, JUnit picks up any method that starts with 'test' so this test gets run. Personally I prefer the annotation, but there's no standard....

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development