Solr
  1. Solr
  2. SOLR-4538

DateMath strings truncated to 32 chars by lucene qparser in simple term queries

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.3
    • Component/s: search
    • Labels:
      None

      Description

      When using the "lucene" QParser, term queries on date fields cause the field value portion of the query clause to be truncated at 32 characters. This can cause visible errors about invalid date math expressions, or may result in silently returning incorrect results if the truncation results in a valid (but incomplete) date math expression.

      For example...

      Example...
      
      last_modified:"2013-03-08T00:46:15Z/DAY+6MONTHS+3DAYS"
      ...truncates to...
      last_modified:"2013-03-08T00:46:15Z/DAY+6MONTHS"
      (silently incorrect)
      
      foo_tdt:"2013-03-10T08:00:00.1Z/MINUTE+6MONTHS"
      ...truncates to...
      foo_tdt:2013-03-10T08:00:00.1Z/MINUTE+6M
      ("Invalid Date Math String" error msg)
      

      This problem does not affect range queries, or queries using either the term or field qparsers...

      Examples that work fine even though the math is longer then 32 chars...
      
      foo_tdt:[2013-03-10T08:00:00.1Z/MINUTE+6MONTHS TO 2013-03-10T08:00:00.1Z/MINUTE+6MONTHS]
      
      {!term f=bday}1976-07-04T12:08:56.45Z/SECOND+235MILLIS
      foo_tdt:"2013-03-10T08:00:00.1Z/MINUTE+6MONTHS"
      
      Original problem report

      DateMathParser doesn't work correctly.
      http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/util/DateMathParser.html

      fq=last_modified:"2013-03-08T00:46:15Z/DAY+6MONTHS+3DAYS"

      expected; last_modified:2013-09-11T00:00:00Z
      actual; last_modified:2013-09-08T00:00:00Z

      1. SOLR-4538_test.patch
        3 kB
        Hoss Man
      2. SOLR-4538.patch
        7 kB
        Hoss Man
      3. SOLR-4538.patch
        2 kB
        Yonik Seeley

        Activity

        Hide
        Hoss Man added a comment -

        Weird.

        This appears to be a query parsing issue, not a DateMathParser issue.

        Works...

        http://localhost:8983/solr/select?q={!field%20f=foo_dt}2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&debug=query
        --> <str name="parsedquery">foo_dt:2013-09-11T00:00:00Z</str>
        
        http://localhost:8983/solr/select?q={!term%20f=foo_dt}2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&debug=query
        --> <str name="parsedquery">foo_dt:2013-09-11T00:00:00Z</str>
        
        
        http://localhost:8983/solr/select?q=*:*&rows=0&debug=query&facet=true&facet.range=foo_dt&facet.range.start=2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&facet.range.end=2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&facet.range.gap=%2B1DAY
        --> <date name="start">2013-09-11T00:00:00Z</date>
        --> <date name="end">2013-09-11T00:00:00Z</date>
        

        Fails...

        http://localhost:8983/solr/select?q=foo_dt:2013-03-08T00\:46\:15Z/DAY%2B6MONTHS%2B3DAYS&debug=query
        --> <str name="parsedquery">foo_dt:2013-09-08T00:00:00Z</str>
        
        http://localhost:8983/solr/select?q=foo_dt:%222013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS%22&debug=query
        --> <str name="parsedquery">foo_dt:2013-09-08T00:00:00Z</str>
        

        (all URLs using Solr 4.1.0 with example configs)

        Show
        Hoss Man added a comment - Weird. This appears to be a query parsing issue, not a DateMathParser issue. Works... http://localhost:8983/solr/select?q={!field%20f=foo_dt}2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&debug=query --> <str name="parsedquery">foo_dt:2013-09-11T00:00:00Z</str> http://localhost:8983/solr/select?q={!term%20f=foo_dt}2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&debug=query --> <str name="parsedquery">foo_dt:2013-09-11T00:00:00Z</str> http://localhost:8983/solr/select?q=*:*&rows=0&debug=query&facet=true&facet.range=foo_dt&facet.range.start=2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&facet.range.end=2013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS&facet.range.gap=%2B1DAY --> <date name="start">2013-09-11T00:00:00Z</date> --> <date name="end">2013-09-11T00:00:00Z</date> Fails... http://localhost:8983/solr/select?q=foo_dt:2013-03-08T00\:46\:15Z/DAY%2B6MONTHS%2B3DAYS&debug=query --> <str name="parsedquery">foo_dt:2013-09-08T00:00:00Z</str> http://localhost:8983/solr/select?q=foo_dt:%222013-03-08T00:46:15Z/DAY%2B6MONTHS%2B3DAYS%22&debug=query --> <str name="parsedquery">foo_dt:2013-09-08T00:00:00Z</str> (all URLs using Solr 4.1.0 with example configs)
        Hide
        Hoss Man added a comment -

        This is so weird. Somehow the query parser is deciding to ignore the last N chars of the string, where N can vary depending on the string. in the examples posted to this issue so far, it just so happens that N winds up being the exact length of the last clause in the math, but in other cases you'll get an invalid date math expression which causes an error..

        http://localhost:8983/solr/select?q=foo_tdt:%222013-03-10T08:00:00Z/MONTH%2B6MONTHS%22
        Invalid Date Math String:'2013-03-10T08:00:00Z/MONTH+6MONT'
        
        http://localhost:8983/solr/select?q=foooooo_tdt:%222013-03-10T08:00:00Z/MONTH%2B6MONTHS%22
        Invalid Date Math String:'2013-03-10T08:00:00Z/MONTH+6MONT'
        
        http://localhost:8983/solr/select?debug=query&q=foo_tdt:2013-03-10T08\:00\:00.1Z/MONTH%2B6MONTHS%2B1DAY
        Invalid Date Math String:'2013-03-10T08:00:00.1Z/MONTH+6MO'
        
        http://localhost:8983/solr/select?debug=query&q=foo_tdt:2013-03-10T08\:00\:00.123Z/MONTH%2B6MONTHS%2B1DAY
        Invalid Date Math String:'2013-03-10T08:00:00.123Z/MONTH+6'
        
        http://localhost:8983/solr/select?debug=query&q=foo_tdt:%222013-03-10T08:00:00Z/MINUTE%2B6MONTHS%22
        Invalid Date Math String:'2013-03-10T08:00:00Z/MINUTE+6MON'
        
        http://localhost:8983/solr/select?debug=query&q=foo_tdt:%222013-03-10T08:00:00.1Z/MINUTE%2B6MONTHS%22
        Invalid Date Math String:'2013-03-10T08:00:00.1Z/MINUTE+6M'
        

        (all URLs using Solr 4.1.0 with example configs)

        Show
        Hoss Man added a comment - This is so weird. Somehow the query parser is deciding to ignore the last N chars of the string, where N can vary depending on the string. in the examples posted to this issue so far, it just so happens that N winds up being the exact length of the last clause in the math, but in other cases you'll get an invalid date math expression which causes an error.. http://localhost:8983/solr/select?q=foo_tdt:%222013-03-10T08:00:00Z/MONTH%2B6MONTHS%22 Invalid Date Math String:'2013-03-10T08:00:00Z/MONTH+6MONT' http://localhost:8983/solr/select?q=foooooo_tdt:%222013-03-10T08:00:00Z/MONTH%2B6MONTHS%22 Invalid Date Math String:'2013-03-10T08:00:00Z/MONTH+6MONT' http://localhost:8983/solr/select?debug=query&q=foo_tdt:2013-03-10T08\:00\:00.1Z/MONTH%2B6MONTHS%2B1DAY Invalid Date Math String:'2013-03-10T08:00:00.1Z/MONTH+6MO' http://localhost:8983/solr/select?debug=query&q=foo_tdt:2013-03-10T08\:00\:00.123Z/MONTH%2B6MONTHS%2B1DAY Invalid Date Math String:'2013-03-10T08:00:00.123Z/MONTH+6' http://localhost:8983/solr/select?debug=query&q=foo_tdt:%222013-03-10T08:00:00Z/MINUTE%2B6MONTHS%22 Invalid Date Math String:'2013-03-10T08:00:00Z/MINUTE+6MON' http://localhost:8983/solr/select?debug=query&q=foo_tdt:%222013-03-10T08:00:00.1Z/MINUTE%2B6MONTHS%22 Invalid Date Math String:'2013-03-10T08:00:00.1Z/MINUTE+6M' (all URLs using Solr 4.1.0 with example configs)
        Hide
        Hoss Man added a comment -

        still no idea where the problem is, but here's a patch with tests showing off some of hte various problems

        Show
        Hoss Man added a comment - still no idea where the problem is, but here's a patch with tests showing off some of hte various problems
        Hide
        Yonik Seeley added a comment -

        I do know at this point that the difference is due to the query parser electing to use newFieldQuery() (which goes through the analyzer) rather than FieldType.getFieldQuery().
        Not yet sure what's the issue with the analyzer though.

        Show
        Yonik Seeley added a comment - I do know at this point that the difference is due to the query parser electing to use newFieldQuery() (which goes through the analyzer) rather than FieldType.getFieldQuery(). Not yet sure what's the issue with the analyzer though.
        Hide
        Yonik Seeley added a comment -

        Here's a patch that should fix the issue. The TrieTokenizer previously assumed a max buffer length of 32 chars. I just changed it to 128 for date fields.

        Show
        Yonik Seeley added a comment - Here's a patch that should fix the issue. The TrieTokenizer previously assumed a max buffer length of 32 chars. I just changed it to 128 for date fields.
        Hide
        Hoss Man added a comment -

        I just changed it to 128 for date fields.

        I don't like the idea of kicking the can from 32 chars to 128 chars ... particularly since we've already seen from the original report of this issue how simple truncation can lead to silently incorrect results instead of any clear and obvious error message.

        it seems simple enough to just grow the buffer if the input is long then expected – it doesn't add any overhead to the common case of "normal" date (math) strings.

        New patch includes:

        • my previous tests
        • yonik's test
        • yonik's change to default the buffer size to 128 if it's a DATE field
        • a change to grow the buffer if it's not long enough
        • a test demonstrating date math strings longer then 128 chars
        Show
        Hoss Man added a comment - I just changed it to 128 for date fields. I don't like the idea of kicking the can from 32 chars to 128 chars ... particularly since we've already seen from the original report of this issue how simple truncation can lead to silently incorrect results instead of any clear and obvious error message. it seems simple enough to just grow the buffer if the input is long then expected – it doesn't add any overhead to the common case of "normal" date (math) strings. New patch includes: my previous tests yonik's test yonik's change to default the buffer size to 128 if it's a DATE field a change to grow the buffer if it's not long enough a test demonstrating date math strings longer then 128 chars
        Hide
        Hoss Man added a comment -

        updated summary & description know that hte scope & root issue is known.

        Show
        Hoss Man added a comment - updated summary & description know that hte scope & root issue is known.
        Hide
        Uwe Schindler added a comment - - edited

        Hoss: The term attribute is misused in the patch (as it is populated in ctor/reset()). It is also shared with the inner NumericTokenStream, which makes me nervous in combination with your changes. I just want to be sure, that the termattribute's contents are not used after thee ctor finishes, so incrementToken() can do what it wants with the attribute.

        You should maybe add a comment or use a separate growable instance like StringBuilder to get the chars and not misuse the term attribute.

        EDIT: The patch is fine, it uses a completely private term attribute instance, just to have a growable "charsref-like" instance. I would still prefer to use a StringBuilder for that, which can also be reset to zero length

        Show
        Uwe Schindler added a comment - - edited Hoss: The term attribute is misused in the patch (as it is populated in ctor/reset()). It is also shared with the inner NumericTokenStream, which makes me nervous in combination with your changes. I just want to be sure, that the termattribute's contents are not used after thee ctor finishes, so incrementToken() can do what it wants with the attribute. You should maybe add a comment or use a separate growable instance like StringBuilder to get the chars and not misuse the term attribute. EDIT: The patch is fine, it uses a completely private term attribute instance, just to have a growable "charsref-like" instance. I would still prefer to use a StringBuilder for that, which can also be reset to zero length
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1455269

        SOLR-4538: Date Math expressions were being truncated to 32 characters when used in field:value queries in the lucene QParser

        Show
        Commit Tag Bot added a comment - [trunk commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1455269 SOLR-4538 : Date Math expressions were being truncated to 32 characters when used in field:value queries in the lucene QParser
        Hide
        Hoss Man added a comment -

        Committed revision 1455269.
        Committed revision 1455348.
        h

        Show
        Hoss Man added a comment - Committed revision 1455269. Committed revision 1455348. h
        Hide
        Uwe Schindler added a comment -

        Thanks, all is fine! Although I don't like the pattern-misuse of CharTermAttribute, I am fine with the fix.

        Show
        Uwe Schindler added a comment - Thanks, all is fine! Although I don't like the pattern-misuse of CharTermAttribute, I am fine with the fix.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1455348

        SOLR-4538: Date Math expressions were being truncated to 32 characters when used in field:value queries in the lucene QParser (merge r1455269)

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1455348 SOLR-4538 : Date Math expressions were being truncated to 32 characters when used in field:value queries in the lucene QParser (merge r1455269)
        Hide
        Hoss Man added a comment -

        Although I don't like the pattern-misuse of CharTermAttribute

        as mentioned in IRC: the patch initially started trying to be consistent with how KeywordTokenizer works, but since NumericTokenStream doesn't support CharTermAttribute i made it private instead of using addAttribute().

        If anyone feels strongly about changing the implementation to not using CharTermAttributeImpl (and replace with StringBuilder, io-utils reader2String, whatever..) feel free, but i just choose to leave CharTermAttributeImpl in here because: a) i'd already written the code & committed it to trunk before i even saw uwe's comment; b) it keeps the buffer resizing abstracted away out of the tokenizer and consistent with the behaviour of other tokenizers.

        Show
        Hoss Man added a comment - Although I don't like the pattern-misuse of CharTermAttribute as mentioned in IRC: the patch initially started trying to be consistent with how KeywordTokenizer works, but since NumericTokenStream doesn't support CharTermAttribute i made it private instead of using addAttribute(). If anyone feels strongly about changing the implementation to not using CharTermAttributeImpl (and replace with StringBuilder, io-utils reader2String, whatever..) feel free, but i just choose to leave CharTermAttributeImpl in here because: a) i'd already written the code & committed it to trunk before i even saw uwe's comment; b) it keeps the buffer resizing abstracted away out of the tokenizer and consistent with the behaviour of other tokenizers.
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Hoss Man
            Reporter:
            Minoru Osuka
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development