SOLR-8050

Partial update on document with multivalued date field fails

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.4, 6.0
    • Component/s: clients - java, SolrJ
    • Labels: None
    • Environment:

      embedded solr
      java 1.7
      win

      Description

      When updating a document with a multivalued date field, Solr throws an exception
      like: org.apache.solr.common.SolrException: Invalid Date String:'Mon Sep 14 01:48:38 CEST 2015'
      even if the update document doesn't contain any date field.

      See the following code snippet to reproduce:
      1. create a doc with a multivalued date field (here the dynamic field _dts)
      SolrInputDocument doc = new SolrInputDocument();
      String id = Long.toString(System.currentTimeMillis());
      System.out.println("testUpdate: adding test document to solr ID=" + id);
      doc.addField(CollectionSchema.id.name(), id);
      doc.addField(CollectionSchema.title.name(), "Lorem ipsum");
      doc.addField(CollectionSchema.host_s.name(), "yacy.net");
      doc.addField(CollectionSchema.text_t.name(), "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.");
      doc.addField(CollectionSchema.dates_in_content_dts.name(), new Date());

      solr.add(doc);
      solr.commit(true);

      2. update any field on this doc via partial update
      SolrInputDocument sid = new SolrInputDocument();
      sid.addField(CollectionSchema.id.name(), doc.getFieldValue(CollectionSchema.id.name()));
      sid.addField(CollectionSchema.host_s.name(), "yacy.yacy");
      solr.update(sid);
      solr.commit(true);

      Result
      Caused by: org.apache.solr.common.SolrException: Invalid Date String:'Mon Sep 14 01:48:38 CEST 2015'
      at org.apache.solr.util.DateFormatUtil.parseMath(DateFormatUtil.java:87)
      at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:473)
      at org.apache.solr.schema.TrieField.createFields(TrieField.java:715)
      at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
      at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
      at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:83)
      at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:237)
      at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
      at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
      at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
      at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:706)
      at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
      at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
      at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:207)
      at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
      at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
      at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
      at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:179)
      at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
      at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:174)
      at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:191)

      P.S. The "solr.update" call takes care of creating a partial update document with the proper {"set":[fieldname:value]} structure.
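
A minimal sketch of the shape a "set" partial update takes, built with plain java.util maps rather than SolrJ so that it runs on its own. The unique key name "id" and the class and method names are assumptions for illustration; host_s and its new value are taken from the snippet above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class PartialUpdateShape {
    /**
     * Builds {id: <id>, <field>: {set: <newValue>}}.
     * Hypothetical helper; "id" as the unique key field is an assumption.
     */
    static Map<String, Object> setUpdate(String id, String field, Object newValue) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        // The modifier map tells Solr to replace the stored value of this field only.
        doc.put(field, Collections.singletonMap("set", newValue));
        return doc;
    }

    public static void main(String[] args) {
        // prints {id=42, host_s={set=yacy.yacy}}
        System.out.println(setUpdate("42", "host_s", "yacy.yacy"));
    }
}
```

Fields that are not named in the update document (here the multivalued date field) are read back from the existing document by Solr, which is where the failure in this report occurs.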

      Attachments

      1. screenshot-1.png (26 kB, Burkhard Buelte)
      2. SOLR-8050.patch (11 kB, Shalin Shekhar Mangar)

        Activity

        Burkhard Buelte added a comment -

        According to my research, the value.toString() call at line 715 in TrieField.createFields (see screenshot-1.png) is the cause: value is of type Date, and its toString() output is not in the expected date string format.
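
The mismatch can be reproduced with the JDK alone. This is a sketch under an assumption: Instant.parse stands in for Solr's own date parser (it is not the code path in the stack trace), and the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Date;

final class DateToStringDemo {
    /** Returns true if the text is an ISO-8601 UTC instant such as 1986-01-01T00:00:00Z. */
    static boolean isIsoInstant(String text) {
        try {
            Instant.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date value = new Date(0L);
        // Date.toString() yields a "Thu Jan 01 ..." style string, which is rejected.
        System.out.println(value + " -> " + isIsoInstant(value.toString()));
        // The Instant form of the same value is accepted.
        System.out.println(value.toInstant() + " -> " + isIsoInstant(value.toInstant().toString()));
    }
}
```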

        Luc Vanlerberghe added a comment -

        I have the same problem in solr-5.1.0 and was able to create a simple test demonstrating the problem in trunk.

        I'll upload a patch/pull request with the failing test case shortly.

        ASF GitHub Bot added a comment -

        GitHub user LucVL opened a pull request:

        https://github.com/apache/lucene-solr/pull/202

        SOLR-8050: Test case demonstrating the bug

        To run just this testcase, use:
        ```sh
        ant test -Dtests.class=org.apache.solr.update.processor.AtomicUpdatesTest -Dtests.method=testMultipleTDateValues
        ```

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/LucVL/lucene-solr SOLR-8050

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/lucene-solr/pull/202.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #202


        commit bb7b239eb25a8826e9767edc52e970a8b2aab405
        Author: Luc Vanlerberghe <luc.vanlerberghe@bvdinfo.com>
        Date: 2015-10-05T09:58:56Z

        Test case demonstrating the bug


        Luc Vanlerberghe added a comment - edited

        As I mentioned in the code comments, TrieField.createField is not able to make sense of the output of Date.toString(), as opposed to a correctly formed UTC date/time string (like "1986-01-01T00:00:00Z"). In addition, the value the Date object contains depends on the locale the test is run in, so there must be an error even earlier in the update logic while decoding the values in the Lucene Document...

        Luc Vanlerberghe added a comment -

        A temporary workaround seems to be to include the data of the multi-valued tdate field in the update request to prevent Solr trying to decode the existing values...

        In the patch I attached earlier, I now used

            doc.setField("multiTDate_tdtdv", new String[]{"1986-01-01T00:00:00Z", "1988-01-01T00:00:00Z", "1980-01-01T00:00:00Z"});
        

        to construct the original document and added

            doc.setField("multiTDate_tdtdv", ImmutableMap.of("set", "1986-01-01T00:00:00Z")); 
            doc.addField("multiTDate_tdtdv", ImmutableMap.of("set", "1988-01-01T00:00:00Z")); 
            doc.addField("multiTDate_tdtdv", ImmutableMap.of("set", "1980-01-01T00:00:00Z")); 
        

        to the update request, and the test passes.

        Luc Vanlerberghe added a comment -

        Contrary to the components list in the original report, this is not a SolrJ issue but a bug in the update logic in solr core itself.

        @reger: I didn't submit the original report so I cannot update it. Could you update it to increase the likelihood that a committer picks it up?
        I'm having a go at it, but I'm not familiar with the internals of solr atomic updates...

        Luc Vanlerberghe added a comment -

        I managed to fix it (at least it seems to be OK now, without breaking any other tests).

        The Date object did contain a correct value, but Date.toString() confusingly uses the JVM's default time zone (see "Java SE 8 Date and Time": "... For example, java.util.Date represents an instant on the timeline, a wrapper around the number of milliseconds since the UNIX epoch, but if you call toString(), the result suggests that it has a time zone, causing confusion among developers.").

        The bug was introduced more than two years ago when support for multivalued docvalues was added.
        The old code calls readableToIndexed on value.toString(), which works for most TrieField types, except when value is a Date object obtained from reading the old value during an update.

        Since the code a little higher up already constructs a correct StorableField, I changed it to use storeableToIndexed instead.
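
The behaviour of Date.toString() described in this comment can be checked with the JDK alone. A minimal sketch: the class and the renderIn helper are hypothetical names, and the zone IDs are arbitrary examples.

```java
import java.util.Date;
import java.util.TimeZone;

final class DefaultZoneDemo {
    /** Renders the date with Date.toString() while the given zone is the JVM default. */
    static String renderIn(Date date, String zoneId) {
        TimeZone previous = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return date.toString();
        } finally {
            // Restore the original default so the rest of the JVM is unaffected.
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        // Same instant, two different strings.
        System.out.println(renderIn(epoch, "UTC"));
        System.out.println(renderIn(epoch, "Europe/Berlin"));
        // The Instant form does not depend on the default zone: 1970-01-01T00:00:00Z
        System.out.println(epoch.toInstant());
    }
}
```

Because the string form varies with the default zone while the underlying instant does not, converting from the stored value rather than from toString() avoids the round trip through a zone-dependent string.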

        Luc Vanlerberghe added a comment -

        P.S.: I updated the pull request, so the original link to the patch (https://github.com/apache/lucene-solr/pull/202.patch) now includes the fix.

        Shalin Shekhar Mangar added a comment -

        Thanks Luc. Here's a patch with a better randomized test that catches the problem reported here (parse exception). The patch fixes that problem and also fixes another bug, described below.

        When you add another date X as a string to a multi-valued date field containing an existing date, a subsequent remove of X fails to delete the date. The reason is that it finds the old document (read from the transaction log) containing a list with the first date as a java.util.Date object and the second date as a String.

        This patch ensures that both add and set operations convert the incoming value to a native type using FieldType.toNativeType, so that the transaction log stores a proper native object instead of a string.
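
The failed removal can be illustrated with plain collections. This is a sketch under assumptions: toNative is a hypothetical stand-in for FieldType.toNativeType on a date field, and the remove is assumed to compare values with equals after converting the requested value to its native type.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

final class MixedTypeRemoveDemo {
    /** Hypothetical stand-in for FieldType.toNativeType: ISO strings become Dates. */
    static Object toNative(Object value) {
        if (value instanceof String) {
            return Date.from(Instant.parse((String) value));
        }
        return value;
    }

    /** Removes the value after converting it to its native type; true if something was removed. */
    static boolean removeValue(List<Object> stored, Object toRemove) {
        return stored.remove(toNative(toRemove));
    }

    public static void main(String[] args) {
        String x = "1988-01-01T00:00:00Z";

        // Old behaviour: the added value is kept as a String next to a Date.
        List<Object> mixed = new ArrayList<>();
        mixed.add(toNative("1986-01-01T00:00:00Z"));
        mixed.add(x);
        System.out.println(removeValue(mixed, x)); // false: a Date never equals a String

        // Patched behaviour: the value is converted when it is added.
        List<Object> nativeOnly = new ArrayList<>();
        nativeOnly.add(toNative("1986-01-01T00:00:00Z"));
        nativeOnly.add(toNative(x));
        System.out.println(removeValue(nativeOnly, x)); // true
    }
}
```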

        ASF subversion and git services added a comment -

        Commit 1709042 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1709042 ]

        SOLR-8050: Partial update on document with multivalued date field fails to parse date and can also fail to remove dates in some cases.

        This closes #202

        ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/lucene-solr/pull/202

        ASF subversion and git services added a comment -

        Commit 1709053 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1709053 ]

        SOLR-8050: Partial update on document with multivalued date field fails to parse date and can also fail to remove dates in some cases.

        Shalin Shekhar Mangar added a comment -

        Thanks Burkhard and Luc!

        ASF GitHub Bot added a comment -

        Github user LucVL commented on the pull request:

        https://github.com/apache/lucene-solr/pull/210#issuecomment-158335822

        There’s good documentation on combining git pull-requests with jira issues on the apache wiki (there’s a bot for that)
        https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow

        Basically:

        • Create the JIRA issue first
        • Create the pull request with the JIRA issue in the title. This will cause a bot to pick it up and link the two. A comment will appear in the JIRA issue that includes a link to an equivalent patch for non-git users.
        • When the JIRA issue is closed, the committer should include “This closes #PP” in the commit message. This causes the bot to close the PR as well.

        For an example, see https://issues.apache.org/jira/browse/SOLR-8050

        Luc

        From: smartprix notifications@github.com
        Sent: Thursday, 19 November 2015 15:27
        To: apache/lucene-solr
        Subject: Re: [lucene-solr] WordDelimiterFilter - Don't split words marked as keyword (#210)

        The behavior is now configurable. I have updated the pull request to reflect that. There is a new attribute "splitKeywordTokens", which is false by default for lucene >= 6.0 and true otherwise.

        Does lucene not accept pull requests from github? Should I create it on JIRA?




          People

          • Assignee: Shalin Shekhar Mangar
          • Reporter: Burkhard Buelte
          • Votes: 1
          • Watchers: 6
