Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-6250

Schemaless parsing does not work on a consistent schema

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Schema and Analysis
    • None

    Description

      See this comment (https://issues.apache.org/jira/browse/SOLR-6137?focusedCommentId=14044366&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044366), reproduced here:

      One small issue I noticed is that there is a race between parsing and schema addition. The AddSchemaFieldsUpdateProcessor handles this by only working on a fixed schema, so the schema doesn't change underneath it. If it decides on a schema addition and that fails (because another addition beat it), it will grab the latest schema and retry. But the parsers don't do that so the core's schema can change in the middle of parsing. It may make sense to defend against that by moving the retry code from the AddSchemaFieldsUpdateProcessor to some processor that runs before all the parsers. The downside is if the schema addition fails, you have to rerun all the parsers, but that may be a minor concern.

      This may not actually matter. Consider the case tested at the end of the test: two documents are simultaneously inserted with the same field having a Long and Date value. Assume the Date wins the schema "race" and is updated first. While parsing the Long, each parser may see the schema as having a date field or no field. If a valid parser (that is, one that can modify the field value) sees a date field, it won't do any modifications because shouldMutate will fail, leaving the object in whatever state the serializer left it (either Long or String). If it sees no field, it will mutate the object to create a Long object. In either case, we should get an error at the point we actually create the lucene document, because neither a Long nor String-representation-of-a-long can be stored in a Date field. This is pretty difficult to reason about though.

      Attachments

        Activity

          People

            Unassigned Unassigned
            gchanan Gregory Chanan
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: