Solr / SOLR-6939

UpdateProcessor to buffer & sample documents and then batch create necessary fields


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Spun off from an idea in SOLR-6016...

      We could add a SchemaGeneratorHandler which would generate the "best" schema.

      You wouldn't need (or want) a handler for this; you'd just need an UpdateProcessorFactory, used in place of RunUpdateProcessorFactory, that would look at the datatypes of the fields in each document without doing any indexing and pick the least common denominator.
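As a rough illustration of the "least common denominator" idea, the Java sketch below widens a field's type as values arrive. The type lattice (BOOLEAN, LONG, DOUBLE, DATE, STRING), the TypeWidening class, and the rule that any non-numeric mismatch falls back to STRING are hypothetical choices for illustration, not something Solr defines. It also assumes the upstream parsing processors have already turned raw strings into Java objects such as Long, Double, Boolean, or Date.

```java
import java.time.Instant;
import java.util.Date;

/** Hypothetical sketch: infer a field's "least common denominator" type. */
class TypeWidening {

    /** Hypothetical type lattice; STRING is the top that every type can widen to. */
    enum InferredType { BOOLEAN, LONG, DOUBLE, DATE, STRING }

    /** Classify one value that an upstream parsing processor has already converted. */
    static InferredType classify(Object value) {
        if (value instanceof Boolean) return InferredType.BOOLEAN;
        if (value instanceof Integer || value instanceof Long) return InferredType.LONG;
        if (value instanceof Float || value instanceof Double) return InferredType.DOUBLE;
        if (value instanceof Date || value instanceof Instant) return InferredType.DATE;
        return InferredType.STRING;
    }

    /** Combine two observations into the narrowest type that can hold both. */
    static InferredType widen(InferredType a, InferredType b) {
        if (a == null) return b;   // first observation for this field
        if (a == b) return a;
        boolean bothNumeric = (a == InferredType.LONG || a == InferredType.DOUBLE)
                && (b == InferredType.LONG || b == InferredType.DOUBLE);
        if (bothNumeric) return InferredType.DOUBLE;  // LONG + DOUBLE -> DOUBLE
        return InferredType.STRING;                    // any other mix falls back to text
    }

    public static void main(String[] args) {
        InferredType t = null;
        for (Object v : new Object[] {3L, 4.5, 7L}) {
            t = widen(t, classify(v));
        }
        System.out.println(t);                                              // DOUBLE
        System.out.println(widen(InferredType.LONG, InferredType.BOOLEAN)); // STRING
    }
}
```

Widening toward STRING means a single odd value degrades the whole field to text. A real implementation might prefer to report such conflicts rather than silently pick the most general type.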

      So then you'd have a chain with all of your normal update processors, including the TypeMapping processors configured with the precedence orders, locales, and format strings you want. At the end you'd have your BestFitSchemaGeneratorUpdateProcessorFactory, which would look at all those docs, study their values, and throw them away until a commit comes along, at which point it would make all the under-the-hood schema field addition calls.
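A minimal sketch of the buffering end of such a chain follows. It models documents as a plain Map<String, Object> rather than Solr's own document classes. SchemaLearner and its sampleEvery parameter are hypothetical, and the type names (plong, pdouble, boolean, string) are borrowed from names used in Solr's default configsets as an assumption. On commit, this sketch only returns name:type strings; a real processor would make the schema field addition calls at that point instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: study (a sample of) incoming documents, never index
 * them, and resolve one type per field when a commit arrives.
 */
class SchemaLearner {
    private final int sampleEvery;   // study 1 of every N documents (assumption)
    private long docsSeen = 0;
    private final Map<String, Set<String>> observedTypes = new LinkedHashMap<>();

    SchemaLearner(int sampleEvery) {
        if (sampleEvery < 1) throw new IllegalArgumentException("sampleEvery must be >= 1");
        this.sampleEvery = sampleEvery;
    }

    /** Called in place of indexing: record value types, then drop the document. */
    void observe(Map<String, Object> doc) {
        if (docsSeen++ % sampleEvery != 0) return;   // not part of the sample
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            observedTypes.computeIfAbsent(field.getKey(), k -> new TreeSet<>())
                         .add(typeOf(field.getValue()));
        }
    }

    /** Called on commit: resolve each field to one type and clear the buffer. */
    List<String> onCommit() {
        List<String> defs = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : observedTypes.entrySet()) {
            defs.add(e.getKey() + ":" + resolve(e.getValue()));
        }
        observedTypes.clear();
        return defs;
    }

    private static String typeOf(Object v) {
        if (v instanceof Boolean) return "boolean";
        if (v instanceof Long || v instanceof Integer) return "plong";
        if (v instanceof Double || v instanceof Float) return "pdouble";
        return "string";
    }

    private static String resolve(Set<String> types) {
        if (types.size() == 1) return types.iterator().next();
        if (types.equals(Set.of("plong", "pdouble"))) return "pdouble";
        return "string";   // any other mix falls back to text
    }

    public static void main(String[] args) {
        SchemaLearner learner = new SchemaLearner(1);   // study every document
        learner.observe(Map.of("id", "doc1", "price", 3L));
        learner.observe(Map.of("id", "doc2", "price", 4.5));
        // contains id:string and price:pdouble (order may vary)
        System.out.println(learner.onCommit());
    }
}
```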

      So to learn, you'd send docs using whatever handler/format you want (JSON, XML, extraction, etc.) with an update.chain=my.datatype.learning.processor.chain request param. Once you've sent a bunch and given it a lot of variety to see, you send a commit so it creates the schema, and then you re-index your docs for real without that special chain.
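The two-pass workflow could look roughly like the following Java sketch, built with the JDK's java.net.http client (Java 11+). The base URL and the collection name mycollection are assumptions; the chain name comes from the description. The commit is sent with the same update.chain param because the learning processor presumably only sees the commit if it travels through that chain.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: the three kinds of update requests in the learn-then-reindex workflow. */
class LearningWorkflow {
    // Assumed Solr URL and collection name; adjust for a real deployment.
    static final String UPDATE_URL = "http://localhost:8983/solr/mycollection/update";
    static final String LEARNING_CHAIN = "my.datatype.learning.processor.chain";

    private static HttpRequest postJson(String query, String body) {
        return HttpRequest.newBuilder(URI.create(UPDATE_URL + query))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Pass 1: docs go through the learning chain, are studied, and are discarded. */
    static HttpRequest learn(String docsJson) {
        return postJson("?update.chain=" + LEARNING_CHAIN, docsJson);
    }

    /** The commit also goes through the learning chain, triggering field creation. */
    static HttpRequest createFields() {
        return postJson("?update.chain=" + LEARNING_CHAIN, "{\"commit\": {}}");
    }

    /** Pass 2: re-index for real with the default chain (no update.chain param). */
    static HttpRequest reindex(String docsJson) {
        return postJson("?commit=true", docsJson);
    }
}
```

Each request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The sketch only builds the requests, so it runs without a Solr server.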

      ...not mentioned originally: this factory could also default to assuming fields are single-valued unless and until it sees multiple values in a document that it samples.
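Here is a small sketch of that single-valued default. Like the other illustrations, it treats a document as a Map<String, Object> and assumes a multi-valued field holds a Collection. MultiValuedTracker is a hypothetical name. The key property is that a field's flag can only flip from false to true, so one sampled counterexample is enough.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: assume single-valued until a sampled doc proves otherwise. */
class MultiValuedTracker {
    private final Map<String, Boolean> multiValued = new HashMap<>();

    void observe(Map<String, Object> doc) {
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            Object v = field.getValue();
            boolean many = v instanceof Collection && ((Collection<?>) v).size() > 1;
            // Once true, it stays true: OR-ing never flips a field back to single-valued.
            multiValued.merge(field.getKey(), many, Boolean::logicalOr);
        }
    }

    /** Fields never seen, or only seen with one value, default to single-valued. */
    boolean isMultiValued(String field) {
        return multiValued.getOrDefault(field, false);
    }
}
```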

    People

    • Assignee: Unassigned
    • Reporter: Chris M. Hostetter (hossman)
    • Votes: 1
    • Watchers: 4
