Solr / SOLR-11741

Offline training mode for schema guessing


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Our data-driven schema guessing fails in many situations. For example, if the first document has a field with the value "0", the field is guessed as Long, and subsequent documents with "0.0" in that field are rejected. Similarly, if the same field has alphanumeric content in a later document, that document is rejected. Also, single- vs. multi-valued field guessing is not ideal.

      Proposing an offline training mode in which Solr accepts a batch of documents and returns a guessed schema (without indexing them). This schema can then be used for the actual indexing. I think the original idea is from Hoss.

      I think the initial implementation can be based on an UpdateRequestProcessor. We can hash out the API as we go along.
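
      To make the idea concrete, here is a minimal sketch of the guessing step in Python. It is not Solr code and not the attached patch: the widening order (long, then double, then string), the function names, and the output shape are all assumptions chosen for illustration. It scans the whole batch before deciding, so a later "0.0" or alphanumeric value widens the field type instead of causing a rejection, and a field is marked multi-valued if any document carries more than one value for it.

```python
from typing import Any, Dict, Iterable

# Assumed widening order: each type accepts every value the previous one accepts.
TYPE_ORDER = ["long", "double", "string"]


def guess_value_type(value: Any) -> str:
    """Return the narrowest type in TYPE_ORDER that can hold one raw value."""
    text = str(value).strip()
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def widen(current: str, observed: str) -> str:
    """Pick whichever of the two types sits later in TYPE_ORDER."""
    return max(current, observed, key=TYPE_ORDER.index)


def guess_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Scan all documents (nothing is indexed) and return a guessed schema."""
    schema: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            values = raw if isinstance(raw, list) else [raw]
            # Start from the narrowest type and only ever widen.
            field = schema.setdefault(name, {"type": "long", "multiValued": False})
            for value in values:
                field["type"] = widen(field["type"], guess_value_type(value))
            if len(values) > 1:
                field["multiValued"] = True
    return schema


if __name__ == "__main__":
    sample = [
        {"id": "1", "price": "0", "code": "7"},
        {"id": "2", "price": "0.0", "code": "A7", "tags": ["x", "y"]},
    ]
    for field_name, definition in guess_schema(sample).items():
        print(field_name, definition)
```

      With the sample batch, "price" resolves to double rather than long, "code" resolves to string, and "tags" is flagged as multi-valued. A real implementation would map these onto Solr field types and handle dates, booleans, and so on.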

      Attachments

        1. RuleForMostAccomodatingField.png
          14 kB
          Abhishek Kumar Singh
        2. screenshot-1.png
          17 kB
          Abhishek Kumar Singh
        3. screenshot-3.png
          15 kB
          Abhishek Kumar Singh
        4. SOLR-11741.patch
          1019 kB
          Abhishek Kumar Singh
        5. SOLR-11741.patch
          1019 kB
          Abhishek Kumar Singh
        6. SOLR-11741.patch
          77 kB
          Abhishek Kumar Singh
        7. SOLR-11741-temp.patch
          10 kB
          Abhishek Kumar Singh


          People

            ichattopadhyaya Ishan Chattopadhyaya
            ichattopadhyaya Ishan Chattopadhyaya

