[SOLR-11741] Offline training mode for schema guessing - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Our data-driven schema guessing fails in many situations. For example, if the first document has a field with the value "0", the field is guessed as Long, and subsequent documents with "0.0" in that field are rejected. Similarly, if a later document has alphanumeric content in the same field, that document is rejected. Single- vs. multi-valued field guessing is also not ideal.
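To make the failure mode concrete, here is a toy model in Python. It is not Solr's code: `guess_type`, `accepts`, and `index_stream` are hypothetical names, and the only assumption is that the field type is locked in from the first value seen.

```python
# Toy model of first-value type guessing; NOT Solr's implementation.
# All function names here are hypothetical, for illustration only.

def guess_type(value: str) -> str:
    """Guess the narrowest type that can parse a single string value."""
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def accepts(field_type: str, value: str) -> bool:
    """Return True if a field of the given type can hold the value."""
    if field_type == "string":
        return True
    parser = int if field_type == "long" else float
    try:
        parser(value)
        return True
    except ValueError:
        return False


def index_stream(values):
    """Lock the field type from the first value, then collect rejected values."""
    field_type = guess_type(values[0])
    rejected = [v for v in values[1:] if not accepts(field_type, v)]
    return field_type, rejected


field_type, rejected = index_stream(["0", "0.0", "abc123", "7"])
print(field_type, rejected)  # long ['0.0', 'abc123']
```

Because "0" arrives first, the field becomes a long, and both the decimal and the alphanumeric values that follow are rejected.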

I propose an offline training mode in which Solr accepts a batch of documents and returns a guessed schema (without indexing them). This schema can then be used for the actual indexing. I think the original idea is from Hoss.
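A minimal sketch of what guessing over a whole batch could look like, written in Python for brevity. Everything in it is an assumption rather than the API of the attached patch: the name `guess_schema`, the widening order long, double, string, and the shape of the returned dictionary are all hypothetical. The idea is to pick, for each field, the most accommodating type seen anywhere in the batch, and to mark a field multi-valued if any document carries more than one value for it.

```python
# Sketch of batch-based schema guessing; NOT the attached patch's API.
# TYPE_ORDER and all function names are assumptions for illustration.

TYPE_ORDER = ("long", "double", "string")  # narrowest to widest


def type_rank(value: str) -> int:
    """Return the index in TYPE_ORDER of the narrowest type that parses value."""
    for rank, parser in enumerate((int, float)):
        try:
            parser(value)
            return rank
        except ValueError:
            continue
    return len(TYPE_ORDER) - 1  # fall back to string


def guess_schema(docs):
    """Scan every document and keep the widest type seen for each field."""
    ranks = {}
    multi = {}
    for doc in docs:
        for name, raw in doc.items():
            values = raw if isinstance(raw, list) else [raw]
            # A field is multi-valued if any document has more than one value.
            multi[name] = multi.get(name, False) or len(values) > 1
            for value in values:
                ranks[name] = max(ranks.get(name, 0), type_rank(value))
    return {
        name: {"type": TYPE_ORDER[ranks.get(name, 0)], "multiValued": multi[name]}
        for name in multi
    }


docs = [
    {"id": "1", "price": "0"},
    {"id": "2", "price": "0.0", "tags": ["a", "b"]},
    {"id": "3", "price": "12", "code": "abc123"},
]
for name, spec in guess_schema(docs).items():
    print(name, spec)
```

With this batch, `price` comes out as a double rather than a long, because the guess is made only after "0.0" has been seen, and `tags` is reported as a multi-valued string field.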

I think the initial implementation can be based on an UpdateRequestProcessor. We can hash out the API as we go along.

Attachments


- screenshot-1.png (07/Jan/18 19:50, 17 kB, Abhishek Kumar Singh)
- screenshot-3.png (07/Jan/18 20:08, 15 kB, Abhishek Kumar Singh)
- SOLR-11741-temp.patch (07/Jan/18 20:45, 10 kB, Abhishek Kumar Singh)
- RuleForMostAccomodatingField.png (07/Jan/18 21:03, 14 kB, Abhishek Kumar Singh)
- SOLR-11741.patch (21/Apr/18 13:37, 77 kB, Abhishek Kumar Singh)
- SOLR-11741.patch (05/May/18 16:32, 1019 kB, Abhishek Kumar Singh)
- SOLR-11741.patch (05/May/18 16:33, 1019 kB, Abhishek Kumar Singh)

Issue Links

is related to

- SOLR-6939 UpdateProcessor to buffer & sample documents and then batch create neccessary fields: Open
- SOLR-14701 Deprecate Schemaless Mode (Discussion): Open

relates to

- SOLR-15277 Schema Designer in Admin UI: Closed

People

Assignee: Ishan Chattopadhyaya

Reporter: Ishan Chattopadhyaya

Votes: 0

Watchers: 8

Dates

Created: 09/Dec/17 10:35

Updated: 15/Mar/23 13:00