[SOLR-7188] Run Data Import Handler processes in a SolrJ client - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: contrib - DataImportHandler
Labels: None

Description

Adds a DataImportHandlerClient class that wraps an EmbeddedSolrServer, and a DIHCloudWriter implementation of DIHWriter that sends documents to a remote SolrCloud cluster. This lets existing DIH processes run outside the Solr JVM, which should improve scalability.

The current architecture of DIH restricts scalability in several ways. First, DIH runs in the same process as Solr itself and competes for CPU and memory with the normal Solr work of indexing and querying. Second, DIH cannot be multi-threaded, so parallelizing it requires splitting the processing among the nodes of a SolrCloud cluster. Because incoming data is sent through an UpdateRequestProcessor chain (via the SolrWriter implementation of DIHWriter), additional routing happens internally: once the ID hash is computed, documents are forwarded to the current shard leader nodes, which adds network traffic within the SolrCloud cluster. DIH can therefore scale only as far as the number of nodes in the cluster, and any heavy processing in entity processors or transformation elements shares Solr's own resources. This is a known source of bottlenecks in Solr installations (SolrCloud or Master-Slave) that use DIH.
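To make the routing cost concrete, here is a minimal sketch of grouping documents by target shard on the client side, so that each batch could go straight to one shard instead of being re-forwarded inside the cluster. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the CRC32-modulo hash is only a stand-in, not the hash or hash-range scheme that SolrCloud actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Illustrative only: groups document IDs by target shard before sending. */
final class ShardRoutingSketch {

    // Hypothetical stand-in hash. SolrCloud's real router uses its own hash
    // and per-shard hash ranges; CRC32 modulo the shard count is only a sketch.
    static int shardFor(String docId, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numShards);
    }

    // Builds one batch per shard so each batch needs a single network hop.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> batches = new LinkedHashMap<>();
        for (int shard = 0; shard < numShards; shard++) {
            batches.put(shard, new ArrayList<>());
        }
        for (String id : docIds) {
            batches.get(shardFor(id, numShards)).add(id);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5", "doc-6");
        Map<Integer, List<String>> batches = groupByShard(ids, 3);
        batches.forEach((shard, batch) -> System.out.println("shard " + shard + " <- " + batch));
    }
}
```

When the sender does not know the target shard, a document that lands on the wrong node has to be forwarded again, which is the extra intra-cluster traffic described above.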

The DataImportHandlerClient uses native DIH functionality (DataImporter, etc.) but can run outside of Solr. As many client processes as are needed to reach the required throughput can be added, and the processing that occurs within the DataImportHandler is done outside the Solr JVM. The benefits of running multiple SolrJ clients can now be realized with DIH, without porting code from DIH to a SolrJ client.
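The idea of running several independent import workers against a pluggable writer can be sketched roughly as follows. This is not the patch's code: `DocumentSink` is a hypothetical stand-in for the writer role (it is not the real DIHWriter interface), the field names are made up, and threads in one JVM stand in for the separate client processes described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: parallel import workers feeding a pluggable writer. */
final class ExternalImportSketch {

    /** Hypothetical stand-in for the writer role; not the real DIHWriter interface. */
    interface DocumentSink {
        void write(Map<String, Object> doc);
    }

    /** Runs one worker per partition and returns the number of documents written. */
    static int runImport(List<List<String>> partitions, DocumentSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> partition : partitions) {
                results.add(pool.submit(() -> {
                    int written = 0;
                    for (String row : partition) {
                        // Transformation work happens here, in the client, not in Solr.
                        sink.write(Map.<String, Object>of("id", row, "title_s", row.toUpperCase()));
                        written++;
                    }
                    return written;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("import interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("import worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The sink must be thread-safe because several workers write concurrently.
        Queue<Map<String, Object>> received = new ConcurrentLinkedQueue<>();
        int written = runImport(List.of(List.of("a", "b"), List.of("c", "d")), received::add);
        System.out.println("documents written: " + written);
    }
}
```

In a real deployment the sink would presumably send documents to the remote cluster; here it only collects them in memory so the sketch stays self-contained.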

Attachments

- ESS_as_a_copy.patch, 25/Jun/15 10:13, 10 kB, Noble Paul
- IDEA-AS-CODE.patch, 05/Mar/15 11:20, 56 kB, Noble Paul
- SOLR-7188.patch, 23/Jun/15 09:48, 144 kB, Noble Paul
- SOLR-7188.patch, 23/Jun/15 09:41, 144 kB, Noble Paul
- SOLR-7188.patch, 18/Mar/15 15:02, 142 kB, Ted Sullivan
- SOLR-7188.patch, 04/Mar/15 16:48, 62 kB, Ted Sullivan
- SOLR-7188.patch, 04/Mar/15 15:11, 63 kB, Ted Sullivan

Issue Links

duplicates
- SOLR-853 Make DIH API friendly (Resolved)

is related to
- SOLR-4058 DIH should use the SolrCloudServer impl when running in SolrCloud mode. (Resolved)
- SOLR-9908 create SolrCloudDIHWriter to speedup DataImportHandler on SolrCloud (Closed)

is superseded by
- SOLR-14783 Remove DIH from 9.0 (Closed)

People

Assignee: Unassigned
Reporter: Ted Sullivan
Votes: 11
Watchers: 15

Dates

Created: 04/Mar/15 15:02
Updated: 29/Aug/20 19:53
Resolved: 29/Aug/20 19:53