[SOLR-4799] DIH: join="zipper" aka merge join for nested EntityProcessors - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.0, 6.0
Component/s: contrib - DataImportHandler
Labels:

Description

DIH is mostly considered a playground tool, and real-world usage usually ends up with SolrJ. I want to contribute a few improvements targeting DIH performance.

This one provides a performant approach to joining SQL entities with a minimal memory footprint, in contrast to http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor, which holds the cached child rows in memory.

The idea is:

the parent table is explicitly ordered by its PK in SQL
the child table is explicitly ordered by the parent_id FK in SQL
the child entity processor joins the two ordered result sets with a ‘zipper’ (merge join) algorithm.
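The zipper step can be sketched in Java as a classic merge join over two pre-sorted streams. This is a minimal illustration, not the committed DIH code: the class ZipperJoinSketch, the Child type and the zip method are hypothetical names, and the sketch assumes integer keys, parent ids that are ascending and unique, and children ascending by parentId.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ZipperJoinSketch {

    /** Hypothetical child row: the foreign key to its parent plus a payload. */
    static final class Child {
        final int parentId;
        final String value;

        Child(int parentId, String value) {
            this.parentId = parentId;
            this.value = value;
        }
    }

    /**
     * Merge-joins parent ids (ascending, unique) with children (ascending by parentId).
     * Each input is consumed once, front to back; only one look-ahead child is held.
     */
    static Map<Integer, List<String>> zip(Iterator<Integer> parents, Iterator<Child> children) {
        Map<Integer, List<String>> out = new LinkedHashMap<>();
        Child pending = children.hasNext() ? children.next() : null;
        while (parents.hasNext()) {
            int parentId = parents.next();
            List<String> matched = new ArrayList<>();
            // Skip orphan children whose key is below the current parent.
            while (pending != null && pending.parentId < parentId) {
                pending = children.hasNext() ? children.next() : null;
            }
            // Collect the run of children that belong to this parent.
            while (pending != null && pending.parentId == parentId) {
                matched.add(pending.value);
                pending = children.hasNext() ? children.next() : null;
            }
            out.put(parentId, matched);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> parents = List.of(1, 2, 3);
        List<Child> children = List.of(new Child(1, "a"), new Child(1, "b"), new Child(3, "c"));
        System.out.println(zip(parents.iterator(), children.iterator())); // {1=[a, b], 2=[], 3=[c]}
    }
}
```

Each side is read exactly once and only a single look-ahead child row is kept, which is where the memory saving over a cache comes from. The demo collects results into a map only so they are easy to print; a streaming consumer would handle each parent's children and then discard them.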

Example usage for what was committed:

<dataConfig>
	<document>
		<entity name="parent" processor="SqlEntityProcessor" query="SELECT * FROM PARENT ORDER BY id">		
			<entity name="child_1" processor="SqlEntityProcessor"
				where="parent_id=parent.id" query="SELECT * FROM CHILD_1 ORDER BY parent_id" join="zipper" >
			</entity>			
		</entity>
	</document>
</dataConfig>

Keep in mind:

order both sides by the join key
specify join="zipper" on the child entity
it works with any entity processor
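Ordering both sides matters because a forward-only merge never rewinds: if a key arrives out of order, the rows that should have matched are silently skipped. One way to make that failure loud is to wrap each side's key stream in a guard. This is a hypothetical helper (AscendingGuard and requireAscending are illustrative names), and whether DIH itself performs such a check is not stated here.

```java
import java.util.Iterator;
import java.util.List;

public class AscendingGuard {

    /** Wraps an iterator of join keys and fails fast if a key is smaller than its predecessor. */
    static <K extends Comparable<K>> Iterator<K> requireAscending(Iterator<K> keys) {
        return new Iterator<K>() {
            private K previous;

            @Override
            public boolean hasNext() {
                return keys.hasNext();
            }

            @Override
            public K next() {
                K current = keys.next();
                // Equal keys are fine (several children per parent); a decrease is not.
                if (previous != null && previous.compareTo(current) > 0) {
                    throw new IllegalStateException(
                        "join key " + current + " follows " + previous + "; check ORDER BY");
                }
                previous = current;
                return current;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> ok = requireAscending(List.of(1, 1, 2, 5).iterator());
        while (ok.hasNext()) ok.next(); // passes: non-decreasing order

        Iterator<Integer> bad = requireAscending(List.of(1, 3, 2).iterator());
        try {
            while (bad.hasNext()) bad.next();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // join key 2 follows 3; check ORDER BY
        }
    }
}
```

Note that the comparison used on the Java side has to agree with the database's ORDER BY; for string keys under a non-binary collation the two orders can differ.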

Attachments

SOLR-4799.patch, 01/Dec/14 14:09, 28 kB, Mikhail Khludnev
SOLR-4799.patch, 30/Nov/14 20:13, 28 kB, Mikhail Khludnev
SOLR-4799.patch, 29/Nov/14 21:19, 25 kB, Mikhail Khludnev
SOLR-4799.patch, 28/Nov/14 22:46, 18 kB, Mikhail Khludnev
SOLR-4799.patch, 11/Jul/13 10:27, 17 kB, Mikhail Khludnev

Issue Links

links to

blog describes overall context and ideas

People

Assignee: Noble Paul

Reporter: Mikhail Khludnev

Votes: 3

Watchers: 12

Dates

Created: 08/May/13 01:04

Updated: 09/May/16 18:51

Resolved: 03/Dec/14 12:09