Details
Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
None
Description
DIH is mostly considered a playground tool, and real-world usage tends to end up with SolrJ. I want to contribute a few improvements targeting DIH performance.
This one provides a performant approach for joining SQL entities with a very small memory footprint, in contrast to http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
The idea is:
- the parent table is explicitly ordered by its PK in SQL
- the child table is explicitly ordered by its parent_id FK in SQL
- the child entity processor joins the two ordered result sets with a ‘zipper’ algorithm.
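To illustrate the idea, here is a minimal sketch of a zipper (sorted merge) join over two row streams. It is not the DIH implementation; the function name zipper_join, its parameters, and the dict-based row shape are assumptions made for this example. It assumes both inputs are already sorted ascending by their join key and that the keys are comparable. Only one child row is buffered as lookahead, plus the children of the current parent, which is where the memory saving over a full cache comes from.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def zipper_join(
    parents: Iterable[Row],
    children: Iterable[Row],
    parent_key: str = "id",          # hypothetical column names
    child_key: str = "parent_id",
) -> Iterator[Tuple[Row, List[Row]]]:
    """Yield (parent, matching_children) pairs.

    Assumes both streams are sorted ascending by their join key.
    """
    child_iter = iter(children)
    pending = next(child_iter, None)  # one-row lookahead into the child stream

    for parent in parents:
        pk = parent[parent_key]

        # Drop children whose key sorts before the current parent (orphans).
        while pending is not None and pending[child_key] < pk:
            pending = next(child_iter, None)

        # Collect every child that belongs to the current parent.
        matched: List[Row] = []
        while pending is not None and pending[child_key] == pk:
            matched.append(pending)
            pending = next(child_iter, None)

        yield parent, matched


parents = [{"id": 1}, {"id": 2}, {"id": 3}]
children = [
    {"parent_id": 1, "v": "a"},
    {"parent_id": 1, "v": "b"},
    {"parent_id": 3, "v": "c"},
]
for parent, kids in zipper_join(parents, children):
    print(parent["id"], [kid["v"] for kid in kids])
```

Each stream is read exactly once and never rewound, so if either side is not sorted, matches are silently missed. That is why the ORDER BY clauses on both queries matter.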
Example usage for what's committed:
<dataConfig>
  <document>
    <entity name="parent" processor="SqlEntityProcessor"
            query="SELECT * FROM PARENT ORDER BY id">
      <entity name="child_1" processor="SqlEntityProcessor"
              where="parent_id=parent.id"
              query="SELECT * FROM CHILD_1 ORDER BY parent_id"
              join="zipper">
      </entity>
    </entity>
  </document>
</dataConfig>
Keep in mind:
- both sides must be ordered
- join="zipper" must be specified on the child entity
- it works with any entity processor