DIH is mostly considered a playground tool, and real-world usage ends up with SolrJ. I want to contribute a few improvements targeting DIH performance.
This one provides a performant approach for joining SQL entities with a very small memory footprint, in contrast to http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
The idea is:
- the parent table is explicitly ordered by its PK in SQL
- the child table is explicitly ordered by the parent_id FK in SQL
- the child entity processor joins the two ordered result sets with a ‘zipper’ (merge join) algorithm, reading each of them only once.
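To illustrate the idea, here is a minimal sketch of a zipper (merge) join over two sorted inputs. It is not the committed DIH code: the class ZipperJoin, the Child type and the integer keys are hypothetical names chosen for the example. The result is collected into a map only to keep the demo short; a streaming implementation would emit each parent as soon as its children are read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the zipper idea; not the actual DIH implementation.
class ZipperJoin {

    /** A child row: the parent key it refers to plus some payload. */
    static final class Child {
        final int parentId;
        final String value;

        Child(int parentId, String value) {
            this.parentId = parentId;
            this.value = value;
        }
    }

    /**
     * Joins parent ids (ascending) with child rows (ascending by parentId).
     * Both iterators are consumed exactly once, and only one child row
     * is buffered at any time.
     */
    static Map<Integer, List<String>> join(Iterator<Integer> parents,
                                           Iterator<Child> children) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        Child pending = children.hasNext() ? children.next() : null;

        while (parents.hasNext()) {
            int parentId = parents.next();
            List<String> matched = new ArrayList<>();

            // Skip orphaned children whose key sorts before the current parent.
            while (pending != null && pending.parentId < parentId) {
                pending = children.hasNext() ? children.next() : null;
            }
            // Collect the children that belong to the current parent.
            while (pending != null && pending.parentId == parentId) {
                matched.add(pending.value);
                pending = children.hasNext() ? children.next() : null;
            }
            result.put(parentId, matched);
        }
        return result;
    }
}
```

Because neither side is cached, memory use does not grow with the size of the child table. The price is that both queries must return rows in the same key order.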
Example usage for what's committed:
- order both sides
- specify join="zipper" on the child entity
- note that it works with any entity processor
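A configuration following these steps might look roughly like the fragment below. This is a sketch under assumptions: the table and column names are made up, the dataSource definition is omitted, and the where attribute naming the join keys is borrowed from how cached entity joins are configured, so check it against the committed patch.

```xml
<dataConfig>
  <!-- dataSource definition omitted; table and column names are hypothetical -->
  <document>
    <!-- parent side: explicitly ordered by its PK -->
    <entity name="parent"
            query="SELECT id, name FROM parent ORDER BY id">
      <!-- child side: explicitly ordered by the FK, joined by zipper -->
      <entity name="child"
              join="zipper"
              where="parent_id=parent.id"
              query="SELECT parent_id, detail FROM child ORDER BY parent_id"/>
    </entity>
  </document>
</dataConfig>
```

Note that the child query has no per-parent parameter: it runs once and is read in step with the parent result set.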