  Solr / SOLR-4799

DIH: join="zipper" aka merge join for nested EntityProcessors


Details

    Description

      DIH is mostly considered a playground tool, and real-world deployments tend to end up with SolrJ. I want to contribute a few improvements targeting DIH performance.

      This one provides a performant approach to joining SQL entities with minimal memory, in contrast to CachedSqlEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor), which keeps the child rows cached in memory.
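      For contrast, a cache-based lookup behaves roughly like a hash join: the child query's rows are loaded into an in-memory map keyed by the join column, and each parent then looks its children up there, so memory grows with the size of the child table. The Java sketch below illustrates only that memory profile; `CachedJoinSketch` and its methods are hypothetical, rows are assumed to be plain `Map<String, Object>` records, and this is not CachedSqlEntityProcessor's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a cache-based (hash) join; not DIH's actual code. */
class CachedJoinSketch {

    /**
     * Indexes ALL child rows by their foreign key up front,
     * so memory use grows with the size of the child table.
     */
    static Map<Object, List<Map<String, Object>>> buildCache(
            List<Map<String, Object>> childRows, String fkColumn) {
        Map<Object, List<Map<String, Object>>> cache = new HashMap<>();
        for (Map<String, Object> row : childRows) {
            cache.computeIfAbsent(row.get(fkColumn), k -> new ArrayList<>()).add(row);
        }
        return cache;
    }

    /** Looks up one parent's children; the child rows need not be ordered. */
    static List<Map<String, Object>> childrenOf(
            Map<Object, List<Map<String, Object>>> cache, Object parentId) {
        return cache.getOrDefault(parentId, List.of());
    }

    public static void main(String[] args) {
        // Child rows in arbitrary order: a cache does not rely on ORDER BY.
        List<Map<String, Object>> children = List.of(
                Map.of("parent_id", 2, "name", "c"),
                Map.of("parent_id", 1, "name", "a"),
                Map.of("parent_id", 1, "name", "b"));
        Map<Object, List<Map<String, Object>>> cache = buildCache(children, "parent_id");
        System.out.println(childrenOf(cache, 1).size()); // prints 2
        System.out.println(childrenOf(cache, 3).size()); // prints 0
    }
}
```

      The trade-off is that ordering is unnecessary, but every child row is resident in memory at once.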

      The idea is:

      • the parent table is explicitly ordered by its PK in SQL;
      • the children table is explicitly ordered by the parent_id FK in SQL;
      • the child entity processor joins the two ordered result sets with a ‘zipper’ algorithm, reading each side only once.
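      The zipper step is essentially a sort-merge join: both ordered inputs are advanced in lockstep, so each is read once and only the current child row has to be kept. A minimal Java sketch, assuming integer keys and in-memory lists standing in for the two ordered result sets; `ZipperJoinSketch`, `Child` and `join` are hypothetical names, not the code in the attached patch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sort-merge ("zipper") join sketch; not the patch's code. */
class ZipperJoinSketch {

    /** A child row: its foreign key and a payload column. */
    static final class Child {
        final int parentId;
        final String name;

        Child(int parentId, String name) {
            this.parentId = parentId;
            this.name = name;
        }
    }

    /**
     * Joins parent ids (ascending) with children (ascending by parentId).
     * Each iterator is consumed once; besides the output, only the current
     * child row is held. The result map exists only for demonstration; a
     * streaming indexer would emit each parent's children and discard them.
     */
    static Map<Integer, List<String>> join(Iterator<Integer> parents, Iterator<Child> children) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        Child current = children.hasNext() ? children.next() : null;
        while (parents.hasNext()) {
            int parentId = parents.next();
            List<String> matched = new ArrayList<>();
            // Skip child rows whose key is behind this parent (no matching parent).
            while (current != null && current.parentId < parentId) {
                current = children.hasNext() ? children.next() : null;
            }
            // Collect the run of child rows with an equal key.
            while (current != null && current.parentId == parentId) {
                matched.add(current.name);
                current = children.hasNext() ? children.next() : null;
            }
            // A child with a greater key stays buffered for a later parent.
            result.put(parentId, matched);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> parents = List.of(1, 2, 3);             // ORDER BY id
        List<Child> children = List.of(                       // ORDER BY parent_id
                new Child(1, "a"), new Child(1, "b"), new Child(3, "c"));
        System.out.println(join(parents.iterator(), children.iterator()));
        // prints {1=[a, b], 2=[], 3=[c]}
    }
}
```

      Parents without children (2 in the example) get an empty list and child rows without a parent are skipped, which corresponds to a left outer join from parent to children.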

      Example usage of what was committed:

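      <dataConfig>
      	<document>
      		<!-- parent rows must arrive ordered by the parent key (id) -->
      		<entity name="parent" processor="SqlEntityProcessor" query="SELECT * FROM PARENT ORDER BY id">
      			<!-- child rows must arrive ordered by the foreign key (parent_id);
      			     where="child_column=parent.column" names the join keys,
      			     join="zipper" enables the merge join -->
      			<entity name="child_1" processor="SqlEntityProcessor"
      				where="parent_id=parent.id" query="SELECT * FROM CHILD_1 ORDER BY parent_id" join="zipper" >
      			</entity>
      		</entity>
      	</document>
      </dataConfig>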
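      In this config the where attribute pairs the child's join column with the parent's key, as `parent_id=parent.id`. The issue text does not show how that string is parsed; the following hypothetical Java helper (`WhereAttributeSketch` and `JoinKeys` are illustrative names), assuming a single `child_column=entity.column` pair, shows splitting it into the keys a merge join compares:

```java
/**
 * Hypothetical helper: splits a where="child_column=parent_entity.column"
 * attribute into the two join keys. Not DIH's actual parsing code.
 */
class WhereAttributeSketch {

    static final class JoinKeys {
        final String childColumn;  // column read from the child rows, e.g. parent_id
        final String parentEntity; // name of the enclosing entity, e.g. parent
        final String parentColumn; // column read from the parent rows, e.g. id

        JoinKeys(String childColumn, String parentEntity, String parentColumn) {
            this.childColumn = childColumn;
            this.parentEntity = parentEntity;
            this.parentColumn = parentColumn;
        }

        @Override
        public String toString() {
            return childColumn + " = " + parentEntity + "." + parentColumn;
        }
    }

    static JoinKeys parse(String where) {
        // Limit -1 keeps trailing empty strings, so "a=b.c=" is rejected.
        String[] sides = where.split("=", -1);
        if (sides.length != 2 || sides[0].trim().isEmpty()) {
            throw new IllegalArgumentException("expected child_column=entity.column: " + where);
        }
        String parentRef = sides[1].trim();
        int dot = parentRef.indexOf('.');
        if (dot <= 0 || dot == parentRef.length() - 1) {
            throw new IllegalArgumentException("expected entity.column after '=': " + where);
        }
        return new JoinKeys(sides[0].trim(),
                parentRef.substring(0, dot), parentRef.substring(dot + 1));
    }

    public static void main(String[] args) {
        JoinKeys keys = parse("parent_id=parent.id");
        System.out.println(keys.childColumn + " / " + keys.parentEntity + " / " + keys.parentColumn);
        // prints parent_id / parent / id
    }
}
```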
      

      Keep in mind:

      1. Order both sides by the join key (parent by its PK, children by the parent_id FK).
      2. Specify join="zipper" on the child entity.
      3. Note that it works with any entity processor, not only SqlEntityProcessor.
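      In a plain merge join, which only ever moves forward, a row that arrives out of order on either side is not detected by the algorithm itself; its matches can simply be skipped. Since the join only sees rows, not queries, such a check applies whatever entity processor produced them. A hypothetical Java guard (`OrderedKeyGuard` is not a DIH class; non-null `Comparable` keys are assumed) that fails fast when a key decreases:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical guard (not a DIH class): wraps a row iterator from any source
 * and fails fast when the join key decreases.
 */
class OrderedKeyGuard<T, K extends Comparable<K>> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final Function<T, K> keyOf;
    private K previous; // key of the last row handed out, null before the first

    OrderedKeyGuard(Iterator<T> delegate, Function<T, K> keyOf) {
        this.delegate = delegate;
        this.keyOf = keyOf;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T row = delegate.next();
        K key = keyOf.apply(row);
        // Equal keys are fine (several children per parent); a smaller key is not.
        if (previous != null && previous.compareTo(key) > 0) {
            throw new IllegalStateException(
                    "rows not ordered by join key: " + previous + " then " + key);
        }
        previous = key;
        return row;
    }

    public static void main(String[] args) {
        Iterator<Integer> ok = new OrderedKeyGuard<Integer, Integer>(List.of(1, 1, 2).iterator(), k -> k);
        while (ok.hasNext()) {
            ok.next(); // keys never decrease, so nothing is thrown
        }
        Iterator<Integer> bad = new OrderedKeyGuard<Integer, Integer>(List.of(2, 1).iterator(), k -> k);
        bad.next();
        try {
            bad.next();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints rows not ordered by join key: 2 then 1
        }
    }
}
```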

      Attachments

        1. SOLR-4799.patch (28 kB, Mikhail Khludnev)
        2. SOLR-4799.patch (28 kB, Mikhail Khludnev)
        3. SOLR-4799.patch (25 kB, Mikhail Khludnev)
        4. SOLR-4799.patch (18 kB, Mikhail Khludnev)
        5. SOLR-4799.patch (17 kB, Mikhail Khludnev)


          People

            Noble Paul
            Mikhail Khludnev
            Votes: 3
            Watchers: 12
