Details

      Description

      Functionality:
      1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application.

      2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor).

      3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches.

      4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel.

      Use Cases:
      1. We needed a flexible & scalable way to temporarily cache child-entity data prior to joining to parent entities.

      • Using SqlEntityProcessor with Child Entities can cause an "n+1 select" problem.
      • CachedSqlEntityProcessor only supports an in-memory HashMap as a Caching mechanism and does not scale.
      • There is no way to cache non-SQL inputs (e.g., flat files, XML).
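The "n+1 select" pattern described above can be illustrated with a minimal sketch. It does not use any DIH classes: `ChildSource` and `JoinDemo` are hypothetical names, and the "queries" are simulated against an in-memory list so that the number of round trips can be counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a child-entity data source; counts how often it is queried.
class ChildSource {
    private final List<String[]> rows = new ArrayList<>(); // each row: {parentId, value}
    int queryCount = 0;

    void add(String parentId, String value) {
        rows.add(new String[] {parentId, value});
    }

    // Simulates "SELECT ... WHERE parent_id = ?": one query per parent row.
    List<String> selectByParent(String parentId) {
        queryCount++;
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            if (r[0].equals(parentId)) out.add(r[1]);
        }
        return out;
    }

    // Simulates an unfiltered "SELECT ...": one query that returns all child rows.
    List<String[]> selectAll() {
        queryCount++;
        return new ArrayList<>(rows);
    }
}

class JoinDemo {
    // Uncached join: issues one child query per parent (the "n+1 select" pattern).
    static Map<String, List<String>> joinUncached(List<String> parentIds, ChildSource src) {
        Map<String, List<String>> joined = new TreeMap<>();
        for (String id : parentIds) joined.put(id, src.selectByParent(id));
        return joined;
    }

    // Cached join: reads all child rows once, indexes them by key, then looks up per parent.
    static Map<String, List<String>> joinCached(List<String> parentIds, ChildSource src) {
        Map<String, List<String>> cache = new TreeMap<>();
        for (String[] r : src.selectAll()) {
            cache.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        Map<String, List<String>> joined = new TreeMap<>();
        for (String id : parentIds) joined.put(id, cache.getOrDefault(id, new ArrayList<>()));
        return joined;
    }
}
```

Both methods produce the same joined result; the difference is that the uncached version queries the child source once per parent, while the cached version queries it once in total.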

      2. We needed the ability to gather data from long-running entities by a process that runs separately from our main indexing process.

      3. We wanted the ability to do a delta import of only the entities that changed.

      • Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed.
      • Our data comes from 50+ complex SQL queries and/or flat files.
      • We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed.
      • Persistent DIH caches solve this problem.

      4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the "threads" parameter).

      5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards.

      Implementation Details:
      1. De-couple EntityProcessorBase from caching.

      • Created a new interface, DIHCache & two implementations:
      • SortedMapBackedCache - An in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated).
      • BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar
      • NOTE: the existing Lucene Contrib "db" project uses je-3.3.93.jar. I believe this version may be incompatible because of differences in generics usage.
      • NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html

      2. Allow Entity Processors to take a "cacheImpl" parameter to cause the entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
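As a rough illustration of how a pluggable cache selected by a "cacheImpl"-style setting might be structured, here is a minimal sketch. The names `RowCache`, `SortedMapRowCache`, and `CacheFactory` are hypothetical and chosen for this example; the actual DIHCache interface and its method signatures in the patch may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical cache contract; the real DIHCache interface in the patch may differ.
interface RowCache {
    void add(String key, Map<String, Object> row);
    List<Map<String, Object>> lookup(String key);
    void close();
}

// In-memory implementation that keeps keys sorted, in the spirit of SortedMapBackedCache.
class SortedMapRowCache implements RowCache {
    private final SortedMap<String, List<Map<String, Object>>> data = new TreeMap<>();

    @Override
    public void add(String key, Map<String, Object> row) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    @Override
    public List<Map<String, Object>> lookup(String key) {
        List<Map<String, Object>> rows = data.get(key);
        return rows == null ? Collections.emptyList() : Collections.unmodifiableList(rows);
    }

    @Override
    public void close() {
        data.clear(); // an in-memory cache has nothing to release beyond its contents
    }
}

class CacheFactory {
    // Chooses an implementation by name, roughly what a "cacheImpl"-style setting would do.
    static RowCache create(String implName) {
        if ("SortedMapRowCache".equals(implName)) return new SortedMapRowCache();
        throw new IllegalArgumentException("Unknown cache implementation: " + implName);
    }
}
```

Because callers depend only on the interface, a disk-backed implementation could be added to the factory without changing the code that fills or reads the cache.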

      3. Partially De-couple SolrWriter from DocBuilder

      • Created a new interface DIHWriter, & two implementations:
      • SolrWriter (refactored)
      • DIHCacheWriter (allows DIH to write ultimately to a Cache).
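The decoupling described in this item can be sketched as a document builder that depends only on an output interface. This is an illustration under assumptions, not the patch's code: `DocSink`, `ListSink`, `KeyedCacheSink`, and `DocBuilderSketch` are hypothetical names, and the actual DIHWriter interface may expose different methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical output contract; the actual DIHWriter interface may expose other methods.
interface DocSink {
    void upload(Map<String, Object> doc);
    void close();
}

// Stand-in for a Solr-bound writer: here it only collects documents in a list.
class ListSink implements DocSink {
    final List<Map<String, Object>> sent = new ArrayList<>();
    boolean closed = false;

    @Override
    public void upload(Map<String, Object> doc) { sent.add(doc); }

    @Override
    public void close() { closed = true; }
}

// Stand-in for a cache-bound writer: stores documents by primary key for later reuse.
class KeyedCacheSink implements DocSink {
    final Map<String, Map<String, Object>> byKey = new LinkedHashMap<>();
    private final String keyField;

    KeyedCacheSink(String keyField) { this.keyField = keyField; }

    @Override
    public void upload(Map<String, Object> doc) {
        // A later document with the same key replaces the earlier one.
        byKey.put(String.valueOf(doc.get(keyField)), doc);
    }

    @Override
    public void close() { }
}

class DocBuilderSketch {
    // The builder depends only on the interface, so the destination can be swapped.
    static void run(List<Map<String, Object>> docs, DocSink sink) {
        try {
            for (Map<String, Object> d : docs) sink.upload(d);
        } finally {
            sink.close();
        }
    }
}
```

Passing a `ListSink` or a `KeyedCacheSink` to `DocBuilderSketch.run` changes where documents end up without touching the build loop.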

      4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input.

      5. Support a "partition" parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data.
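One plausible way to split source data across partitions is to hash each key. The sketch below assumes hash-based assignment purely for illustration; the issue text does not say how the "partition" parameter assigns rows, and `Partitioner` is a hypothetical helper, not a class from the patch.

```java
import java.util.ArrayList;
import java.util.List;

class Partitioner {
    // Maps a key to a partition index in [0, partitions) using a non-negative modulus.
    static int partitionFor(String key, int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Distributes keys into one bucket per partition; each key lands in exactly one bucket.
    static List<List<String>> split(List<String> keys, int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) buckets.add(new ArrayList<>());
        for (String k : keys) buckets.get(partitionFor(k, partitions)).add(k);
        return buckets;
    }
}
```

Because the assignment is deterministic, a writer and a later reader that agree on the partition count will see the same key in the same partition.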

      6. Change the semantics of entity.destroy()

      • Previously, it was being called on each iteration of DocBuilder.buildDocument().
      • Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed.
      • The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change.
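The revised destroy() semantics amount to a call-once cleanup hook that runs after all rows have been read. The following minimal sketch uses hypothetical classes (`CountingProcessor`, `LifecycleDemo`) rather than the real EntityProcessor API, and only demonstrates the lifecycle ordering.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical processor that tracks how often cleanup runs.
class CountingProcessor {
    private final Iterator<String> rows;
    int destroyCalls = 0;
    boolean resourceOpen = true; // stands in for an open disk-backed cache

    CountingProcessor(List<String> rows) { this.rows = rows.iterator(); }

    // Returns the next row, or null when the input is exhausted.
    String nextRow() { return rows.hasNext() ? rows.next() : null; }

    // One-time cleanup: releases the resource held for the processor's lifetime.
    void destroy() {
        destroyCalls++;
        resourceOpen = false;
    }
}

class LifecycleDemo {
    // Reads every row, then calls destroy() exactly once, even if reading fails.
    static int runOnce(CountingProcessor p) {
        int n = 0;
        try {
            while (p.nextRow() != null) n++;
        } finally {
            p.destroy();
        }
        return n;
    }
}
```

Under the earlier per-iteration behavior, a processor holding a disk-backed cache would have had to reopen it for every document; calling the hook once at the end avoids that.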

      General Notes:
      We are close to finishing the conversion of our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility, so I created these caching enhancements to get our data into Solr. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have forward-ported an enhanced version to Trunk. I have also added unit tests and verified that all existing test cases pass. I believe this patch maintains backwards-compatibility and would be a welcome addition to a future version of Solr.

      1. SOLR-2382.patch
        144 kB
        James Dyer
      2. SOLR-2382.patch
        161 kB
        James Dyer
      3. SOLR-2382.patch
        170 kB
        James Dyer
      4. SOLR-2382.patch
        170 kB
        James Dyer
      5. SOLR-2382.patch
        172 kB
        James Dyer
      6. SOLR-2382.patch
        125 kB
        James Dyer
      7. SOLR-2382.patch
        125 kB
        James Dyer
      8. SOLR-2382.patch
        138 kB
        James Dyer
      9. SOLR-2382-properties.patch
        14 kB
        James Dyer
      10. SOLR-2382-properties.patch
        15 kB
        James Dyer
      11. SOLR-2382-solrwriter.patch
        30 kB
        James Dyer
      12. SOLR-2382-solrwriter.patch
        30 kB
        Noble Paul
      13. SOLR-2382-entities.patch
        60 kB
        James Dyer
      14. SOLR-2382-dihwriter.patch
        38 kB
        James Dyer
      15. SOLR-2382-solrwriter.patch
        29 kB
        James Dyer
      16. SOLR-2382-entities.patch
        59 kB
        James Dyer
      17. SOLR-2382-solrwriter-verbose-fix.patch
        7 kB
        James Dyer
      18. SOLR-2382-entities.patch
        59 kB
        James Dyer
      19. SOLR-2382-entities.patch
        61 kB
        James Dyer
      20. SOLR-2382-dihwriter.patch
        37 kB
        James Dyer
      21. SOLR-2382-entities.patch
        62 kB
        James Dyer
      22. SOLR-2382-entities.patch
        68 kB
        James Dyer
      23. SOLR-2382-dihwriter.patch
        39 kB
        James Dyer
      24. SOLR-2382-entities.patch
        68 kB
        James Dyer
      25. SOLR-2382-entities.patch
        62 kB
        Noble Paul
      26. SOLR-2382-dihwriter.patch
        58 kB
        James Dyer
      27. TestThreaded.java.patch
        4 kB
        Mikhail Khludnev
      28. SOLR-2382-dihwriter.patch
        58 kB
        James Dyer
      29. SOLR-2382-dihwriter_standalone.patch
        59 kB
        James Dyer
      30. TestCachedSqlEntityProcessor.java-break-where-clause.patch
        1 kB
        Mikhail Khludnev
      31. TestCachedSqlEntityProcessor.java-fix-where-clause-by-adding-cachePk-and-lookup.patch
        2 kB
        Mikhail Khludnev
      32. TestCachedSqlEntityProcessor.java-wrong-pk-detected-due-to-lack-of-where-support.patch
        2 kB
        Mikhail Khludnev
      33. SOLR-2382_3x.patch
        86 kB
        James Dyer

          Activity

          Uwe Schindler made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
          James Dyer made changes -
          Status: Reopened [ 4 ] -> Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          James Dyer made changes -
          Attachment SOLR-2382_3x.patch [ 12519286 ]
          James Dyer made changes -
          Fix Version/s 3.6 [ 12319065 ]
          James Dyer made changes -
          Resolution Fixed [ 1 ]
          Status: Resolved [ 5 ] -> Reopened [ 4 ]
          Assignee James Dyer [ jdyer ]
          James Dyer made changes -
          Link This issue blocks SOLR-2549 [ SOLR-2549 ]
          James Dyer made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Fix Version/s 4.0 [ 12314992 ]
          Resolution Fixed [ 1 ]
          James Dyer made changes -
          Link This issue relates to SOLR-2613 [ SOLR-2613 ]
          James Dyer made changes -
          Link This issue blocks SOLR-2613 [ SOLR-2613 ]
          James Dyer made changes -
          Attachment SOLR-2382-dihwriter_standalone.patch [ 12505376 ]
          James Dyer made changes -
          Attachment SOLR-2382-dihwriter.patch [ 12505360 ]
          Mikhail Khludnev made changes -
          Attachment TestThreaded.java.patch [ 12505259 ]
          James Dyer made changes -
          Attachment SOLR-2382-dihwriter.patch [ 12503675 ]
          Noble Paul made changes -
          Attachment SOLR-2382-entities.patch [ 12501064 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12500225 ]
          James Dyer made changes -
          Attachment SOLR-2382-dihwriter.patch [ 12500224 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12500223 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12499441 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12489482 ]
          Attachment SOLR-2382-dihwriter.patch [ 12489483 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12488222 ]
          James Dyer made changes -
          Attachment SOLR-2382-solrwriter-verbose-fix.patch [ 12488221 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12487560 ]
          James Dyer made changes -
          Attachment SOLR-2382-solrwriter.patch [ 12487335 ]
          James Dyer made changes -
          Attachment SOLR-2382-dihwriter.patch [ 12486468 ]
          James Dyer made changes -
          Attachment SOLR-2382-entities.patch [ 12486352 ]
          Noble Paul made changes -
          Attachment SOLR-2382-solrwriter.patch [ 12486273 ]
          James Dyer made changes -
          Attachment SOLR-2382-solrwriter.patch [ 12486233 ]
          James Dyer made changes -
          Attachment SOLR-2382-properties.patch [ 12486225 ]
          James Dyer made changes -
          Attachment SOLR-2382-properties.patch [ 12486219 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12483747 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12483489 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12483490 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12483489 ]
          James Dyer made changes -
          Link This issue blocks SOLR-2613 [ SOLR-2613 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12483354 ]
          James Dyer made changes -
          Link This issue blocks SOLR-2549 [ SOLR-2549 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12475830 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12474630 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12473842 ]
          James Dyer made changes -
          Attachment SOLR-2382.patch [ 12472218 ]
          James Dyer made changes -
          Field Original Value New Value
          Attachment SOLR-2382.patch [ 12471980 ]
          James Dyer created issue -

            People

            • Assignee:
              James Dyer
              Reporter:
              James Dyer
            • Votes:
              1
              Watchers:
              10

                Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified