  1. Solr
  2. SOLR-16697

New API support to import index files generated by Embedded SOLR into SOLR Cloud

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.3
    • Component/s: Backup/Restore
    • Labels: None

    Description

      Offline indexing is a popular option when very large data sets need to be indexed into SOLR.
      Data is loaded from the data source (e.g. C*), and index-creation pipelines produce index files per shard using embedded SOLR.
       
      With older versions of SOLR, we would copy these index files into the SOLR Cloud data directories using custom tools, then reload the collection so that the newly uploaded index could be searched and updated.
      Ideally, we would use the Restore API to import the index files from a backup repository. However, the file structure that the Restore API expects is complex enough that massaging the index files of every shard into a Restore-compatible format is infeasible.
       
      It would be good for SOLR to support a 'Restore'-like API that allows us to import index files generated by embedded SOLR into SOLR Cloud. This API should operate at the shard level and import the index files into a single shard per invocation.
       
      With the new API, offline indexing could look like this:
       
      1. Generate index files per shard using embedded SOLR as part of Hadoop MR/Spark jobs, and copy the index files for every shard into the backup repository.
       
      2. The new API should be able to import the index from the backup repository location into each shard on SOLR Cloud. The API would handle things like marking the collection as read-only, triggering replication, etc., along the lines of what the 'RESTORE' API currently does.
       
      The new API should support the relevant parameters from the Restore API (location and repository).
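
      To make the per-shard flow concrete, here is a minimal sketch of how a client might build one import request per shard. Only `location` and `repository` come from the description above. The action name `INSTALLSHARDDATA`, the `collection` and `shard` parameter names, the host, the collection name, the repository name, and the paths are all assumptions made for illustration; this issue does not specify them. The sketch only builds URLs and sends nothing.

```python
from urllib.parse import urlencode


def build_import_requests(base_url, collection, shard_locations, repository,
                          action="INSTALLSHARDDATA"):
    """Build one Collections API URL per shard (one invocation per shard).

    `action`, `collection`, and `shard` are assumed names; `location` and
    `repository` mirror the Restore API parameters named in the description.
    """
    urls = []
    # Sort by shard name so the request order is deterministic.
    for shard, location in sorted(shard_locations.items()):
        params = {
            "action": action,          # assumed action name
            "collection": collection,  # assumed parameter name
            "shard": shard,            # assumed parameter name
            "location": location,      # index files for this shard in the repository
            "repository": repository,  # backup repository configured in solr.xml
        }
        urls.append(f"{base_url}/admin/collections?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Hypothetical collection, repository name, and paths.
    for url in build_import_requests(
        "http://localhost:8983/solr",
        "products",
        {
            "shard1": "/backups/products/shard1",
            "shard2": "/backups/products/shard2",
        },
        "localfs",
    ):
        print(url)
```

      Keeping the shard-to-location mapping explicit reflects the request that each invocation imports into exactly one shard, so a failed shard can be retried without touching the others.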

            People

              Assignee: Jason Gerlowski (gerlowskija)
              Reporter: Indumathy Rajagopalan (indurajagopalan)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m