Details
- Type: Improvement
- Status: Open
- Priority: Trivial
- Resolution: Unresolved
Description
I am working on an abstraction over Lucene in which data is stored in two places: local disk and remote cloud storage. If the host holding the index is terminated due to some issue, I want to be able to replicate the index on another host.
To recreate the index on another host, I start by downloading the metadata files associated with the index (segments_N and the per-segment .si files). Once that is done, I initialize an IndexWriter on top of the local directory into which these files were downloaded. This lets me resume indexing new data (I only use addDocument calls, never updateDocument) without downloading the data files of the older segments.
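For concreteness, here is a minimal sketch of that recovery flow. The fetchFromRemote helper, the local path, and the file names are hypothetical placeholders for my actual cloud-storage client; the Lucene calls themselves (FSDirectory.open, IndexWriterConfig, OpenMode.APPEND) are the standard API.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexRecovery {
  // Hypothetical helper: copies the named file from remote storage into localPath.
  static void fetchFromRemote(String fileName, Path localPath) throws IOException {
    // ... cloud-storage client call goes here ...
  }

  public static void main(String[] args) throws IOException {
    Path localPath = Paths.get("/tmp/index"); // placeholder local index location
    // 1. Download only the commit metadata: segments_N and the per-segment .si files.
    for (String name : new String[] {"segments_5", "_0.si", "_1.si"}) { // names illustrative
      fetchFromRemote(name, localPath);
    }
    // 2. Open an IndexWriter over the partially populated directory in APPEND mode.
    //    This is the step that currently fails: IndexWriter also reads the field
    //    number mappings (.fnm, or the .cfs containing them) of every segment.
    try (Directory dir = FSDirectory.open(localPath);
         IndexWriter writer = new IndexWriter(dir,
             new IndexWriterConfig(new StandardAnalyzer()).setOpenMode(OpenMode.APPEND))) {
      // resume addDocument() calls for new data here
    }
  }
}
```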
However, the initialization of the IndexWriter itself fails, because it tries to load the field number mappings (the .fnm files) of the previous segments before it can create the IndexWriter object.
With the compound file format enabled, this requires downloading the .cfs files from remote storage (the .fnm files are stored inside them), which increases the time needed to initialize the IndexWriter. The new host therefore takes longer before it can accept incoming requests, and the application ends up rejecting a large number of customer requests.
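To make the cost concrete, a small sketch like the following (assuming a local copy of the index at a placeholder path) lists the files each segment in the latest commit references. With the compound file format, each per-segment list is dominated by the _N.cfs/_N.cfe pair that has to be fetched just to read the field infos.

```java
import java.nio.file.Paths;

import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CommitFiles {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/index"))) { // placeholder path
      SegmentInfos sis = SegmentInfos.readLatestCommit(dir);
      for (SegmentCommitInfo sci : sis) {
        // With the compound file format, sci.files() lists _N.cfs/_N.cfe plus _N.si;
        // the .fnm lives inside the .cfs, so the whole compound file must be fetched.
        System.out.println(sci.info.name + " -> " + sci.files());
      }
    }
  }
}
```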
- Why does the IndexWriter need the .fnm files of the previous segments at creation time?
- Could you suggest a workaround that avoids downloading anything beyond the commit metadata? (One possible direction is sketched below.)
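As an illustration of the kind of workaround I am after (a sketch under my own assumptions, not a confirmed solution): wrapping the local directory in a FilterDirectory that downloads a file from remote storage the first time Lucene opens it would at least restrict the downloads to exactly the files IndexWriter touches, and defer everything else. The ensureLocal helper below is hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

/** Fetches a file from remote storage the first time Lucene asks to read it. */
public class LazyFetchDirectory extends FilterDirectory {
  public LazyFetchDirectory(Directory local) {
    super(local);
  }

  // Hypothetical helper: download `name` from remote storage into the wrapped
  // directory, doing nothing if the file is already present on local disk.
  private void ensureLocal(String name) throws IOException {
    // ... cloud-storage client call goes here ...
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    ensureLocal(name);
    return super.openInput(name, context);
  }
}
```

This does not remove the .fnm/.cfs reads during IndexWriter construction, which is why I am asking whether those reads can be avoided in the first place.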