Solr · SOLR-13102

Shared storage Directory implementation



    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved


      We need a general strategy (and probably a general base class) that can work with shared storage and not corrupt indexes from multiple writers.

      One strategy that is used on local disk is to use locks. This doesn't extend well to remote or shared filesystems when the locking is not tied into the object store itself: a process can lose the lock (due to a long GC pause, for example) and then immediately try to write a file, and there is no way to stop it.

      An alternate strategy ditches the use of locks and simply avoids overwriting files by some algorithmic mechanism.
      One of my colleagues outlined one way to do this: https://www.youtube.com/watch?v=UeTFpNeJ1Fo
      That strategy uses random-looking filenames and then writes a "core.metadata" file that maps between the random names and the original names. The problem is then reduced to not overwriting "core.metadata" after you lose the lock. One way to fix this is to version "core.metadata".

      Since the new leader election code was implemented, each shard has a monotonically increasing "leader term", and we can use that as part of the filename. When a reader goes to open an index, it can use the latest file from the directory listing, or even use the term obtained from ZK if we can't trust the directory listing to be up to date. Additionally, we don't need random filenames to avoid collisions... a simple unique prefix or suffix would work fine (such as the leader term again).
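      A minimal sketch of the versioned-metadata idea above: append the leader term to the metadata filename, and have readers pick the file with the highest term from a directory listing. The `core.metadata.<term>` naming and the `MetadataVersioning` helper are illustrative assumptions, not code from the Solr codebase.

      ```java
      import java.util.Comparator;
      import java.util.List;
      import java.util.Optional;

      public class MetadataVersioning {
          // Assumed naming scheme: "core.metadata.<leaderTerm>"
          static final String PREFIX = "core.metadata.";

          // A writer holding leader term N writes a new metadata file under this
          // name instead of overwriting an existing one.
          static String metadataFileName(long leaderTerm) {
              return PREFIX + leaderTerm;
          }

          // A reader scans the directory listing and opens the metadata file
          // with the highest leader term. (Alternatively, the term obtained
          // from ZK could be used directly to build the expected name.)
          static Optional<String> latestMetadata(List<String> listing) {
              return listing.stream()
                  .filter(n -> n.startsWith(PREFIX))
                  .max(Comparator.comparingLong(
                      n -> Long.parseLong(n.substring(PREFIX.length()))));
          }

          public static void main(String[] args) {
              List<String> files = List.of(
                  "_0.si", "core.metadata.7", "core.metadata.9", "segments_2");
              System.out.println(latestMetadata(files).orElse("none")); // prints core.metadata.9
          }
      }
      ```

      Because a stale writer that lost the lock still holds an older term, its metadata file sorts below the current leader's and is simply ignored by readers, without any coordination at write time.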





              Assignee: Unassigned
              Reporter: Yonik Seeley (yseeley@gmail.com)