Lucene - Core
  1. Lucene - Core
  2. LUCENE-2481

Enhance SnapshotDeletionPolicy to allow taking multiple snapshots

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      A spin off from here: http://www.gossamer-threads.com/lists/lucene/java-dev/99161?do=post_view_threaded#99161

      I will:

      1. Replace snapshot() with snapshot(String), so that one can name/identify the snapshot
      2. Add some supporting methods, like release(String), getSnapshots() etc.
      3. Some unit tests of course.

      This is mostly written already - I want to contribute it. I've also written a PersistentSDP, which persists the snapshots on stable storage (a Lucene index in this case) to support opening an IW with existing snapshots already, so they don't get deleted. If it's interesting, I can contribute it as well.

      Porting my patch to the new API. Should post it soon.

      1. LUCENE-2481-3x.patch
        44 kB
        Shai Erera
      2. LUCENE-2481-3x.patch
        44 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Enhancements to SnapshotDeletionPolicy + tests.

        Also, I've added a PersistentSDP, which persists the snapshots information in a Lucene Directory. In case the JVM crashes, the info from the Directory can be used to open an IndexWriter on the other Directory w/ already snapshotted commits (prevents their deletion).

        Show
        Shai Erera added a comment - Enhancements to SnapshotDeletionPolicy + tests. Also, I've added a PersistentSDP, which persists the snapshots information in a Lucene Directory. In case the JVM crashes, the info from the Directory can be used to open an IndexWriter on the other Directory w/ already snapshotted commits (prevents their deletion).
        Hide
        Shai Erera added a comment - - edited

        If there are no comments/objections, I'd like to commit this in 2-3 days.

        Show
        Shai Erera added a comment - - edited If there are no comments/objections, I'd like to commit this in 2-3 days.
        Hide
        Michael McCandless added a comment -

        This looks nice Shai!

        So now you can keep any number of outstanding snapshots, and, if you
        use PersistentSnapshotDeletionPolicy, you can keep these snapshots
        alive across close/open of IndexWriter, right?

        And, if one has some other way to persist snapshots (instead of using
        a Lucene index as PersistentSnapshotDeletionPolicy does), one can
        simply subclass SnapshotDeletionPolicy

        Can you rename "String segment" -> "String segmentsFileName" (or
        segmentsFile)? Ie, this is referring to the collection of segments
        that make up a given commit? It's confusing to see "segment" all over
        because it makes me think you're somehow doing something
        per-segment...

        Show
        Michael McCandless added a comment - This looks nice Shai! So now you can keep any number of outstanding snapshots, and, if you use PersistentSnapshotDeletionPolicy, you can keep these snapshots alive across close/open of IndexWriter, right? And, if one has some other way to persist snapshots (instead of using a Lucene index as PersistentSnapshotDeletionPolicy does), one can simply subclass SnapshotDeletionPolicy Can you rename "String segment" -> "String segmentsFileName" (or segmentsFile)? Ie, this is referring to the collection of segments that make up a given commit? It's confusing to see "segment" all over because it makes me think you're somehow doing something per-segment...
        Hide
        Shai Erera added a comment -

        you can keep these snapshots alive across close/open of IndexWriter, right?

        Yes, that's right ! In fact, PSDP exists for exactly that purpose, and can also serve as an example for whoever would like to persist the snapshots elsewhere.

        Can you rename "String segment" -> "String segmentsFileName" (or segmentsFile)?

        I've renamed it to segmentsFileName, and also segmentToIDs -> segmentsFileToIDs to make it clearer. Thanks for the feedback !

        I may commit this tomorrow.

        Show
        Shai Erera added a comment - you can keep these snapshots alive across close/open of IndexWriter, right? Yes, that's right ! In fact, PSDP exists for exactly that purpose, and can also serve as an example for whoever would like to persist the snapshots elsewhere. Can you rename "String segment" -> "String segmentsFileName" (or segmentsFile)? I've renamed it to segmentsFileName, and also segmentToIDs -> segmentsFileToIDs to make it clearer. Thanks for the feedback ! I may commit this tomorrow.
        Hide
        Earwin Burrfoot added a comment -

        Still dislike that string key you have to generate somehow. And if you just want some threads to get and release snapshots independently? You have to devise some random key generation scheme?

        Show
        Earwin Burrfoot added a comment - Still dislike that string key you have to generate somehow. And if you just want some threads to get and release snapshots independently? You have to devise some random key generation scheme?
        Hide
        Shai Erera added a comment -

        And if you just want some threads to get and release snapshots independently? You have to devise some random key generation scheme?

        No - you can simply use the thread's name .

        But really, the ID is important for different processes that take a snapshot of the same commit for different purposes. It makes it more clear who holds which snapshot. Also better for debugging later, if say the JVM crashed, and you use the persistency layer, you can tell which of your processes kept a snapshot (and didn't release yet).

        Show
        Shai Erera added a comment - And if you just want some threads to get and release snapshots independently? You have to devise some random key generation scheme? No - you can simply use the thread's name . But really, the ID is important for different processes that take a snapshot of the same commit for different purposes. It makes it more clear who holds which snapshot. Also better for debugging later, if say the JVM crashed, and you use the persistency layer, you can tell which of your processes kept a snapshot (and didn't release yet).
        Hide
        Shai Erera added a comment -

        Another thing about IDs - they are more user-level API than IndexCommit.

        So Earwin, I agree that we can have snapshot() and release(IndexCommit) to achieve the same functionality (need to be careful w/ multiple snapshots over the same IC). But for users, passing in an IndexCommit is not too friendly. Also, one customer of the API already uses the ID to encode some information that's interesting to him (e.g. name of the process + timestamp) which shows why IDs should remain. It's kind of like the commitUserData given to commits. Another reason is sharing the ID between two different code segments. It's easier to share a String-ID, than to share an IndexCommit. And what exactly is IndexCommit, and how does it translate to a key? Just the segmentsFileName? See - that's a too low level implementation detail IMO ...

        Show
        Shai Erera added a comment - Another thing about IDs - they are more user-level API than IndexCommit. So Earwin, I agree that we can have snapshot() and release(IndexCommit) to achieve the same functionality (need to be careful w/ multiple snapshots over the same IC). But for users, passing in an IndexCommit is not too friendly. Also, one customer of the API already uses the ID to encode some information that's interesting to him (e.g. name of the process + timestamp) which shows why IDs should remain. It's kind of like the commitUserData given to commits. Another reason is sharing the ID between two different code segments. It's easier to share a String-ID, than to share an IndexCommit. And what exactly is IndexCommit, and how does it translate to a key? Just the segmentsFileName? See - that's a too low level implementation detail IMO ...
        Hide
        Shai Erera added a comment -

        Some javadoc updates and member renames (as Mike suggested). I plan to commit this shortly.

        Show
        Shai Erera added a comment - Some javadoc updates and member renames (as Mike suggested). I plan to commit this shortly.
        Hide
        Shai Erera added a comment -

        Committed revision 949730 (3x).
        Committed revision 949756 (trunk).

        Show
        Shai Erera added a comment - Committed revision 949730 (3x). Committed revision 949756 (trunk).
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development