Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Tangentially mentioned in a blog post, James Hamilton talks about deferred deletes:

      If you have an application error, administrative error, or database implementation bug that losses data, then it is simply gone unless you have an offline copy. This, by the way, is why I'm a big fan of deferred delete. This is a technique where deleted items are marked as deleted but not garbage collected until some days or preferably weeks later. Deferred delete is not full protection but it has saved my butt more than once and I'm a believer. See On Designing and Deploying Internet-Scale Services (http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf) for more detail.

      (See http://perspectives.mvdirona.com/2010/04/07/StonebrakerOnCAPTheoremAndDatabases.aspx)

      Because deletes are tombstones (at least once the initial write has been flushed from the memstore), deferred delete in HBase could be supported if tombstones could somehow be invalidated: in effect, an undelete operation. This could be accomplished by adding support for tombstones that cover delete tombstones. It would complicate major compaction but otherwise not touch much. A typical use case might be "resurrect any data deleted from ts1 to ts2", for example a period of 4 hours when an application error was operative. In this case a new write would be issued to the table: a tombstone covering any deletes over that period of time. Users would defer major compactions until safe checkpoint periods.
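      The proposal can be sketched with a toy in-memory model. This is only an illustration of the idea, not HBase code: `ToyStore`, `undelete`, and `major_compact` are hypothetical names, a single column per row is assumed, and a delete tombstone is assumed to mask every version at or below its timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy single-column store illustrating 'tombstones for tombstones'."""
    puts: dict = field(default_factory=dict)       # (row, ts) -> value
    deletes: list = field(default_factory=list)    # (row, ts) delete tombstones
    undeletes: list = field(default_factory=list)  # (ts1, ts2) hypothetical undelete markers

    def put(self, row, ts, value):
        self.puts[(row, ts)] = value

    def delete(self, row, ts):
        # A delete tombstone masks every version of `row` with timestamp <= ts.
        self.deletes.append((row, ts))

    def undelete(self, ts1, ts2):
        # Hypothetical operation: cancel every delete tombstone stamped in [ts1, ts2].
        self.undeletes.append((ts1, ts2))

    def _live_delete_cutoff(self, row):
        # Highest timestamp among delete tombstones that no undelete marker covers.
        live = [ts for (r, ts) in self.deletes
                if r == row and not any(a <= ts <= b for a, b in self.undeletes)]
        return max(live, default=None)

    def get(self, row):
        # Return the newest version that is not masked by a live tombstone.
        cutoff = self._live_delete_cutoff(row)
        versions = [(ts, v) for (r, ts), v in self.puts.items()
                    if r == row and (cutoff is None or ts > cutoff)]
        return max(versions, key=lambda tv: tv[0])[1] if versions else None

    def major_compact(self):
        # Physically drop masked cells, then discard all markers.
        # After this point an undelete can no longer bring data back.
        survivors = {}
        for (row, ts), value in self.puts.items():
            cutoff = self._live_delete_cutoff(row)
            if cutoff is None or ts > cutoff:
                survivors[(row, ts)] = value
        self.puts = survivors
        self.deletes.clear()
        self.undeletes.clear()


store = ToyStore()
store.put("r1", 10, "a")
store.delete("r1", 20)    # erroneous delete during the bad period
before = store.get("r1")  # masked by the tombstone
store.undelete(15, 25)    # "resurrect any data deleted from ts1 to ts2"
after = store.get("r1")   # visible again
```

      In this model the undelete only works while the masked cell still exists, which is why the description suggests deferring major compactions until a safe checkpoint.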

      Such guarantees could optionally be extended to the memstore by using tombstones there as well. But it would probably be sufficient to provide guidance that forcing a flush is necessary to ensure edits are persisted in a way that allows for undeletion.

        Activity

        Jonathan Gray added a comment -

        Seems like this would be part of the SnapshotScanner in HBASE-2376? You use TTKAV (TimeToKeepAllVersions) to prevent compactions from wiping any data.

        Lars Hofhansl added a comment -

        Seems this is duplicated by HBASE-4536. With that it is possible to do time-range queries past delete markers (if the store was enabled for KEEP_DELETED_CELLS). So not quite the same, but it targets a similar use case.
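        To illustrate what "time-range queries past delete markers" means, here is a minimal toy model. It is a sketch under assumptions, not the HBASE-4536 implementation: `visible_versions` is a hypothetical helper, the time range is taken to be half-open `[min_ts, max_ts)`, and a delete marker is assumed to mask all versions at or below its timestamp.

```python
def visible_versions(cells, delete_markers, min_ts, max_ts, keep_deleted_cells):
    """Return the {ts: value} versions a scan over [min_ts, max_ts) would see.

    cells: {ts: value} for one row/column.
    delete_markers: list of delete-marker timestamps for that row/column.
    keep_deleted_cells: toy stand-in for a store with KEEP_DELETED_CELLS enabled.
    """
    if keep_deleted_cells:
        # Assumption: only markers inside the scanned time range are applied,
        # so a range that ends before the marker sees the data as it was.
        markers = [d for d in delete_markers if min_ts <= d < max_ts]
    else:
        # Assumption: every marker applies, even one newer than the range.
        markers = list(delete_markers)
    cutoff = max(markers, default=None)
    return {ts: v for ts, v in cells.items()
            if min_ts <= ts < max_ts and (cutoff is None or ts > cutoff)}


cells = {10: "a"}
markers = [20]  # the cell written at ts=10 was deleted at ts=20

past_marker = visible_versions(cells, markers, 0, 15, keep_deleted_cells=True)
default_view = visible_versions(cells, markers, 0, 15, keep_deleted_cells=False)
full_range = visible_versions(cells, markers, 0, 30, keep_deleted_cells=True)
```

        In this model a deferred delete amounts to reading with a time range that ends before the erroneous delete and rewriting the recovered cells.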

        stack added a comment -

        We can close this issue then? (After adding a bit of 'how to do deferred deletes' to the manual?)


          People

          • Assignee: Unassigned
          • Reporter: Andrew Purtell
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved: