Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Delete is incomplete in hbase. What's there is inconsistent. Deleted records currently persist and are never cleaned up. This issue is about making delete behavior coherent across gets, scans and compaction.

      Below is from a bit of back and forth between Jim and myself, where Jim takes a stab at outlining a model for delete, taking inspiration from how Digital's versioned file system used to work:

      Let's say you have 5 versions with timestamps T1, T2, ..., T5 where
      timestamps are increasing from T1 to T5 (so T5 is the newest).
      
      Before any deletes occur, if you don't specify a timestamp and request N
      versions, you should get T5 first, then T4, T3, ... until you have
      reached N or you run out of versions.
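
A minimal sketch of that baseline read, before any deletes exist. The function name `get_versions` and the use of plain integers as timestamps are illustrative assumptions, not HBase API:

```python
def get_versions(timestamps, n):
    """Return up to n version timestamps, newest first."""
    return sorted(timestamps, reverse=True)[:n]


# Integers 1..5 stand in for T1..T5 (assumption for illustration only).
versions = [1, 2, 3, 4, 5]
print(get_versions(versions, 3))   # [5, 4, 3]
print(get_versions(versions, 10))  # [5, 4, 3, 2, 1] (runs out of versions)
```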
      
      Now add deletes:
      
      (In the following, timestamp refers to the timestamp associated with
      the delete operation)
      
      1. If no timestamp is specified we are deleting the latest version.
         If a get or scanner specifies that it wants N versions, then it 
         should get T4, T3, ..., until we have N versions or we run out of
         older versions. After compaction, the deletion record and T5 should
         be elided from the HStore.
      
      2. If a timestamp is specified and it exactly matches a version (say
         T4) and a get or scanner requests N versions, then the client
         receives T5, T3, T2, ... until we satisfy N or run out of versions.
         After a compaction, the deletion record and T4 should be elided
         from the HStore.
      
      3. If a timestamp is specified and does not exactly match a version,
         it means delete every version older than this timestamp. If the
         timestamp is greater than T5 all versions are considered to be
         deleted and a get or a scanner will return no results even if
         the get or scanner specifies an older time. This is consistent
         with the concept of delete all versions older than timestamp.
         After a compaction, the delete record and all the values should
         be elided.
      
         If the specified timestamp falls between two older versions (say
         T4 and T3) then T3, T2 and T1 are considered to be deleted (again
         this is all versions older than timestamp). A get or scanner
         that specifies no time but requests N versions can only get T5
         and T4. A get or scanner that requests a time of T3 or earlier
         will get no results because those versions are deleted. After
         a compaction, the deletion record and the deleted versions
         are elided from the HStore.
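
The three rules above can be sketched as a small in-memory model. This is only an illustration of the proposed semantics, not the code in the attached patches: the names `deleted_by`, `get` and `compact`, the integer timestamps, and the `at` parameter (read as "versions at or before this time") are all assumptions. The sketch also assumes each delete record is evaluated independently against the stored versions.

```python
from typing import Iterable, List, Optional, Set, Tuple


def deleted_by(versions: Iterable[int], delete_ts: Optional[int]) -> Set[int]:
    """Return the version timestamps masked by a single delete record."""
    vs = sorted(versions)
    if not vs:
        return set()
    if delete_ts is None:
        # Rule 1: no timestamp means delete the latest version.
        return {vs[-1]}
    if delete_ts in vs:
        # Rule 2: exact match deletes only that version.
        return {delete_ts}
    # Rule 3: no exact match deletes every version older than the timestamp.
    return {v for v in vs if v < delete_ts}


def masked_versions(versions: Iterable[int], deletes: Iterable[Optional[int]]) -> Set[int]:
    """Union of the versions masked by all delete records."""
    vs = list(versions)
    masked: Set[int] = set()
    for d in deletes:
        masked |= deleted_by(vs, d)
    return masked


def get(versions, deletes, n: int, at: Optional[int] = None) -> List[int]:
    """Return up to n visible timestamps, newest first."""
    masked = masked_versions(versions, deletes)
    visible = [
        v for v in sorted(versions, reverse=True)
        if v not in masked and (at is None or v <= at)
    ]
    return visible[:n]


def compact(versions, deletes) -> Tuple[List[int], List[Optional[int]]]:
    """Drop the deleted versions and the delete records themselves."""
    masked = masked_versions(versions, deletes)
    return sorted(set(versions) - masked), []


# T1..T5 modeled as 10, 20, 30, 40, 50 (assumption for illustration).
T = [10, 20, 30, 40, 50]
print(get(T, [None], 3))      # rule 1: [40, 30, 20]
print(get(T, [40], 3))        # rule 2: [50, 30, 20]
print(get(T, [60], 5))        # rule 3, newer than T5: []
print(get(T, [35], 5))        # rule 3, between T3 and T4: [50, 40]
print(get(T, [35], 5, at=30)) # read at T3 or earlier: []
print(compact(T, [35]))       # ([40, 50], [])
```

Computing the mask at read time and dropping masked versions only in `compact` mirrors the text: gets and scans honor delete records immediately, while the stored data is cleaned up later by compaction.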
      
      1. delete1.patch
        64 kB
        stack
      2. delete2.patch
        84 kB
        stack
      3. delete3.patch
        98 kB
        stack
      4. delete4.patch
        99 kB
        stack

          Activity

          Jeff Hammerbacher made changes -
          Link This issue relates to HBASE-3543 [ HBASE-3543 ]
          Owen O'Malley made changes -
          Assignee stack [ stack ]
          Key HADOOP-1784 HBASE-315
          Project Hadoop Core [ 12310240 ] Hadoop HBase [ 12310753 ]
          Issue Type Improvement [ 4 ] Bug [ 1 ]
          Fix Version/s 0.15.0 [ 12312565 ]
          Owen O'Malley made changes -
          Component/s contrib/hbase [ 12311752 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          stack made changes -
          Resolution Fixed [ 1 ]
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          stack made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          stack made changes -
          Attachment delete4.patch [ 12365442 ]
          stack made changes -
          Status Patch Available [ 10002 ] In Progress [ 3 ]
          stack made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Fix Version/s 0.15.0 [ 12312565 ]
          stack made changes -
          Attachment delete3.patch [ 12365426 ]
          stack made changes -
          Attachment delete2.patch [ 12365393 ]
          stack made changes -
          Attachment delete1.patch [ 12365365 ]
          stack made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Jim Kellerman made changes -
          Field Original Value New Value
          Link This issue blocks HADOOP-1550 [ HADOOP-1550 ]
          stack created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              stack