HBase: HBASE-10118

Major compact keeps deletes with future timestamps

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      hbase.hstore.time.to.purge.deletes has been changed: if it is not set, or is set to 0, all delete markers, including those with future timestamps, are purged during the next major compaction. Otherwise, a delete marker is kept until the first major compaction that runs after the marker's timestamp plus this setting.
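      A minimal Java sketch of the purge rule stated in the release note, for illustration only. The class PostFixPurgeRule and its method are hypothetical names, not HBase API; the sketch assumes the same (now - timestamp) <= timeToPurgeDeletes comparison quoted in the description, guarded so that a setting of 0 always purges.

```java
/**
 * Hypothetical sketch of the purge rule described in the release note.
 * Not HBase source code; names are illustrative only.
 */
final class PostFixPurgeRule {
    private PostFixPurgeRule() {}

    /**
     * Returns true if a major compaction running at nowMillis should keep a
     * delete marker whose timestamp is markerTs.
     *
     * @param timeToPurgeDeletes value of hbase.hstore.time.to.purge.deletes in ms
     */
    static boolean keepDeleteMarker(long nowMillis, long markerTs, long timeToPurgeDeletes) {
        if (timeToPurgeDeletes <= 0) {
            // Unset or 0 (negative values are clamped to 0 in HStore.java):
            // purge every marker, even one with a future timestamp.
            return false;
        }
        // Otherwise keep the marker until markerTs + timeToPurgeDeletes has passed.
        return nowMillis - markerTs <= timeToPurgeDeletes;
    }
}
```

      For example, with a hypothetical compaction time of 1386000000000 and the future marker timestamp 1394161431061 from the reproduction steps, a setting of 0 purges the marker, while a setting of 60000 keeps it until 60 seconds after its timestamp.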

      Description

      Hello!

      During migration from HBase 0.90.6 to 0.94.6 we found a change in how major compaction handles delete markers with timestamps in the future. Before HBASE-4721, major compaction purged deletes regardless of their timestamp. Newer versions keep them in the HFile until their timestamp is reached.

      I guess this happened due to a new check in ScanQueryMatcher: (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes.
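      The quoted condition can be evaluated on its own to see why a future timestamp keeps the marker alive. This is a sketch, not the ScanQueryMatcher source; PreFixPurgeCheck is a hypothetical name, and "keep" stands for the marker being included in the compacted output.

```java
/**
 * Hypothetical sketch of the pre-fix condition quoted from ScanQueryMatcher.
 * When the condition holds, the delete marker is kept in the compacted output.
 */
final class PreFixPurgeCheck {
    private PreFixPurgeCheck() {}

    static boolean keepDeleteMarker(long nowMillis, long markerTs, long timeToPurgeDeletes) {
        // For a future marker, nowMillis - markerTs is negative, so with a
        // timeToPurgeDeletes of 0 this is always true and the marker stays.
        return (nowMillis - markerTs) <= timeToPurgeDeletes;
    }
}
```

      With a hypothetical compaction time of 1386000000000 and the marker timestamp 1394161431061 from the reproduction steps, the difference is -8161431061 ms, which is <= 0, so the marker is kept; it only becomes purgeable once the clock passes 1394161431061.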

      This can be worked around by specifying a large negative value for the hbase.hstore.time.to.purge.deletes option, but unfortunately negative values are raised to zero by Math.max in HStore.java.
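      A small sketch of why the negative-value workaround has no effect: the configured value is clamped before the check runs. ClampedPurgeSetting is a hypothetical name, and the clamp is assumed to be a plain Math.max(configured, 0) as the description suggests.

```java
/**
 * Hypothetical sketch: the configured setting is clamped with Math.max
 * before the pre-fix keep condition is evaluated.
 */
final class ClampedPurgeSetting {
    private ClampedPurgeSetting() {}

    /** Effective setting after the Math.max clamp described for HStore.java. */
    static long effectiveTimeToPurgeDeletes(long configured) {
        return Math.max(configured, 0L);
    }

    /** Pre-fix keep condition, evaluated with the clamped setting. */
    static boolean keepDeleteMarker(long nowMillis, long markerTs, long configured) {
        // Any negative configured value becomes 0, so a future marker
        // (negative nowMillis - markerTs) is still kept.
        return (nowMillis - markerTs) <= effectiveTimeToPurgeDeletes(configured);
    }
}
```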

      Perhaps we are doing something unusual by specifying a delete timestamp in the future, but HBASE-4721 definitely breaks the old behaviour we rely on.

      Steps to reproduce this:

      put 'test', 'delmeRow', 'delme:something', 'hello'
      flush 'test'
      delete 'test', 'delmeRow', 'delme:something', 1394161431061
      flush 'test'
      major_compact 'test'
      

      Before major_compact we have two HFiles with the following contents:

      first:
      K: delmeRow/delme:something/1384161431061/Put/vlen=5/ts=0
      
      second:
      K: delmeRow/delme:something/1394161431061/DeleteColumn/vlen=0/ts=0
      

      After major_compact we get the following:

      K: delmeRow/delme:something/1394161431061/DeleteColumn/vlen=0/ts=0
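      The reproduction can be modelled with a toy major compaction. This is a simplified sketch, not HBase's compaction code: it assumes a single column, treats DeleteColumn as masking every Put at or below its timestamp, and keeps the marker only while the pre-fix condition quoted in the description holds. It parses dump lines like the ones above (row/family:qualifier/timestamp/type/...) and ignores the vlen and ts fields. ReproSimulation and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the reproduction, not HBase code. Assumes a single column and
 * the pre-fix keep condition (now - ts) <= timeToPurgeDeletes.
 */
final class ReproSimulation {
    private ReproSimulation() {}

    /** One cell from a dump line: row/family:qualifier/timestamp/type/vlen=N/ts=M. */
    record Cell(String row, String column, long timestamp, String type) {}

    /** Parses a dump line such as "K: row/fam:qual/123/Put/vlen=5/ts=0". */
    static Cell parse(String line) {
        String body = line.startsWith("K: ") ? line.substring(3) : line;
        String[] parts = body.split("/");
        // vlen and ts (parts[4], parts[5]) are ignored by this model.
        return new Cell(parts[0], parts[1], Long.parseLong(parts[2]), parts[3]);
    }

    /** Simplified major compaction over the cells of one column. */
    static List<Cell> majorCompact(List<Cell> cells, long nowMillis, long timeToPurgeDeletes) {
        // Newest DeleteColumn timestamp; it masks every Put at or below it.
        long newestDeleteTs = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.type().equals("DeleteColumn")) {
                newestDeleteTs = Math.max(newestDeleteTs, c.timestamp());
            }
        }
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.type().equals("DeleteColumn")) {
                // The marker survives only while the pre-fix condition holds.
                if (nowMillis - c.timestamp() <= timeToPurgeDeletes) {
                    out.add(c);
                }
            } else if (c.timestamp() > newestDeleteTs) {
                out.add(c); // a Put newer than every DeleteColumn is still visible
            }
        }
        return out;
    }
}
```

      With a hypothetical compaction time of 1386000000000 and timeToPurgeDeletes = 0, the Put is dropped and only the DeleteColumn remains, matching the dump after major_compact; once the compaction time passes 1394161431061, the model drops the marker as well.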
      

      In our installation we resolved this by removing Math.max and setting hbase.hstore.time.to.purge.deletes to Integer.MIN_VALUE, which purges the delete markers, and it looks like a solution. But there may be a better approach.
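      A worked sketch of what the Integer.MIN_VALUE setting means once the clamp is removed, assuming the quoted comparison is the only check involved. A marker is purged when now - ts > t, that is when ts - now < -t; for t = Integer.MIN_VALUE (-2147483648 ms, about 24.9 days) this lets a compaction purge markers less than roughly 24.9 days in the future, while markers further ahead would still be kept. MinValueWorkaround is a hypothetical name.

```java
/**
 * Hypothetical sketch of the pre-fix check with the Math.max clamp removed,
 * as in the workaround described above.
 */
final class MinValueWorkaround {
    private MinValueWorkaround() {}

    /** Pre-fix keep condition with an unclamped (possibly negative) setting. */
    static boolean keepDeleteMarker(long nowMillis, long markerTs, long timeToPurgeDeletes) {
        return (nowMillis - markerTs) <= timeToPurgeDeletes;
    }

    /**
     * How far ahead of now (ms) a marker may be and still be purged:
     * purged iff markerTs - nowMillis < -timeToPurgeDeletes.
     */
    static long purgeHorizonMillis(long timeToPurgeDeletes) {
        return -timeToPurgeDeletes;
    }
}
```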

      1. HBASE-10118-trunk-v1.diff (4 kB, Liu Shaohui)
      2. HBASE-10118-trunk-v2.diff (5 kB, Liu Shaohui)
      3. HBASE-10118-trunk-v3.diff (6 kB, Liu Shaohui)
      4. HBASE-10118-0.94-v1.diff (4 kB, Liu Shaohui)
      There are no Sub-Tasks for this issue.

        Activity

        Max Lapan created issue -
        Max Lapan made changes -
        Field Original Value New Value
        Description
        Max Lapan made changes -
        Description
        Liu Shaohui made changes -
        Attachment HBASE-10118-trunk-v1.diff [ 12636677 ]
        Liu Shaohui made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Liu Shaohui [ liushaohui ]
        Ted Yu made changes -
        Fix Version/s 0.99.0 [ 12325675 ]
        Liu Shaohui made changes -
        Attachment HBASE-10118-trunk-v2.diff [ 12636855 ]
        Liu Shaohui made changes -
        Fix Version/s 0.99.0 [ 12325675 ]
        Liu Shaohui made changes -
        Attachment HBASE-10118-trunk-v3.diff [ 12638201 ]
        Liu Shaohui made changes -
        Attachment HBASE-10118-trunk-v3.diff [ 12638201 ]
        Liu Shaohui made changes -
        Attachment HBASE-10118-trunk-v3.diff [ 12638209 ]
        Sergey Shelukhin made changes -
        Release Note hbase.hstore.time.to.purge.deletes has been changed; if it is not set, or set to 0, all delete markers including those with future timestamp are purged during the later major compaction. Otherwise, a delete marker is kept until the major compaction after marker's timestamp + this setting.
        Liu Shaohui made changes -
        Attachment HBASE-10118-0.94-v1.diff [ 12638620 ]
        Lars Hofhansl made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.99.0 [ 12325675 ]
        Fix Version/s 0.94.19 [ 12326287 ]
        Fix Version/s 0.98.2 [ 12326505 ]
        Fix Version/s 0.96.3 [ 12326538 ]
        Resolution Fixed [ 1 ]
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Liu Shaohui
            Reporter:
            Max Lapan
          • Votes:
            0
            Watchers:
            13
