HBASE-2649

Make use of storefile 'order' when making certain decisions during merge sort

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When we merge sort results currently there is no regard for storefile order. This issue is about exploiting store file order at certain junctures. For example, if we have N KVs all of the same coordinates – same r/f/q/type/ts – then the one from the storefile that was made most recently should prevail. Also, we might consider order when looking at deletes so our tombstones are less tombstoney in that they'll only apply to values that are in storefiles older than the one that carries the delete marker (this latter sounds hard but putting it out there anyways).
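The two rules proposed in the description can be sketched as follows. This is a minimal illustration, not HBase code: the `Cell` class, the `fileSeqId` field (assumed to grow with each newer store file), and the method names `pickWinner` and `masks` are all hypothetical, and the timestamp check in `masks` is a simplification of real delete semantics.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of these names are HBase's real classes.
final class StoreFileOrderSketch {

    /** A cell plus the sequence id of the store file it came from (higher = newer file). */
    static final class Cell {
        final String coords;    // row/family/qualifier collapsed into one string for brevity
        final long timestamp;
        final boolean isDelete; // true for a tombstone
        final String value;     // null for a tombstone
        final long fileSeqId;   // assumption: each store file has an increasing id

        Cell(String coords, long timestamp, boolean isDelete, String value, long fileSeqId) {
            this.coords = coords;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.value = value;
            this.fileSeqId = fileSeqId;
        }
    }

    /** Rule 1: among cells with identical coordinates, the one from the newest file prevails. */
    static Cell pickWinner(List<Cell> duplicates) {
        Cell winner = null;
        for (Cell c : duplicates) {
            if (winner == null || c.fileSeqId > winner.fileSeqId) {
                winner = c;
            }
        }
        return winner;
    }

    /** Rule 2: a tombstone only masks puts that live in strictly older store files. */
    static boolean masks(Cell tombstone, Cell put) {
        return tombstone.isDelete
                && !put.isDelete
                && tombstone.coords.equals(put.coords)
                && put.timestamp <= tombstone.timestamp   // simplified delete semantics
                && put.fileSeqId < tombstone.fileSeqId;   // the storefile-order condition
    }

    public static void main(String[] args) {
        Cell older = new Cell("r/f/q", 5L, false, "old", 1L);
        Cell newer = new Cell("r/f/q", 5L, false, "new", 2L);
        System.out.println(pickWinner(Arrays.asList(older, newer)).value);

        Cell tombstone = new Cell("r/f/q", 5L, true, null, 2L);
        Cell laterPut = new Cell("r/f/q", 5L, false, "again", 3L);
        System.out.println(masks(tombstone, older));
        System.out.println(masks(tombstone, laterPut));
    }
}
```

Under rule 2, a put written after the delete (and so flushed to a newer file) would survive even when its timestamp is not newer than the tombstone's, which is the "less tombstoney" behaviour the description floats.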

      1. HBASE-1485-V9.patch
        43 kB
        Pranav Khaitan

        Issue Links

          Activity

          Pranav Khaitan made changes -
          Attachment HBASE-1485-V9.patch [ 12454750 ]
          Pranav Khaitan added a comment -

          This patch was committed to trunk a week ago. Just attaching to the jira.

          Jonathan Gray made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Jonathan Gray added a comment -

          Fixed with HBASE-1485 commit.

          Pranav Khaitan made changes -
          Assignee Pranav Khaitan [ pranavkhaitan ]
          Jonathan Gray added a comment -

          I'd opt to keep this open. I feel pretty strongly that this is a good short-term solution towards those bigger jiras. It does suggest an implementation, but the others are pretty broad... this is specific and targeted, so we might be able to find someone to implement it.

          stack added a comment -

          Thinking on it, let me close this issue. It's an issue that suggests an implementation for dealing with HBASE-1485 and a newly added issue, HBASE-2847 (we can argue in HBASE-2406 whether HBASE-2847 is a 'bug' or not).

          stack made changes -
          Link This issue is blocked by HBASE-2406 [ HBASE-2406 ]
          Jonathan Gray added a comment -

          I'm interested in this for the first use case. As for the second case, as Ryan said, this is basically what we used to do. I think as far as deletes and such are concerned, we need to nail down what we do/don't want to do during a flush and during minor compactions.

          With HBASE-2248 we now have MemStore taking care of the ordering of duplicate versions, so within each StoreFile, they will be ordered with the most recent at the top. The only thing that remains to properly handle ordering of duplicate versions is cross-StoreFile.

          One solution is to include sequence numbers for every KV in the StoreFiles. The other is this jira. I am +1 on this jira and have a general idea how to do it.
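A rough sketch of the file-level approach endorsed above: each store file is assumed to be internally sorted already (the within-file ordering attributed to HBASE-2248), so the merge only needs a tie-breaker across files. The `KV` and `FileScanner` classes, the `seqId` field, and the `merge` method are hypothetical names for illustration and do not reflect the committed HBASE-1485 patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a cross-file merge; not the actual HBase scanner heap.
final class CrossFileMergeSketch {

    static final class KV {
        final String key;   // row/family/qualifier collapsed into one string
        final long ts;
        final String value;

        KV(String key, long ts, String value) {
            this.key = key;
            this.ts = ts;
            this.value = value;
        }
    }

    /** A sorted stream of KVs from one store file, tagged with that file's sequence id. */
    static final class FileScanner {
        final long seqId;   // assumption: higher means the file was written more recently
        final Iterator<KV> it;
        KV current;

        FileScanner(long seqId, List<KV> sortedKvs) {
            this.seqId = seqId;
            this.it = sortedKvs.iterator();
            advance();
        }

        void advance() {
            current = it.hasNext() ? it.next() : null;
        }
    }

    /** Key ascending, timestamp descending, and on a full tie the newer file first. */
    static int compare(FileScanner a, FileScanner b) {
        int c = a.current.key.compareTo(b.current.key);
        if (c != 0) return c;
        c = Long.compare(b.current.ts, a.current.ts);
        if (c != 0) return c;
        return Long.compare(b.seqId, a.seqId);
    }

    /** K-way merge that keeps only the first (newest-file) copy of each key/timestamp pair. */
    static List<KV> merge(List<FileScanner> scanners) {
        PriorityQueue<FileScanner> heap = new PriorityQueue<>(CrossFileMergeSketch::compare);
        for (FileScanner s : scanners) {
            if (s.current != null) heap.add(s);
        }
        List<KV> out = new ArrayList<>();
        KV last = null;
        while (!heap.isEmpty()) {
            FileScanner s = heap.poll();
            KV kv = s.current;
            boolean duplicate = last != null && last.key.equals(kv.key) && last.ts == kv.ts;
            if (!duplicate) {
                out.add(kv);
                last = kv;
            }
            s.advance();
            if (s.current != null) heap.add(s); // re-insert with its new head element
        }
        return out;
    }

    public static void main(String[] args) {
        FileScanner oldFile = new FileScanner(1L,
                Arrays.asList(new KV("row1", 10L, "old"), new KV("row2", 5L, "x")));
        FileScanner newFile = new FileScanner(2L,
                Arrays.asList(new KV("row1", 10L, "new")));
        for (KV kv : merge(Arrays.asList(oldFile, newFile))) {
            System.out.println(kv.key + "@" + kv.ts + " = " + kv.value);
        }
    }
}
```

The alternative mentioned in the comment, a sequence number stored on every KV, would replace `seqId` on the scanner with a field on `KV` itself; the comparator would otherwise look the same.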

          stack made changes -
          Fix Version/s 0.21.0 [ 12313607 ]
          stack added a comment -

          Bringing into 0.21

          stack added a comment -

          @Ryan I wasn't advocating going back to old-style Gets. Storefile order is info we can exploit under certain circumstances (e.g. my first example above should not be contentious).

          Let this issue subsume HBASE-2454 "Revisit major compaction deletes after hbase-2248 (take storefile age into consideration)" because it's more general. There we said "See Jon's note over in HBASE-2453 on issues with deletes around major compactions."

          ryan rawson added a comment -

          The old get code essentially did this. We decided to remove it for simplicity in HBASE-2248. Having things controlled by an arbitrary opaque concept (Store file order) is probably not a good idea.

          Specifically the old get code which orders store files for Puts caused a situation where the insert order via store files overrides the timestamp (user settable thing) and made it so Gets and Scans were giving different answers. I think we should NOT consider doing so, and even though tombstones stay longer than we'd want to, going for simplicity and easy to understand APIs and inner workings is better.

          What we might consider is allowing access to tombstone markers via some API so people can get to understand how their data is being masked. Right now the Result() return only has Put KeyValues.
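The Get/Scan divergence described in this comment can be illustrated with a toy model. This is not the historical Get code; the `Put` class, the `fileSeqId` field, and both read methods are hypothetical and only contrast a "newest store file wins" read with a "highest timestamp wins" read.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the discrepancy; not the old HBase Get implementation.
final class GetVsScanSketch {

    static final class Put {
        final long ts;         // user-settable timestamp
        final String value;
        final long fileSeqId;  // assumption: higher means flushed more recently

        Put(long ts, String value, long fileSeqId) {
            this.ts = ts;
            this.value = value;
            this.fileSeqId = fileSeqId;
        }
    }

    /** File-order-first read: the newest store file holding the cell wins, whatever its timestamp. */
    static String readByFileOrder(List<Put> puts) {
        Put best = null;
        for (Put p : puts) {
            if (best == null || p.fileSeqId > best.fileSeqId) best = p;
        }
        return best == null ? null : best.value;
    }

    /** Timestamp-first read: the highest timestamp wins, whichever file it sits in. */
    static String readByTimestamp(List<Put> puts) {
        Put best = null;
        for (Put p : puts) {
            if (best == null || p.ts > best.ts) best = p;
        }
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        // "A" is written with ts=200 and flushed first; "B" is written later
        // with an explicit, older ts=100 and lands in a newer store file.
        List<Put> puts = Arrays.asList(new Put(200L, "A", 1L), new Put(100L, "B", 2L));
        System.out.println("file order: " + readByFileOrder(puts));
        System.out.println("timestamp:  " + readByTimestamp(puts));
    }
}
```

When a client back-dates a write, the two strategies disagree, which is the inconsistency the comment attributes to ordering Puts by store file.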

          stack created issue -

            People

            • Assignee: Pranav Khaitan
            • Reporter: stack
            • Votes: 0
            • Watchers: 3
