HBase / HBASE-5674

add support in HBase to overwrite hbase timestamp to a version number during major compaction

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Right now, a millisecond-resolution timestamp is attached to every record.
      In our case, we only need a version number, which will usually just be zero. A millisecond timestamp is too heavy to carry. We should add support for overwriting it with zero during major compaction.
      KeyValues that have not yet been major-compacted will keep their system timestamps. This should also be configurable, so that we do not corrupt data whose HBase timestamp was set explicitly by the application.
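
A minimal sketch of the proposed rewrite, modelled in Python rather than against HBase's real compaction code. Everything here is hypothetical: `Cell` is a simplified stand-in for a KeyValue, the `enabled` flag stands in for whatever configuration switch would control the feature (no actual HBase setting name is implied), and the input is assumed to be sorted the way HBase sorts KeyValues (by row, family and qualifier, newest version first). Each retained version is renumbered so that the newest gets the highest number and the oldest gets zero, so a cell with a single version ends up at timestamp 0.

```python
from dataclasses import dataclass, replace
from itertools import groupby
from typing import Iterable, List


@dataclass(frozen=True)
class Cell:
    """Hypothetical, simplified stand-in for an HBase KeyValue."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int  # epoch milliseconds, or a small version number after rewrite
    value: bytes


def coordinate(cell: Cell):
    """The part of the key that identifies 'the same cell' across versions."""
    return (cell.row, cell.family, cell.qualifier)


def rewrite_timestamps_on_major_compaction(cells: Iterable[Cell], enabled: bool) -> List[Cell]:
    """Replace retained timestamps with small version numbers (hypothetical).

    Assumes `cells` is sorted by coordinate, newest version first. For a cell
    with k retained versions, the newest gets k-1 and the oldest gets 0, so
    the relative order of versions is preserved.
    """
    cells = list(cells)
    if not enabled:  # e.g. the application sets its own timestamps
        return cells
    out: List[Cell] = []
    for _, group in groupby(cells, key=coordinate):
        versions = list(group)
        k = len(versions)
        for i, cell in enumerate(versions):
            out.append(replace(cell, timestamp=k - 1 - i))
    return out


cells = [
    Cell(b"r1", b"f", b"q", 1333000000300, b"v3"),
    Cell(b"r1", b"f", b"q", 1333000000200, b"v2"),
    Cell(b"r2", b"f", b"q", 1333000000100, b"v1"),
]
print([c.timestamp for c in rewrite_timestamps_on_major_compaction(cells, enabled=True)])
```

Writes that arrive after the compaction still carry system timestamps, which are far larger than these small version numbers, so they continue to sort as the newest versions. One caveat the sketch does not handle: HBase computes TTL expiry from the cell timestamp, so a rewrite like this would only make sense for column families without a TTL.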

        Activity

        Scott Chen added a comment -

        > The timestamp takes 8 bytes for every column.

        I meant to say that the timestamp takes 8 bytes for every KeyValue.

        Scott Chen added a comment -

        > Can you not just have your client specify timestamp of 0?

        We still need the write-time timestamp for versioning.
        But after major compaction we don't need the exact timestamp;
        we only need to know that its version is old.

        The timestamp takes 8 bytes for every column,
        and the low-order digits of these timestamps are close to random, so they cannot be compressed well.
        In our tests, the space they consumed was quite significant.
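
To illustrate why those low-order digits hurt compression, the sketch below packs timestamps as 8-byte big-endian longs (the encoding HBase uses for the KeyValue timestamp) and compresses them with zlib. The workload is made up: 10,000 system timestamps spread randomly over one hour, compared with 10,000 zeros. It looks at the timestamp bytes in isolation; in a real HFile they are interleaved with row keys, qualifiers and values, so actual savings will differ.

```python
import random
import struct
import zlib


def encode_timestamps(timestamps):
    """Pack each timestamp as an 8-byte big-endian signed long."""
    return b"".join(struct.pack(">q", ts) for ts in timestamps)


def compressed_size(timestamps):
    """Size in bytes of the zlib-compressed timestamp column."""
    return len(zlib.compress(encode_timestamps(timestamps)))


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 10_000
base = 1_333_000_000_000  # an arbitrary epoch-millisecond starting point
# Hypothetical workload: KeyValues are sorted by row, not by time, so the
# timestamps of neighbouring KeyValues are effectively unordered. The high
# bytes repeat, but the low bytes look random.
system_ts = [base + rng.randrange(0, 60 * 60 * 1000) for _ in range(n)]
zero_ts = [0] * n

print("raw bytes:", 8 * n)
print("system timestamps, compressed:", compressed_size(system_ts))
print("zero timestamps, compressed:", compressed_size(zero_ts))
```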

        I believe that in many use cases these millisecond-resolution timestamps are not useful after compaction.
        It would be nice to be able to remove them to save space.

        He Yongqiang added a comment - - edited

        Thanks, Matt and stack, for pointing out HBASE-4676. Yes, we are very interested in the work going on in HBASE-4676.

        stack added a comment -

        @He np. Thanks for the background. On a slightly related note, I was going to ask whether you'd been following Matt's work over in HBASE-4676. I thought you'd be interested in the compression factor he gets there.

        Matt Corgan added a comment -

        I've been brainstorming something similar as a follow-on to HBASE-4676. The more similar timestamps you have in a block, the smaller the encoded version. Most people doing a simple, flat table with 1 version of each cell don't care about the timestamps. They're only needed to pick the latest cell. If all timestamps in an HFile are the same then they will encode down to nothing.

        One possibility is to have an option "flattenTimestamps" where you grab t=currentTimeMillis() at the beginning of a flush and overwrite all timestamps with it. To support multiple versions of a cell, you could use t-1, t-2, etc. (as long as they don't go all the way back to the previous HFile's timestamp).
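
A sketch of the "flattenTimestamps" idea in Python, assuming a hypothetical tuple representation of cells sorted by (row, family, qualifier) with the newest version first. `flush_start_ms` plays the role of t = currentTimeMillis() taken at the start of the flush, and `previous_hfile_max_ts` is the newest timestamp in the previous HFile; neither name is an HBase API. Refusing to flatten when the t-1, t-2, ... sequence would reach the previous file's timestamps is one possible policy, not something the comment specifies.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical cell representation: (row, family, qualifier, timestamp, value).
CellTuple = Tuple[bytes, bytes, bytes, int, bytes]


def flatten_timestamps(cells: List[CellTuple], flush_start_ms: int,
                       previous_hfile_max_ts: int) -> List[CellTuple]:
    """Overwrite timestamps for one flush (hypothetical sketch).

    The newest version of every cell gets t = flush_start_ms; older versions
    of the same cell get t-1, t-2, ... Raises ValueError if that sequence
    would reach back to the previous HFile's newest timestamp, in which case
    an implementation would presumably keep the original timestamps.
    """
    out: List[CellTuple] = []
    for _, group in groupby(cells, key=lambda c: c[:3]):
        versions = list(group)
        oldest = flush_start_ms - (len(versions) - 1)
        if oldest <= previous_hfile_max_ts:
            raise ValueError("flattened timestamps would overlap the previous HFile")
        for i, (row, fam, qual, _old_ts, value) in enumerate(versions):
            out.append((row, fam, qual, flush_start_ms - i, value))
    return out


memstore = [
    (b"r1", b"f", b"q", 1333000000950, b"new"),
    (b"r1", b"f", b"q", 1333000000410, b"old"),
    (b"r2", b"f", b"q", 1333000000777, b"only"),
]
flushed = flatten_timestamps(memstore, 1333000001000, 1332999999000)
print([c[3] for c in flushed])
```

Because the newest version of every cell in the flushed file shares the same timestamp, an encoder of the kind discussed in HBASE-4676 would have almost nothing to store for that field.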

        He Yongqiang added a comment -

        Okay. Now I need to publicly own up to my lack of a sense of humor.

        Here is the real problem:
        In our use case, the space the data occupies really matters, and we need to find everything we can do to bring its size down as much as possible. Obviously we do not want to bring in LZMA or bzip2 compression, as they are really slow. In my simple test, 41MB of data was reduced to 32MB after I rewrote the HBase Long timestamp to zero. The 8-byte Long timestamp is heavy because it is a binary system timestamp, which makes it very hard to compress (MemstoreTS is also a Long, but it is not a problem because it eventually becomes zero). And if you look at how we use the data, the timestamp is not used by most applications when it is system-generated rather than specified by the application. A good reason to make this configurable is that some applications do specify the timestamp; in that case, HBase must not modify it. But the many other applications that do not care about this field should not have to pay for it when data size really matters to them.
        I think this could also benefit other community members who run into the same problem when trying to reduce their data size.
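
For scale, the reported numbers correspond to roughly a 22% reduction in stored size from rewriting the timestamps alone:

```latex
\frac{41\,\text{MB} - 32\,\text{MB}}{41\,\text{MB}} = \frac{9}{41} \approx 0.22
```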

        stack added a comment -

        I didn't get the 'reference'. Sorry, that email went over my head.

        > So are you saying this conflicts with your 'hardcore production worthy platform' goal?

        First, it's not 'my' goal. Check out the notes from the recent HBase PMC meeting (http://blogs.apache.org/hbase/).

        That said, I thought a different kind of 'researchy' was being referred to. I like the 'research' you fellas are doing.

        (Pardon me. I did not read the name on the issue before responding. All I saw was a short description asking for an 'odd', little-substantiated behavior, so I asked a question. Even on your comeback, I failed to check who was asking and was innocently reacting to what I took as a request that HBase become a dumping ground for plugins and research.)

        He Yongqiang added a comment -

        I used the term 'researchy' because it appeared in an email thread (http://osdir.com/ml/general/2012-03/msg52707.html). I have no idea how the term came up.

        > Most of us working on HBase are trying to make it a hardcore, production-worthy platform. 'Pluggable' and 'research', at least at first blush, sound like distractions from the project objective.

        So are you saying this conflicts with your 'hardcore production worthy platform' goal?

        stack added a comment -

        > I hope this can be done in open-source HBase, and in a pluggable way.

        Can you do your research without requiring that your 'researchy' code be committed to core? Most of us working on HBase are trying to make it a hardcore, production-worthy platform. 'Pluggable' and 'research', at least at first blush, sound like distractions from the project objective.

        But maybe the research aligns with where HBase is trying to go. What's your research on?

        Thanks.

        He Yongqiang added a comment -

        > For whom?

        For our 'researchy' project...

        > Can you not just have your client specify timestamp of 0?

        I hope this can be done in open-source HBase, and in a pluggable way.

        stack added a comment -

        > A millisecond timestamp is too heavy to carry.

        For whom?

        Can you not just have your client specify timestamp of 0?


          People

          • Assignee:
            He Yongqiang
            Reporter:
            He Yongqiang
          • Votes:
            0
            Watchers:
            8
