HBASE-8701

distributedLogReplay need to apply wal edits in the receiving order of those edits

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.99.0
    • Component/s: MTTR
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This issue happens in distributedLogReplay mode when recovering multiple puts of the same key + version (timestamp). After replay, the value of the key is nondeterministic.

      The original concern was raised by Elliott Clark:

      For all edits the rowkey is the same.
      There's a log with: [ A (ts = 0), B (ts = 0) ]
      Replay the first half of the log.
      A user puts in C (ts = 0)
      Memstore has to flush
      A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
      Replay the rest of the Log.
      Flush

      The issue can happen in similar situations, such as Put(key, t=T) in WAL1 and Put(key, t=T) in WAL2.

      Below is the option (proposed by Ted) I'd like to use:

      a) During replay, we pass the original wal sequence number of each edit to the receiving RS
      b) In the receiving RS, we store the negated original sequence number of each wal edit into the mvcc field of its KVs
      c) Add handling of negative MVCC values in KVScannerComparator and KVComparator
      d) In the receiving RS, write the original sequence number into an optional field of the wal file to handle the chained RS failure situation
      e) When opening a region, we add a safety bumper (a large number) so that the new sequence numbers of the newly opened region don't collide with old sequence numbers.

      In the future, when we store sequence numbers along with KVs, we can adjust the above solution slightly so that the MVCC field no longer needs to be overloaded.
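      To make steps b) and c) above concrete, here is a minimal, self-contained sketch (the class and field names are simplified stand-ins, not the actual patch or HBase classes) of how a comparator could break ties between cells with identical key + timestamp when one of them carries a negated original wal sequence number in its mvcc/memstoreTS field:

      // Illustrative sketch only -- a simplified stand-in for KeyValue/KVScannerComparator.
      // A negative memstoreTS is read as the negated original wal sequence number of a
      // replayed edit (step b); the larger original sequence number wins the tie (step c).
      final class ReplayAwareTieBreaker {

        /** Positive "effective sequence" used to break ties between identical keys. */
        static long effectiveSeq(long memstoreTs, long storeFileSeqId) {
          if (memstoreTs < 0) {
            return -memstoreTs;        // replayed edit: memstoreTS holds -originalWalSeqNum
          }
          return storeFileSeqId;       // normal edit: fall back to the store file seq id
        }

        /** Orders two cells with the same coordinates: the newer sequence sorts first. */
        static int compareSameCoordinates(long leftMemstoreTs, long leftFileSeq,
                                          long rightMemstoreTs, long rightFileSeq) {
          return Long.compare(effectiveSeq(rightMemstoreTs, rightFileSeq),
                              effectiveSeq(leftMemstoreTs, leftFileSeq));
        }

        public static void main(String[] args) {
          // Edit A replayed with original wal seq 10 (stored as -10), edit B with seq 12:
          // the positive result means B sorts ahead of A, i.e. the later edit wins the tie.
          System.out.println(compareSameCoordinates(-10, 0, -12, 0) > 0); // true
        }
      }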

      The other alternative options are listed below for reference:

      Option one
      a) disallow writes during recovery
      b) during replay, we pass original wal sequence ids
      c) hold flushes until all wals of a recovering region are replayed. The memstore should be able to hold them because we only recover unflushed wal edits. For edits with the same key + version, whichever has the larger sequence id wins.

      Option two
      a) During replay, we pass original wal sequence ids
      b) for each wal edit, we store each edit's original sequence id along with its key.
      c) during scanning, we use the original sequence id if it's present, otherwise the store file sequence id
      d) compaction can just keep the put with the max sequence id

      Please let me know if you have better ideas.

      1. 8701-v3.txt
        9 kB
        Ted Yu
      2. hbase-8701-v4.patch
        33 kB
        Jeffrey Zhong
      3. hbase-8701-v5.patch
        33 kB
        Jeffrey Zhong
      4. hbase-8701-v6.patch
        37 kB
        Jeffrey Zhong
      5. hbase-8701-v7.patch
        42 kB
        Jeffrey Zhong
      6. hbase-8701-v8.patch
        47 kB
        Jeffrey Zhong
      7. hbase-8701-tag.patch
        26 kB
        Jeffrey Zhong
      8. hbase-8701-tag-v1.patch
        27 kB
        Jeffrey Zhong
      9. hbase-8701-tag-v2.patch
        44 kB
        Jeffrey Zhong
      10. hbase-8701-tag-v2-update.patch
        44 kB
        Jeffrey Zhong

        Issue Links

          Activity

          Enis Soztutar added a comment -

          I've put the following comment on the parent jira HBASE-7006. I guess the discussions should continue here, so I am repeating my comments:
          Here is a proposed scheme that can solve this problem:
          The region will be opened for replaying as before. The normal writes go to the memstore, and the memstore is flushed as usual. The region servers that are reading the WAL and sending edits to the replaying RS stay the same, except that the edits are sent with their seq_ids.
          On the replaying RS, there is a single buffer for all regions that are in replaying state. All edits are appended to this buffer without any sorting. This buffer can be accounted as a memstore, and it has the memstore flush size as its max size. Once this limit is reached, or we are asked to flush due to global memstore pressure, we spill it to disk after sorting. This buffer keeps <kv,seq> pairs, and sorts according to <kv,seq>. If there is no memory pressure and the buffer does not fill up, we don't need to spill to disk.
          Once the replaying is finished, and the master asks the region server to open the region for reading, we do a final merge sort over the in-memory sorted buffer and all on-disk spilled buffers and create an hfile, discarding kv's that have the same key but a smaller seq_id. This file will be a single hfile that corresponds to a flush. This hfile will have a seq_id obtained from the wal edits. Then we add this hfile to the store, and open the region as usual. Keeping an unsorted buffer and sorting it with qsort, with spills and a final on-disk merge sort, might even be faster, since otherwise we would be doing insertions into the memstore, which becomes an insertion sort.
          The other thing we need to change is that replayed edits will not go into the wal again, so we keep track of recovering state for the region server, and re-do the work if there is a subsequent failure.
          In short, this will be close to BigTable's in-memory sort for each WAL file approach, but instead we gather the edits for the region from all WAL files by doing the replay RPC, and do the sort per region. End result: we create a flushed hfile, as if the region had just flushed before the crash.
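          As a rough illustration of the buffer-then-sort idea above (a sketch only; the class names are invented and disk spills are replaced by in-memory lists to keep it short):

          import java.util.*;

          // Sketch: an append-only replay buffer of (key, seq) pairs that is sorted when it
          // "spills", with a final merge that keeps, for each key, the edit carrying the
          // largest wal sequence id -- the entry that would go into the single flushed hfile.
          final class ReplayBufferSketch {
            record Edit(String key, long seq, String value) {}

            private final List<Edit> current = new ArrayList<>();
            private final List<List<Edit>> spills = new ArrayList<>();
            private final int flushSize;

            ReplayBufferSketch(int flushSize) { this.flushSize = flushSize; }

            void append(Edit e) {                  // edits arrive in arbitrary order
              current.add(e);
              if (current.size() >= flushSize) spill();
            }

            private void spill() {                 // sort by <kv, seq> before "writing out"
              current.sort(Comparator.comparing(Edit::key).thenComparingLong(Edit::seq));
              spills.add(new ArrayList<>(current));
              current.clear();
            }

            /** Final merge: one result per key; the edit with the highest seq_id wins. */
            Map<String, Edit> finalMerge() {
              spill();
              Map<String, Edit> out = new TreeMap<>();
              for (List<Edit> run : spills)
                for (Edit e : run)
                  out.merge(e.key(), e, (oldE, newE) -> newE.seq() > oldE.seq() ? newE : oldE);
              return out;
            }

            public static void main(String[] args) {
              ReplayBufferSketch buf = new ReplayBufferSketch(2);
              buf.append(new Edit("row1", 2, "B"));
              buf.append(new Edit("row1", 1, "A"));
              buf.append(new Edit("row1", 3, "C"));
              System.out.println(buf.finalMerge()); // {row1=Edit[key=row1, seq=3, value=C]}
            }
          }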

          Jeffrey Zhong added a comment -

          Thanks Enis for the proposed option.

          First, I want to call explicit version updates "time travel updates", because you can set something in the future or in the past.

          Below is another option for the "time travel updates" issue:

          During replay we tag each KV with its original sequence id (this may leverage the upcoming cell tag). During compaction, we remove this tag entirely and keep the kv with the highest sequence id. In the scan code, we need to use the sequence id tag if present, otherwise the store file sequence id value.

          Overall, after compaction there is no storage overhead because the extra sequence id is removed. The only complexity added is changing the scan & compaction code a little to use the sequence id tag.

          stack added a comment -

          Elliott and I had a chat and came up w/ something similar to the Enis proposal. Let me write it out.

          1. Add new API to RS called replay. Method would take a WALEdit or even a list of WALEdits which I think has seqid and target region in it.
          2. Replay all WAL edits into an in-memory recovered.edits file: i.e. a datastructure that orders edits by sequenceid. If it has to spill because of memory pressure, that is ok; we would write out this ordered-by-id "recovered.edits" file.
          3. On notification that all WALs have been replayed, play the ordered-by-id "recovered.edits" into a memstore that is NOT the per-column-family memstore we use in normal operation (If we flushed 'recovered.edits', they will need to be merged w/ what we have in memory doing the replay). This 'other' memstore we call the replay-memstore (RMS). The replay skips the WAL. If the RMS has to flush during replay, that is fine. The 'replayer' will provide the seqid to write the hfile out with (this could be messy). The seqid will be one gotten from the WALEdit, not from currently hosting RS.
          4. Flush out the RMS using the seqid of the last edit.
          5. Flip the region to take reads.

          Elliott then went on to remark that what we want is mapreduce; our task would group-by-region, then sort by sequenceid, and finally write out an hfile (with retries and recovery, etc.)

          I like Enis's putting off sort till last moment.

          Chatting more w/ Elliott, if we let go of the requirement that we respect insert order when two edits have the same coordinates, it certainly would make a lot of ops easier (this replay, compactions, etc.). It was suggested too that this would be a good topic for the hackathon on Weds if you fellas are going to come: http://www.meetup.com/hackathon/events/123403802/
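          A toy sketch of steps 2 through 4 above (all names invented, not HBase classes): an in-memory recovered.edits structure ordered by seqid, replayed into a separate replay-memstore, then flushed with the seqid of the last replayed edit rather than one from the hosting RS:

          import java.util.*;

          // Sketch only: "recovered.edits" as a TreeMap keyed by sequence id, replayed in
          // seqid order into a separate replay-memstore (RMS); the flush uses the seqid of
          // the last replayed WALEdit, not a seqid from the currently hosting RS.
          final class ReplayMemstoreSketch {
            private final TreeMap<Long, String> recoveredEdits = new TreeMap<>();
            private final List<String> replayMemstore = new ArrayList<>(); // the "RMS"

            void appendRecoveredEdit(long seqId, String edit) {
              recoveredEdits.put(seqId, edit);     // step 2: keep edits ordered by seqid
            }

            /** Steps 3-4: replay in seqid order, return the seqid to flush the RMS with. */
            long replayAndGetFlushSeqId() {
              for (String edit : recoveredEdits.values()) {
                replayMemstore.add(edit);          // the replay skips the WAL
              }
              return recoveredEdits.isEmpty() ? -1 : recoveredEdits.lastKey();
            }

            public static void main(String[] args) {
              ReplayMemstoreSketch rms = new ReplayMemstoreSketch();
              rms.appendRecoveredEdit(7, "put row1=A");
              rms.appendRecoveredEdit(5, "put row1=B");
              System.out.println("flush hfile with seqid " + rms.replayAndGetFlushSeqId()); // 7
            }
          }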

          Jeffrey Zhong added a comment -

          Yeah, we'll come to the hackathon in the afternoon to talk about this. Thanks.

          Lars Hofhansl added a comment -

          Coming late to this... It seems to me that if we have edits with the same timestamps in different WAL files, this can only happen when the client explicitly set the timestamps. Now, if a client does set the TS of two edits to the same timestamp, it should not care about their relative ordering. So I'd say: let's just punt on solving this problem.

          Sergey Shelukhin added a comment -

          We also had a discussion here, about both these and other schemes. We are in consensus with the above wrt "streaming recovered edits", so let me provide a short brain dump of the other approach.
          On worker servers, we could rewrite HLogs into HFiles (in the simplest case, each worker writes one separate HFile per region, but bear with me... solution for too many HFile-s is described below). There's little new code involved, just use WAL reader, memstore, and file writer. We might want to make slight changes to memstore for this - given that it will be write-only and single-threaded, for example, ConcurrentSkipList is not the best ds; etc. That should be trivial, as long as the new ds supports the same interface...
          Each worker separately flushes all these memstores, commits the new HFile-s into target regions (seqnums don't require any special handling because they come from one, or several contiguous (see below), HLogs). Then it can nuke the processed HLogs; i.e. the recovery is incremental, which is good.
          Each worker processes one HLog file in simple case; if there are more HLogs than servers doing the recovery, assign several contiguous HLog-s per server (e.g. 30 HLogs, replay on 9 servers => 3-4 per server).
          After all workers complete, the region is ready to accept reads and writes before any merge; the merge of the HFile-s will be done via compaction of all these small files. We can also open the region for writing during recovery with very little special handling: as soon as all HFiles are added, the store will just have to load them. We may have to disallow compactions, but that is true for every scheme; btw we should handle it in this JIRA anyway.
          Moreover, as we rewrite the HLog as HFile(s), we can put one replica of new HFile onto the target server, so that "replay to other server" RPC cost is baked into normal HDFS replication cost.

          Now, the biggest problem with this scheme is how many HFile-s it can generate - up to # regions * min(# servers, # of WALs). To solve that, it is possible to create indexed MultiStoreFile, expanding the existing HalfStoreFile, and share it between multiple regions. That way, the /total/ number of HFile-s produced is bounded by min(# servers, # of WALs). This is slightly more involved, but again, region can be immediately opened for normal operation and files split via compaction, just like for region splits.
          Locality is lost for some regions if we do that; however, if the worker that rewrites the HLog into HFiles knows some metadata of the WAL (its size etc.), it can make a decision that trades off locality vs the number of files (e.g. produce one MultiStoreFile per multiple regions on 3 servers and place replicas there, or whatever).

          I like this scheme in particular because it requires very little special new logic for correctness, or new code/new file formats/etc. Enis only likes this scheme with one MultiStoreFile per WAL as far as I understand (correct me if I'm wrong)
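          A rough sketch of the per-worker rewrite described above (WalEntry and the printed output are placeholders, not HBase APIs; a real worker would write HFiles and commit them to the target regions):

          import java.util.*;

          // Sketch: one worker takes the entries of one HLog, groups them by region, sorts
          // each group by (rowKey, seqNum), and would then write one file per region whose
          // file sequence id is the max seqNum seen for that region in this HLog.
          final class WalToHFileWorkerSketch {
            record WalEntry(String regionName, String rowKey, long seqNum) {}

            static void rewrite(List<WalEntry> walEntries) {
              Map<String, List<WalEntry>> perRegion = new HashMap<>();
              for (WalEntry e : walEntries) {
                perRegion.computeIfAbsent(e.regionName(), r -> new ArrayList<>()).add(e);
              }
              for (Map.Entry<String, List<WalEntry>> r : perRegion.entrySet()) {
                List<WalEntry> cells = r.getValue();
                cells.sort(Comparator.comparing(WalEntry::rowKey)
                                     .thenComparingLong(WalEntry::seqNum));
                long fileSeqId = cells.stream().mapToLong(WalEntry::seqNum).max().orElse(0);
                // A real implementation would write these cells into an HFile and commit it
                // to the target region with this fileSeqId; here we only report the result.
                System.out.println("region=" + r.getKey() + " cells=" + cells.size()
                    + " fileSeqId=" + fileSeqId);
              }
            }

            public static void main(String[] args) {
              rewrite(List.of(
                  new WalEntry("regionA", "row2", 11),
                  new WalEntry("regionA", "row1", 12),
                  new WalEntry("regionB", "row9", 13)));
            }
          }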

          Sergey Shelukhin added a comment -

          Lars Hofhansl same TS can also result from system clock issues (and these will happen somewhere on any large cluster; however these can also cause TS-s to reverse, so we can probably ignore this case); then, one no-TS put + one explicit TS put where explicit TS happens to be a realistic time value that is the same as the implicit one. It might theoretically also happen if HBase can do two puts within one TS value from the same client sequentially... but yeah the defined-ness of the behavior prevents many optimizations. It is indeed spelled out in the book now though.

          Elliott Clark added a comment -

          It seems to me that if we have edits with the same timestamps in different WAL files, this can only happen when the client explicitly set the timestamps.

          Until we have a full Multi-Wal implementation, which is something that's definitely planned.

          Lars Hofhansl added a comment -

          Sergey Shelukhin I thought we were talking about versions of the same KV. So clock skew between servers could have an effect if one region server dies, another takes over, and then dies, and then only if the clock skew is > the time it takes the first region to be recovered, right?

          Elliott Clark Yep. Even then I would posit that if the TS is the same, the two events happened at the same time (as far as HBase is concerned) and we should not need to go to lengths to maintain any ordering.

          Let's all talk at the Hackathon.

          Sergey Shelukhin added a comment -

          The clock can move on the same server if the motherboard clock is faulty; at least I've seen it happen in the past. But yeah, about this case we don't have to worry (as it's impossible to handle with time-based timestamps anyway); implicit and explicit timestamps can still collide. If we can get rid of this requirement it would be nice; this was already discussed in some compaction jira though.
          If we don't drop it, HBASE-8709-like seqNum management can allow us to strictly sort these though (that minus the out-of-order compactions)

          stack added a comment -

          On the 'other approach', I like the incremental aspect – being able to remove HLogs as we go.

          A MultiStoreFile would have edits from one WAL file for all regions in the WAL? Which region would it live in and how would it get cleaned up? (When all references had been dropped?) We'd have to write 'reference' files into each region that pointed back to a range on this WAL? Wouldn't we be making near as many NN operations as for the case where we wrote out an hfile per region?

          I think this multistorefile notion too complex.

          We could keep hfiles per region in memory and not write them until we had to, but then we lose the incremental benefit and we start to arrive at the Enis/Elliott scheme?

          On another note, I was thinking we could enable distributed replay now as the default if we turned off bringing the region online for writes but I realize now that we cannot enable distributed replay until we fix this problem; so ignore my request in the parent issue that asks that we turn it on.

          On allowing KVs with the same coordinates returning sometimes in insert order and post distributed replay, possibly returning in a different order (if we give up respecting seqid on WAL split), I am not sure we should; it would mean we could not overwrite an existing KV definitively to remove it from the db. Do we think this a big deal (rare yes, but perhaps a facility we need to retain?)?

          I commented over in hbase-8709. Chatting w/ LarsH this evening he raised having the seqid in the KV as is suggested there. This has come up in a few contexts. If it was seqid rather than mvcc it would be of use replaying (though this seqid is out in the WAL's WALEdit file so it would be redundant having it in the kv). Replaying though, the memstore would have to consider the seqid when sorting. We would not collapse versions of the same coordinates during replay in the memstore (as we currently do in the memstore); we could only do the collapsing after all WALs had been replayed and we are flushing (this latter fact would mean the replay memstore would be different from our current memstore).

          Ted Yu added a comment -

          I am in favor of associating seqid along with KeyValue.

          Replaying though, memstore would have to consider the seqid' when sorting.

          Looks like a new type of KVComparator should be created that considers seqid's, depending on how seqid is stored.

          Storing seqid along with KeyValue implies bumping the minor version of HFile. If we use the upcoming cell tag facility, the minor version of HFile needs to be bumped only once.

          the replay memstore would be different to our current memstore

          +1. This is the replay-memstore (RMS) mentioned above.

          For performance consideration, we need to balance the frequency of flushing w.r.t. normal memstore and RMS. Looks like RMS should be flushed less frequently (compared to flushing of normal memstore) when global heap pressure gets to some threshold. Meaning, we may need to throttle the concurrent writes coming into normal memstore.

          Sergey Shelukhin added a comment -

          A MultiStoreFile would have edits from one WAL file for all regions in the WAL? Which region would it live in and how would it get cleaned up? (When all references had been dropped?) We'd have to write 'reference' files into each region that pointed back to a range on this WAL? Wouldn't we be making near as many NN operations as for the case where we wrote out an hfile per region?

          It will have to use some form of links and references, and will have edits from multiple stores. Something like that is described in BigTable paper actually

          I think this multistorefile notion too complex.

          We could keep hfiles per region in memory and not write them until we had too but then we lose the incremental benefit and we start to arrive at the Enis/Elliott scheme?

          Why more complex? Should be pretty simple, we already have something like that.
          Incremental benefit is auxiliary; main benefit in my view is precisely that we don't have additional in-memory/file structures/separate memstores in the same store that need to be reconciled/communication channels.

          Agree on having seqIds for each KV... it will increase file size which is not a huge deal probably, but will solve many problems. Should we do it now while we are still before singularity?

          stack added a comment -

          It will have to use some form of links and references, and will have edits from multiple stores. Something like that is described in BigTable paper actually

          Where? Thanks.

          Why more complex?

          Because I do not understand how it works. I give an outline above of my understanding and going by it, I conclude it too complex. Help me understand better. Thanks.

          Should be pretty simple, we already have something like that.

          Where?

          We'd associate seqid w/ kvs throughout the system so we could do distributed log replay? Associating seqid w/ keyvalue is a radical reworking of internals that has been punted on in the past because the size of the work involved was thought too large; the base KV would have to change, as would how we carry edits in memstore, our policy for incrementing/upserting when thousands are received a second, and we'd then have to redo how we currently persist seqid in hfiles on the way in and out. If a seqid, do we need a mvcc or should they be related, and if not, how should they be related? And so on.

          Sergey Shelukhin added a comment - edited

          Where? Thanks.

          page 8: "We avoid duplicating log reads by first sorting the commit log entries in order of the keys ⟨table, row name, log sequence number⟩. In the sorted output, all mutations for a particular tablet are contiguous and can therefore be read efficiently with one disk seek followed by a sequential read. To parallelize the sorting, we partition the log file into 64 MB segments, and sort each segment in parallel on different tablet servers."

          Not clear if these outputs are ready to load or need to be replayed, but it should be ok to do the former.

          Because I do not understand how it works. I give an outline above of my understanding and going by it, I conclude it too complex. Help me understand better. Thanks.

          Where?

          HalfStoreFileReader.
          How does HalfStoreFileReader work right now - by having reference and splitkey.
          This thing can either have references (one per store), with multiple splitkey support; or, for a more involved solution w/o references, have (in the beginning, or tail) an index that points to precise locations to where each file's data starts. However in the latter case it's not clear where to store the file, as you said.

          We'd associate seqid w/ kvs throughtout the system so we could do distributed log replay?

          That would also allow things like picking files arbitrarily for compactions, for example.

          Associating seqid w/ keyvalue is a radical reworking of internals that has been punted on in the past because the size of the work involved was thought too large; the base KV would have to change as would how we carry edits in memstore, our policy incrementing/upserting when thousands received a second, and we'd then have to redo how we currently persist seqid in hfiles on way in and out. If a seqid, do we need a mvcc or should they be related and if not, how should they be related? And so on.

          I might be missing something... can you elaborate why? Right now we use seqid, taken from the file metadata, as a last-ditch conflict resolution mechanism (see KeyValueHeap::KVScannerComparator) after timestamps and stuff.
          We can do the same but take seqid from KV instead of the file. Granted, if resolving identical keys it's a pretty large number of bytes to store...
          The thing where memstore overwrites the value appears to be plain incorrect to me when VERSIONS is more than 1 (whether you will get a version or not depends on when the memstore flush happens), so that's an additional (and automatic?) advantage.
          We don't necessarily have to change anything else, what do you have in mind?

          Sergey Shelukhin added a comment -

          *for resolving

          stack added a comment -

          Thanks. I see what you are referring to now.

          The description sounds like a mapreduce job (as Elliott suggested) but in essence is what Enis proposes (sort edits including consideration of seqid and when done spill hfiles which we then load on region open).

          This thing can either have references (one per store), with multiple splitkey support; or, for a more involved solution w/o references, have (in the beginning, or tail) an index that points to precise locations to where each file's data starts. However in the latter case it's not clear where to store the file, as you said.

          Yeah sounds like a lot of file writing and then fragile links that we'd need to keep in order.

          Let's whiteboard it on Weds.

          I might be missing something... can you elaborate why?

          I thought I had? On 0.96 restart, we'd read current hfiles and convert all KVs to KVversion2 as we read them (KVversion2 would include sequenceid). New files would be written w/ KVversion2. Memstores would have to change to factor in seqid (and would need to figure what to do on upsert, the overwrite of a memstore value....currently we let memstore values get overwritten when same coordinates... which is fine given our current semantic that two KVs at same coordinates do NOT count as different VERSIONS; we would have to change this...). There are probably a bunch of places where we presume the old KVversion1 serialization format that we would have to hunt out. This seems like a bunch of work to me?

          If we were going to go this route, we might as well get 'labels' into the KV too. Might as well move over to using the Cell interface.

          The thing where memstore overwrites the value appears to be plain incorrect to me when VERSIONS is more than 1 (whether you will get a version or not depends on when the memstore flush happens)...

          No. KVs at same VERSION are considered the same; you will only ever get the last one written.

          Ted Yu added a comment -

          I will post a patch Monday morning which does the following:

          HLogSplitter#LogReplayOutputSink#groupEditsByServer() would pass the sequence id as an attribute of the Put/Delete.
          The region server which does log replay would check for this attribute in the mutations. If present, the sequence id would be negated and stored in the memstoreTS field of the KeyValue.
          KeyValueHeap#KVScannerComparator#compare(KeyValueScanner left, KeyValueScanner right) would check for the presence of the sequence id in the KeyValues retrieved by left.peek() and right.peek(). If the sequence id is present, its negated value would be used to sort the KeyValues accordingly.

          The above approach avoids creating intermediate files while at the same time incurring minimal change to the HFile format.
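          A small sketch of the first two steps, using the existing Mutation attribute API (the attribute name and helper methods here are made up for illustration; the actual patch defines its own constant):

          import org.apache.hadoop.hbase.client.Put;
          import org.apache.hadoop.hbase.util.Bytes;

          // Sketch: the splitter attaches the edit's original wal sequence id as an attribute
          // of the replayed Put; the receiving region server reads it back and would store
          // its negated value into each KeyValue's memstoreTS/mvcc field.
          public class ReplaySeqIdAttributeSketch {
            // Hypothetical attribute key, not the constant used in the real patch.
            static final String REPLAY_SEQ_ID_ATTR = "_replay.original.seqid_";

            // Sender side (HLogSplitter#LogReplayOutputSink#groupEditsByServer in the plan).
            static void attachSeqId(Put put, long originalWalSeqId) {
              put.setAttribute(REPLAY_SEQ_ID_ATTR, Bytes.toBytes(originalWalSeqId));
            }

            // Receiver side: recover the sequence id, or -1 for a normal (non-replay) write.
            static long extractSeqId(Put put) {
              byte[] raw = put.getAttribute(REPLAY_SEQ_ID_ATTR);
              return raw == null ? -1L : Bytes.toLong(raw);
            }

            public static void main(String[] args) {
              Put put = new Put(Bytes.toBytes("row1"));
              attachSeqId(put, 42L);
              long seqId = extractSeqId(put);
              // The receiving RS would then store -seqId into each KeyValue's memstoreTS.
              System.out.println("value to store in memstoreTS = " + (-seqId));
            }
          }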

          Jeffrey Zhong added a comment -

          Thanks Ted Yu for the new proposal, which looks very promising because it allows us to keep the current hfile format and solve the same-version update ordering issue of this JIRA.

          I'll add the remaining small changes (documented below) on top of your patch, plus a test case.

          Along with storing negated original sequence numbers in the mvcc field, we need to extend the wal so that wal keys of edits created by the replay command store the original log sequence number, to handle the chained RS failure situation.
          In addition, in order to accept writes during recovery, we need to get the largest log sequence number from the previously failed RS. There are several options to address that:

          1) add a large number to the max flushed sequence number of the store files of the failed region so that new sequence numbers won't collide with old sequence values (my favored option; see the sketch after this list).
          For example, adding 200 million on top of the max store file sequence id:

          • it'd take 300+ years to overflow a long integer assuming the same region recovers every second
          • it'd take 2+ days for a RS that receives a change every millisecond without a single flush

          2) reject puts with explicit timestamp input during recovery
          3) read through the last wal (and possibly the trailer of the second-to-last wal) to get the max sequence number. The disadvantage of this approach is that recovery is blocked until the two possible wals have been read. Recovering the lease of the last wal may take some time because it's most likely still open when the RS fails.
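          A minimal sketch of option 1); the constant and the increment mirror the patch snippets quoted later in this thread, while the wrapping method is invented for illustration:

          // Sketch: bump the next sequence number of a recovering region far past anything
          // the failed RS could plausibly have written since its last flush.
          final class SeqNumSafetyBumperSketch {
            // ~200 million; at one edit per millisecond with no flush it takes 2+ days to
            // consume, and adding it once per second still takes centuries to overflow a long.
            static final long SEQNUM_SAFETY_BUMPER = 200L * 1024 * 1024;

            static long nextSeqIdForRecoveringRegion(long maxFlushedStoreFileSeqId) {
              return maxFlushedStoreFileSeqId + SEQNUM_SAFETY_BUMPER;
            }

            public static void main(String[] args) {
              System.out.println(nextSeqIdForRecoveringRegion(1_000_000L)); // 210715200
            }
          }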

          Himanshu Vashishtha added a comment -

          Re option 1/3: In fact, as part of 8741, I am working on a hybrid solution (combining options 1 and 3): read the trailer of the second-to-last wal (which would contain the log sequenceID of the WAL at the time of rolling, and would not suffer from the recover-lease problem you mentioned), and then add a delta (sufficient to avoid collisions) while opening the regions of the dead server. I am using RegionInfo to pass this number while opening the region. The patch is still work in progress; I will try to upload it soon.

          Ted Yu added a comment -

          Patch that illustrates the idea I described yesterday.

          Jeffrey Zhong added a comment -

          This v4 patch is built on top of Ted's v3 patch by adding wal support for chained failures, the new sequence number of a just-opened recovering region, and a test case. It's the most promising solution so far, with minimal changes.

          Himanshu Vashishtha I used the simple safety bumper approach in the patch. If the approach of hbase-8741 runs well without degrading the performance, you can amend that part a little bit when you check in hbase-8741.

          Thanks.

          Ted Yu added a comment -
          +        /*
          +         * if(isInReplay) { walEdit. }
          +         */
          

          The above can be removed, right?

          Thomas Pan added a comment -

          Just want to share my thoughts from the HBase Hackathon last week, though I believe that people here have already thought through the implementation. This issue seems to be exactly the distributed sort problem that is well solved by the MapReduce platform. Sorted lists of WAL entries reach the destination RS for final merge or replay. As the amount of data could be too large, the destination RS needs to use disk to store the data for the final merge sort before it is replayed. The final merge sort could be folded into normal compaction operations for optimal performance.

          stack added a comment -

          Sometimes the mvcc number is a sequence number (a negative one!) and other times it is an mvcc. This hack is spread about the code base.

          On the below:

          + // in distributedLogReplay mode, we haven't read all wals so we don't know the last exact
          + // sequence number used by previous failed RS. Hence we introduce SEQNUM_SAFETY_BUMPER to add a
          + // large enough number to be sure that the new sequence number of the just opened region won't
          + // overlap with old sequence numbers.
          + // Using 200 million:
          + // 1) it'd take 300+ years to overflow long integer assuming the same region recovers every second
          + // 2) it'd take 2+ days for a RS receives a change every millisecond and without a single flush
          + static final long SEQNUM_SAFETY_BUMPER = 200 * 1024 * 1024; // 200 millions

          What does a flush do to the above? It does not affect the sequence number, right? It does not reset it.

          If a RS does 1k hits a second for two days, we are almost at 200 million.

          The 200M here is meant to span all edits out in WAL logs?

          No explanation for why we set a -seqid into mvcc:

          -  kv.setMemstoreTS(localizedWriteEntry.getWriteNumber());
          +  kv.setMemstoreTS(seqId == NO_SEQ_ID ? localizedWriteEntry.getWriteNumber() :
          +      -seqId);

          KeyValueHeap has this pollution. It goes negating "seqid" w/o explanation. Yeah, this hack is spread all over the code base.

          Why move recovering state setting from openregionhandler to HRegion?

          An HLogEdit doesn't have a sequence number already? What is logSeqNum? What is its relation to the below?

          + // used in distributedLogReplay to store original log sequence number of an edit
          + private long origLogSeqNum;

          Chatting w/ Himanshu, he wondered if it is possible that a memstore get flushed w/ a negative mvcc?

          Himanshu Vashishtha added a comment -

          Thanks for the patch, Ted and Jeffrey.

          Few comments:

          Once this is done, we have HFiles with positive and negative mvcc numbers?

          -        if (kv.getMemstoreTS() <= smallestReadPoint) {
          +        if (kv.getMemstoreTS() > 0 && kv.getMemstoreTS() <= smallestReadPoint) {
                     kv.setMemstoreTS(0);
                   }
          

          Now, when do we set it back to 0?

Is it possible to have some doc on the scanner changes... Yeah, it needs a scanner-level change.
Anyway, we were trying to fix MTTR, but now it looks like we are changing a code path that is used by the core scan/get. Could it have some performance implications?

          Chatting w/ Himanshu, he wondered if it is possible that a memstore get flushed w/ a negative mvcc?

Yea, looks like there is no check for not flushing, etc., while we are writing to the region. What happens if we are done with the replay and make the region available for reads, but compaction has not kicked in yet?

          Re: "bumping up the sequence number"

          + nextSeqid += SEQNUM_SAFETY_BUMPER;

Stack pointed out a case in HBASE-8741 where a region (with an exceptionally high sequenceId) is opened on a region server, and that region server dies immediately after opening it. I think the bumping-up scheme will not be sufficient.

          Jeffrey Zhong added a comment -

          Stack Thanks for the comments!

          In the v5 patch, I added a new verification step after flush.

          Sometimes the mvcc number is a sequence number (a negative one!) and other times it is an mvcc. This hack is spread about the code base.

I have to admit that overloading the mvcc number is a hack, but it lets us address the JIRA with minimal changes (you can see Ted's v3 patch is only 9 kB) and without modifying the hfile format. I hope that once cell tags are in place we can clean up the hack with trivial effort.

          The 200M here is meant to span all edits out in WAL logs?

Yes. It's just a big enough number to make sure the new sequence number won't collide with old log sequence numbers without reading wal files. A RS is extremely unlikely to have 200 million changes before a region flush, because we flush an online region every hour by default and there is other logic to force a flush on the region with the min sequence number when the number of log files reaches a certain limit.
That being said, we can leave this out till we have a consensus in HBASE-8741.

          An HLogEdit doesn't have sequence number already? What is logSeqNum? What is relation to below?

It's the original log sequence number from when a wal is first replayed. It is stored in the WALEdit so that we can persist the number into the HLogKey of a wal Entry to handle the case when the receiving RS fails again during a replay.

          Chatting w/ Himanshu, he wondered if it is possible that a memstore get flushed w/ a negative mvcc?

It's possible because we have the sequence number along with the KV.

          Jeffrey Zhong added a comment -

          Could it have some performance implications?

It should not, because we are just adding several more checks.

Yea, looks like there is no check for not flushing, etc., while we are writing to the region. What happens if we are done with the replay and make the region available for reads, but compaction has not kicked in yet?

          The changes in the scanner should cover the situation you mentioned. So far I didn't see any issue.

Stack pointed out a case in HBASE-8741 where a region (with an exceptionally high sequenceId) is opened on a region server, and that region server dies immediately after opening it. I think the bumping-up scheme will not be sufficient.

I see. I can leave this part out till we have a consensus in HBASE-8741.

          Enis Soztutar added a comment -

Re option 1/3: In fact, as part of HBASE-8741, I am working on a hybrid solution (combining options 1 and 3): read the trailer of the second-to-last wal (which would contain the log sequenceID of the WAL at the time of rolling, and would not suffer from the recover-lease problem you mentioned), and then add a delta (sufficient to avoid collision) while opening the regions of the dead server.

We can put an upper bound on the number of KVEdits that a single log file can contain (64 MB / 64 bytes = 1M; let's put the limit at 10M), which means that the log will block and have to roll before accepting any more updates. Then max(max(hfile seqIds), max(all WAL trailers)) + (#WALs without trailer * KV limit per WAL) will give us a safe sequenceId.
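For illustration only, the bound could be computed roughly as below; MAX_KVS_PER_WAL and the method name are assumptions for this sketch, not names from the HBase code base:

  // Illustrative sketch of the safe-sequenceId bound described above.
  static final long MAX_KVS_PER_WAL = 10L * 1000 * 1000; // proposed per-WAL edit cap

  static long safeSequenceId(long maxHFileSeqId,        // max seqId recorded across HFiles
                             long maxTrailerSeqId,      // max seqId from readable WAL trailers
                             int walsWithoutTrailer) {  // WALs whose trailer is missing
    long known = Math.max(maxHFileSeqId, maxTrailerSeqId);
    // each untrailed WAL can hold at most MAX_KVS_PER_WAL edits, so this cannot collide
    return known + (long) walsWithoutTrailer * MAX_KVS_PER_WAL;
  }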

          stack added a comment -

Jeffrey Zhong When you say '...but it allows us without modifying hfile format.', what are you thinking? I'm not sure why we'd need to modify the hfile format to accommodate replay.

          On the 200M, I like Enis's suggestion that we set an upper bound on KVs per file. It is a good idea for figuring a safe step in the sequenceids.

It's the original log sequence number from when a wal is first replayed. It is stored in the WALEdit so that we can persist the number into the HLogKey of a wal Entry to handle the case when the receiving RS fails again during a replay.

          This seems like a hack on the hack. What do we do if the replay fails a second time? (once for original server crash, then a crash while replay, then again when replaying the replay?)

It's possible because we have the sequence number along with the KV.

So it IS possible to have an hfile w/ a negative sequence number? (We don't sort storefiles by sequenceid any more?) If so, that'll mess us up? Or the scan merge will accommodate?

How do the replays work in memstore? If a negative mvcc is added first, then a positive (because the region is open for writes), then another negative comes in, what happens? Does the negative overwrite the positive at the same coordinates? Will we flush w/ a negative sequenceid though the file has positives in it?

          Jeffrey Zhong added a comment -

          This seems like a hack on the hack. What do we do if the replay fails a second time? (once for original server crash, then a crash while replay, then again when replaying the replay?)

The original sequence number is stored along with the KV, so no matter how many RS failures we have during replay, the original sequence number will be used to determine the ordering.

          So it IS possible to have an hfile w/ a negative sequence number? (We don't sort storefiles by sequenceid any more?)

          No. We only touch mvcc field and hfile sequence id is intact. The negative mvcc value is only used to break the tie.(use the negative mvcc field if it's present instead of default hfile sequence id)

          How do the replays work in memstore?

new writes with positive mvcc wins because replay writes have negative mvcc values. Once the memstore is flushed, the negative mvcc number is used only to replace the default hfile sequence id (if they're present) to break the tie. Since recovery doesn't allow reads till full recovery, it should be fine if a hfile contains negative and positive mvcc values.

I'll create a test case to verify, because you raise a good concern.

          Jeffrey Zhong added a comment -

the original sequence number is stored along with the KV, so no matter how many RS failures we have during replay

Basically I mean the original sequence number value won't change even if there are chained RS failures during a recovery.

          Jeffrey Zhong added a comment -

The v6 patch added a new test case which does same-version-update recovery with writes coming in during the recovery process.

This patch runs all unit tests cleanly with & without distributedLogReplay turned on.

          For basic idea, please see Ted's v3 patch.

          stack added a comment -

The original sequence number is stored along with the KV, so no matter how many RS failures we have during replay, the original sequence number will be used to determine the ordering.

          Sorry Jeffrey Zhong. I do not get why we have to have two sequenceids in an edit; the actual and the 'original'.

          We only touch mvcc field and hfile sequence id is intact.

          Doesn't mvcc make it out to hfile so that when we merge it w/ the memstore (because there was say, an ongoing scan at the time of the flush), that we still respect whatever the mvcc was at the time and not show clients' edits that are not meant for them?

          new writes with positive mvcc wins because replay writes have negative mvcc values.

          Do you mean 'replace' in the above?

....the negative mvcc number is used only to replace the default hfile sequence id (if they're present) to break the tie.

          The negative mvcc is out in the hfile occupying the sequenceid-for-the-hfile location?

          I think a test case would help.

          Himanshu Vashishtha added a comment -

          I have a follow up question on this -ve mvcc stuff.
          Current flusher resets the mvcc point to 0 if it is older than the minimum readpoint across all the scanners. There are two cases here:

          a) Region is under recovery mode: The minimum readpoint will be maxSequenceIds (obtained from the StoreFiles) + 1, as the region is not available for read yet. Thus, it will reset this -ve sequenceId to 0? Flushed hfiles under recovery will not be having -ve numbers?

          Let's say we changed this and keep the -ve numbers intact.

          b) Recovery is completed, and region is available for read. There might be some scanners open and we would now have some legit min readpoint.
          i) I see that as part of optimization, we set memstoreTS to 0 in case it is < MVCC.readpoint (even while simple scan).
          Do we need to remove that optimization now?

          ii) How we handle the Deletes now. I see SQM comparing memstoreTS with mvccReadPoint at some places (see the match method). Especially settings where it wants to seePastDeleteMarkers, or the CF level attribute of retaining Delete markers.
Lars Hofhansl, what do you think about this delete handling with -ve mvcc values?

          Please let me know what you think of the above concerns. In case I missed something, please correct me.
          Thanks.

          Enis Soztutar added a comment -

          I have opened HBASE-8763 to discuss the MVCC + SeqID semantics further for 0.98+ releases. Let's use that to gather our long term plans.

          Ted Yu added a comment -

          A new patch is coming which addresses the persistence of sequence Id in the KeyValue's created by distributed log replay.

          Jeffrey Zhong added a comment -

Thanks Stack and Himanshu Vashishtha for the good comments. In the v7 patch, we actually write negative mvcc values into the hfile. The patch runs all unit tests cleanly with & without distributedLogReplay.

          I do not get why we have to have two sequenceids in an edit; the actual and the 'original'.

Oh, I see your question now. We don't have two sequence numbers (there is no actual sequence number stored in WALEdit), only this 'original' one, which is introduced in the patch.

          Doesn't mvcc make it out to hfile so that when we merge it w/ the memstore (because there was say, an ongoing scan at the time of the flush), that we still respect whatever the mvcc was at the time and not show clients' edits that are not meant for them?

Good point. Since recovery does NOT allow reads, the situation above won't happen. A read request will get an exception before the actual read logic happens. After recovery, all writes are committed, so there should be no issue reading them all. Because of the negative mvcc value (logically equal to 0), they will be fetched. The semantics around this are the same as recovered edits recovery, where MVCC values are 0.
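A minimal sketch of that behavior, purely illustrative (the method, flag, and exception choice here are assumptions, not the actual HRegion code):

  // Sketch only: during recovery a read is rejected before any scan logic runs; after
  // recovery a negative memstoreTS is treated as logically 0, i.e. visible to every
  // scanner, just like edits replayed from recovered.edits.
  static boolean isVisible(long memstoreTS, long readPoint, boolean regionRecovering) {
    if (regionRecovering) {
      throw new IllegalStateException("region is recovering; reads are rejected until replay completes");
    }
    long effective = memstoreTS < 0 ? 0 : memstoreTS; // negative => committed replayed edit
    return effective <= readPoint;
  }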

          Do you mean 'replace' in the above?

          No, there is no replace and both versions of KV with different MVCC values exist in memstore.

          The negative mvcc is out in the hfile occupying the sequenceid-for-the-hfile location?

No. mvcc doesn't affect the sequence id of a hfile, because mvcc and the sequence number are independent of each other.

          b) Recovery is completed, and region is available for read. There might be some scanners open and we would now have some legit min readpoint.

          Read requests will be rejected till the region is recovered by which time all writes are committed.

          Do we need to remove that optimization now?

          indeed, we need this. Thanks for the good point.

          ii) How we handle the Deletes now.

Deletes won't be affected, because a delete always wins for the same version (there is no ordering guarantee). I also checked the code, and negative mvcc doesn't affect the scenario you mentioned above.

          Thanks.

          stack added a comment -

Oh, I see your question now. We don't have two sequence numbers (there is no actual sequence number stored in WALEdit), only this 'original' one, which is introduced in the patch.

          I was confusing WALEdit w/ HLogKey. HLogKey has a seqid. You are hoisting the seqid from HLogKey into WALEdit?

The semantics around this are the same as recovered edits recovery, where MVCC values are 0.

          OK.

          But hfiles could have a -ve sequenceid, right? And these hfiles are added to the set of all hfiles. How then w/ its -ve number do the hfiles sort on merge? You have added code so that at merge time, the -ve number is allowed I suppose.

          Let me look at v7 patch.

          stack added a comment -

How will compactions deal with the -ve sequenceid? Compactions select adjacent files. They figure adjacency using sequenceid?

          Sergey Shelukhin added a comment -

          No, they don't... we discussed that during pre-Con meetup, compactions will have to be disabled (or made smarter - only existing files, or only new files, can be compacted) during recovery w/writes enabled.

          stack added a comment -

          No, they don't... we discussed that during pre-Con meetup, compactions will have to be disabled (or made smarter - only existing files, or only new files, can be compacted) during recovery w/writes enabled.

Sergey Shelukhin Sounds like we can have hfiles w/ -ve sequenceid. It can contain replayed edits and edits that came in during replay. We flip the region to start taking reads (and reenable compactions). Compactions currently – until kvs include a seqid – are compacting adjacent files. If we are figuring adjacency using seqid, the -ve file goes to the end of the line though it was the most recently written file.

          stack added a comment -

          Looking at v7:

          +      // If both KeyValues carry seq Id, there is no need to negate the result of comparison
          +      if (left.getMemstoreTS() < 0 && right.getMemstoreTS() < 0) {
          +        return Longs.compare(left.getMemstoreTS(), right.getMemstoreTS());
          +      }
          

Needs a better comment. The value being compared is a ts but it's called a seqid in the comment. Confusing.

Woah... what's up here?

          -      decodeMemstoreTS = Bytes.toLong(fileInfo.get(HFileWriterV2.MAX_MEMSTORE_TS_KEY)) > 0;
          +      byte[] needDecoding = fileInfo.get(HFileWriterV2.NEED_DECODE_MEMSTORE_TS_KEY);
          +      if (needDecoding != null) {
          +        decodeMemstoreTS = Bytes.toBoolean(needDecoding);
          +      } else {
          +        decodeMemstoreTS = Bytes.toLong(fileInfo.get(HFileWriterV2.MAX_MEMSTORE_TS_KEY)) > 0;
          +      }
          

Sometimes it's a boolean and other times it's a ts?

          What is the 'decoding' that is going on here?

          Regards 200M.

          + RS A crashes. It was carrying 15 WALs. The last WAL was unfinished.
          + RS B gets a region X from RS A. We open it for writes while we are recovering this region. We add 200M to its sequenceid because this region has seqids in excess of what the RS is currently carrying. We take in 1 edit while recovering. We do not flush. We crash.
+ RS C recovers X. It adds 200M to its seqid. We take in 1 edit while recovering X.

          What guarantees are there that the recovery done on RS C has seqids in excess of those of RS B?

          It seems wrong that a region would add itself to the list of recovering regions the HRegionServer is hosting. Doesn't the HRegionServer have more context? (And this is polluting HRegion w/ HRegionServer specifics). Who judges the region can start accepting reads? The HRegionServer? If so, it should be managing whether a region is in recovering state, not the HRegion itself.

          Presumption here is that edits are sorted:

          +        Mutation mutation = batchOp.operations[i].getFirst();
          

          Is that safe presumption to make in replay?

          Is this the least sequenceid of the batch?

          No comment on why of a sudden we decide to negate the sequence number:

          -          kv.setMemstoreTS(localizedWriteEntry.getWriteNumber());
          +          kv.setMemstoreTS(seqId == NO_SEQ_ID ? localizedWriteEntry.getWriteNumber() :
          +            -seqId);
          

          Again, what is the difference between these two sequenceids?

             private long logSeqNum;
          +  // used in distributedLogReplay to store original log sequence number of an edit
          +  private long origLogSeqNum;
          

          What is an original log seq num?

          What is going on here?

          +        long sequenceNum = (logKey.getOrigSequenceNumber() > 0) ? logKey.getOrigSequenceNumber()
          +            : logKey.getLogSeqNum();
          

          The 'orig' seq number is > 0 take it? Is this 'if it is present'?

          We only do this stuff for Puts and Deletes? Don't we have other types out in the WAL?

The HLogKey gets carried into WALEdit? Do we have it in two places, or is it just when we instantiate the WALEdit while replaying edits? Do we have to add it to WALEdit at all?

          We seem to be polluting types to carry info down into the depths of an HRegion.

          Jeffrey Zhong added a comment -

          Thanks Stack for the comments.

          How will compactions deal with the -ve sequenceid

          The sequence ids of hfile are intact as before.

Sometimes it's a boolean and other times it's a ts?

decodeMemstoreTS is a boolean. It's used to tell the hfile reader whether to decode the memstoreTS (mvcc) number. There is an existing optimization to skip mvcc number decoding, using the following logic. Since we use negative mvcc, the optimization may skip decoding the mvcc number from a hfile.

          Bytes.toLong(fileInfo.get(HFileWriterV2.MAX_MEMSTORE_TS_KEY)) > 0;
          

          Regards 200M.

This part will be updated later by HBASE-8741. I left the code there to let one of my new test cases pass, where we test a same-version update that comes in during recovery.

          Is that safe presumption to make in replay?
          Is this the least sequenceid of the batch?
          Again, what is the difference between these two sequenceids?
          Do we have to add it to WALEdit at all?

I think we may not need the origSequenceNumber because mvcc is part of the KV and should already be written into the WAL? Let me try to see if I can cut the origSequenceNumber.

          Is this 'if it is present'?

          Yes.

          We only do this stuff for Puts and Deletes? Don't we have other types out in the WAL?

          Only puts and deletes are used for recovery purpose in WAL.

          stack added a comment -

          The sequence ids of hfile are intact as before.

          But some can be -ve? So they will be out of order? (I don't see special handling in v7 – I may have missed it).

          Thanks.

          Jeffrey Zhong added a comment -

          But some can be -ve? So they will be out of order?

the negative mvcc is only used in KV comparison to resolve conflicts when the row + timestamp is the same. The hfile sequence number used in a flush comes from the current region server log sequence number, not the mvcc value. KVs of the same key + timestamp can be out of order in multiple store files, while the right KV is selected with the help of these negative mvcc values (original log sequence number). (You can refer to KVScannerComparator#compare.)

          Is that safe presumption to make in replay?
          Is this the least sequenceid of the batch?
          Do we have to add it to WALEdit at all?

          I checked the wal serialization code, kv isn't PBed and we don't write mvcc values into wal. Therefore, I still need to add the "original log sequence number" into receiving RS hlogkey of a wal entry to maintain the original order of changes when re-replaying these edits for a RS chain failure scenario.
          During replay, we can use the last sequence id of a batch because a batch request is from a single wal and the relative order of changes is maintained when constructing the batch request. So the last KV wins if there are multiple KVs of the same key + timestamp.
I don't have to put the "original seq number" into WALEdit, but I need to change several method signatures to pass the info down.

          Sergey Shelukhin added a comment -
Sounds like we can have hfiles w/ -ve sequenceid. It can contain replayed edits and edits that came in during replay. We flip the region to start taking reads (and reenable compactions). Compactions currently – until kvs include a seqid – are compacting adjacent files. If we are figuring adjacency using seqid, the -ve file goes to the end of the line though it was the most recently written file.

          What is -ve? Negative? last time I checked the replayed and new KVs were supposed to go into separate memstores, right? The recovery-memstore can then flush into a file with "old" (WAL-based) seqId, so adjacency would be maintained, we will just have a potential gap for some time which we should be mindful of (i.e. we'll have some old, good files, some new files from writes-during-recovery, and they would look adjacent but should not be compacted together until the recovered file(s) are inserted "between" them).
          I don't think we can maintain the same guarantees if new and recovered KVs are written into mixed memstores (unless seqNums are also written there).

          Sergey Shelukhin added a comment - - edited

          Well, it's also possible to ensure there's only one mixed memstore (as in, entire recovery and writes-during-recovery go into one single memstore), then if recovery can resolve conflicts inside it (using seqNums that are available from recovery I suppose), it can be flushed into correct file. But this is not practical...

          Jeffrey Zhong added a comment -

The basic idea of using the MVCC field to store the negative sequence number of replayed edits is that we can identify the order of edits by referring to their change sequence numbers. Storing negative values is to differentiate them from normal non-negative MVCC values.

(The following applies to edits of the same key + timestamp.)
Since we tag each recovered edit with its sequence number, we don't need separate memstores to flush. When a store file contains a mix of recovered edits and writes made during recovery, new writes have non-negative mvcc values, so they use the sequence number of their hfile, while recovered edits have negative MVCC values and use the -MVCC value as their sequence number, which is less than the hfile sequence number because we guarantee a newer hfile has a higher sequence number. Similar logic is used to resolve recovered edits in different hfiles.
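To make the tie-break concrete, here is a hedged sketch of the resolution rule (names are illustrative; the real logic lives in KVScannerComparator and KVComparator):

  // Sketch of the tie-break described above, for KVs with identical row + timestamp.
  // A negative memstoreTS means "-(original WAL sequence number)"; otherwise the KV
  // takes the sequence id of the hfile (or memstore) it came from.
  static long effectiveSeqId(long memstoreTS, long containerSeqId) {
    return memstoreTS < 0 ? -memstoreTS : containerSeqId;
  }

  // The KV with the larger effective sequence id is the newer edit and wins.
  static int compareSameKey(long leftMemstoreTS, long leftContainerSeqId,
                            long rightMemstoreTS, long rightContainerSeqId) {
    return Long.compare(effectiveSeqId(rightMemstoreTS, rightContainerSeqId),
                        effectiveSeqId(leftMemstoreTS, leftContainerSeqId));
  }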

Since the patch is attached, I think it's better to suggest potential issues based on the patch. You can see the basic idea from the small 8701-v3.txt.

          stack added a comment -

          Therefore, I still need to add the "original log sequence number" into receiving RS hlogkey of a wal entry to maintain the original order of changes when re-replaying these edits for a RS chain failure scenario.

          The original log sequence number is not serialized? (See #write method). Should it be?

          I think I understand carrying the first seqid along now but it is messy.

          Let me review more.

          stack added a comment -

          Pardon me. I see that the 'original seqnum' IS serialized, written into WALKey.

In KV, why there would be a -ve mvcc needs to be explained.

          MINOR_VERSION_WITH_MVCC_SEQ_ID_UNION is defined but not used?

          What is 'decoding'? That it is -ve?

          +  /** whether memstore (mvcc) timestamp field needs decoding in reader */
          

          What is this? Should it be test for -ve?

          +    if (kv.getMemstoreTS() != 0) {
          +      this.needDecodeMemstoreTS = true;
          +    }
          

Seems like any non-null mvcc needs 'decoding'. If so, do we need this flag stuff going on? (Sorry for the dumb questions... just trying to figure out what's going on here).

          Could you just do this in reader:

          decodeMemstoreTS = Bytes.toLong(fileInfo.get(HFileWriterV2.MAX_MEMSTORE_TS_KEY)) != 0;

          ... and have to mess in Writer?

          I have already remarked on how it is odd that HRegion adds itself to the recovering region list – that the regionserver should be doing this, not the hregion (it used to be done in OpenRegionHandler?)

If the log replay edits skip the WAL, we don't have to carry this 'original seqid'? But doing it this way, we can archive WALs as we finish their replay, and if the RS we are playing into dies, we can then play any leftover WALs... and any WALs it may have accumulated in the meantime.

          On this:

          -          this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
          +          this.comparator.compare(kvNext, this.current.getSequenceID(),
          +            topScanner.peek(), topScanner.getSequenceID()) >= 0) {
          

Do you think the above is a bug fix? It looks like hfile is seqid of 0 and memstore is seqid of MAX_VALUE. Are they ever different? Does the 'orig seqid' ever get in the mix here?

          Is this method used in a few places?

          + public int compare(KeyValue leftKV, long leftSeqId, KeyValue rightKV, long rightSeqId) {

Previously we'd get the seqid internally from the left and right key, but now we allow the seqid to be passed independent of the kvs being compared. Is that what we want? Why open up this compare? For the scan compares on the heap?

This is a pretty fundamental change. It compares mvcc first, then seqid.

          And if the mvcc is < 0 in the below, what happens?

          -          if (kv.getMemstoreTS() <= smallestReadPoint) {
          +          if (kv.getMemstoreTS() >= 0 && kv.getMemstoreTS() <= smallestReadPoint) {
          

          ditto here:

          -        if (kv.getMemstoreTS() <= smallestReadPoint) {
          +        if (kv.getMemstoreTS() > 0 && kv.getMemstoreTS() <= smallestReadPoint) {
          

In general, there is little explanation in the patch of why things are as they are; the HRegion SEQNUM_SAFETY_BUMPER seqid counting still needs to be done (yeah, that is a different issue), but at a high level this adds a load of complexity; it is too hard to follow what is going on (I am still not done w/ review; next up would be trying different failure scenarios).

          The sequence ids of hfile are intact as before.

          This means that the seqid we flush with is never -ve?

          Jeffrey Zhong added a comment -

          Thanks Stack for the detailed reviews!

          I see that the 'original seqnum' IS serialized, written into WALKey.

Yeah. That's introduced by the latest patch to handle chained failure scenarios. That's why you see the "original sequence number" passed down by WALEdit. If we use "skipWal" for edit replay, we won't need to do this and might also achieve even better performance. Let me know which way you lean. (I personally prefer the skipWal option.)

          Seems like any non-null mvcc needs 'decoding'. If so, do we need this flag stuff going on?

Not every one, because in most cases mvcc has value 0 unless we have recovered edits. The mvcc optimization normally writes mvcc out as 0. Since we pass negative mvcc, the MAX_MEMSTORE_TS_KEY will be 0. In that case, the hfile reader won't read the negative mvcc value at all, which defeats the purpose of using mvcc to store the negative sequence number. I'll change the boolean decodeMemstoreTS to a long MIN_MEMSTORE_TS_KEY to be clearer about whether a hfile contains negative mvcc values.
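For illustration, the reader-side check might then look roughly like the sketch below; MIN_MEMSTORE_TS_KEY is the proposed, not-yet-existing file-info key, and the surrounding variable names are assumptions:

  // Sketch: decode mvcc values if the file recorded either a positive max or a negative min.
  byte[] max = fileInfo.get(HFileWriterV2.MAX_MEMSTORE_TS_KEY); // existing file-info entry
  byte[] min = fileInfo.get(MIN_MEMSTORE_TS_KEY);               // proposed new entry
  boolean decodeMemstoreTS =
      (max != null && Bytes.toLong(max) > 0)    // normal positive mvcc values present
   || (min != null && Bytes.toLong(min) < 0);   // negative (replayed) mvcc values present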

          I have already remarked on how it is odd that HRegion adds itself to the recovering region list – that the regionserver should be doing this, not the hregion (it used to be done in OpenRegionHandler?)

          I'll modify this part.

Previously we'd get the seqid internally from the left and right key, but now we allow the seqid to be passed independent of the kvs being compared. Is that what we want?

          This is to pass enough information to the compare() method in calls where only KeyValue's were passed before.

          MINOR_VERSION_WITH_MVCC_SEQ_ID_UNION is defined but not used?

It lets developers know what purpose minor version 4 serves. Having a constant for version 4 is in line with prior practice.

          This means that the seqid we flush with is never -ve?

          That's correct. I can add an assertion here.

I'll try to provide a new patch based on your feedback. Thanks.

          Jeffrey Zhong added a comment -

          In the v8 patch, I incorporated Stack's feedback and added a test case covering compactions on hfiles containing negative mvcc values.

          In the patch, I didn't use skipWal for replay because it may cause many flushes (up to the number of WALs times the number of regions per WAL; e.g., 10 WALs each touching 100 regions could mean on the order of 1,000 extra flushes) and makes failover handling less than ideal (we would have to replay the whole WAL when the receiving RS fails during replay).

          Thomas Pan added a comment -

          Happy holidays! I am out of office from June 28 to July 7 during July 4th week. Please contact Pradeep for any urgent issues. $0.02, -Thomas

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12590155/hbase-8701-v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6171//console

          This message is automatically generated.

          stack added a comment -

          What's up w/ this patch? Just needs more review? Any testing Jeffrey Zhong?

          Jeffrey Zhong added a comment -

          Stack The patch just needs a rebase; it is using negative mvcc to store the change sequence number of WALEdits to handle updates with the same version. Since tag support is on the horizon, I'd like to port this patch to use tags instead of the mvcc field.

          stack added a comment -

          Jeffrey Zhong I like that plan. Moving out of 0.96

          Jeffrey Zhong added a comment -

          Ported the old patch to use tags.

          1) During KeyValue comparison, after rowkey, colfam/qual, timestamp and type are compared (same key), the log sequence number will be used if there is one, otherwise mvcc will be used. So the extra cost of fetching the log sequence number from the tag is trivial, because comparison normally terminates much earlier, before reaching the log sequence number or mvcc.

          2) Only edits being replayed during recovery will be tagged with their own original log sequence number (see the sketch below), so the extra tag storage overhead is insignificant, considering recovery doesn't happen often and tagging is only applied to unflushed edits.
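
          A minimal sketch of the tagging step, assuming the 0.98 Tag/KeyValue constructors that accept a tag list and a hypothetical log-replay tag type constant (the actual type id and attachment point inside the replay path may differ):

            import java.util.Collections;
            import org.apache.hadoop.hbase.KeyValue;
            import org.apache.hadoop.hbase.Tag;
            import org.apache.hadoop.hbase.util.Bytes;

            // Sketch only: carry a replayed edit's original WAL sequence number in a
            // tag so that comparison of same-key cells can prefer it over mvcc.
            final class ReplayTagging {
              static final byte LOG_REPLAY_TAG_TYPE = (byte) 2; // hypothetical tag type id

              static KeyValue tagWithOriginalSeqNum(byte[] row, byte[] family, byte[] qualifier,
                  long timestamp, byte[] value, long originalSeqNum) {
                Tag seqTag = new Tag(LOG_REPLAY_TAG_TYPE, Bytes.toBytes(originalSeqNum));
                return new KeyValue(row, family, qualifier, timestamp, value,
                    Collections.singletonList(seqTag));
              }
            }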

          Ted Yu added a comment -
          -      return Longs.compare(right.getMvccVersion(), left.getMvccVersion());
          +      long leftChangeSeqNum = getReplaySeqNum(left);
          +      if (leftChangeSeqNum < 0) {
          +        leftChangeSeqNum = left.getMvccVersion();
          +      }
          +      long RightChangeSeqNum = getReplaySeqNum(right);
          +      if (RightChangeSeqNum < 0) {
          +        RightChangeSeqNum = right.getMvccVersion();
          

          What would happen if one Cell has a sequence id but the other cell doesn't?

          Can you put the patch on review board?

          Thanks

          Jeffrey Zhong added a comment -

          Ted Yu A good point. I've updated the patch and moved it to the review board (https://reviews.apache.org/r/16304/). Thanks.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12618942/hbase-8701-tag.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8176//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12618955/hbase-8701-tag-v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8188//console

          This message is automatically generated.

          Jeffrey Zhong added a comment -

          Incorporated the comments and turned distributedLogReplay on by default when HFile v3 is used.

          Jeffrey Zhong added a comment -

          There are two test case failures under TestVisibilityLabels due to known issues when distributedLogReplay is on.

          Anoop Sam John added a comment -

          Yes. HBASE-10148 will solve that. Hope I can commit it today.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12619233/hbase-8701-tag-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.security.visibility.TestVisibilityLabels

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8207//console

          This message is automatically generated.

          Jeffrey Zhong added a comment -

          Reattached tag-v2 patch to fix a javadoc warning

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12619390/hbase-8701-tag-v2.patch
          against trunk revision .
          ATTACHMENT ID: 12619390

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8214//console

          This message is automatically generated.

          stack added a comment -

          It will be odd if this feature is enabled when hfilev3 is enabled. It will be unexpected. Since hfilev3 will not be on by default in 0.98, but it will be in trunk, this facility should probably go the same route. Otherwise +1 Jeffrey Zhong

          Jeffrey Zhong added a comment -

          Stack The v2-update patch removed the logic that turns distributedLogReplay on when HFile V3 is used. I'll turn it on by default whenever tags are turned on in trunk or other branches.

          I also ran the whole test suite successfully with distributedLogReplay on.
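
          For reference, a test or client can flip the feature on explicitly; a minimal sketch (assuming the hbase.master.distributed.log.replay property used by this feature):

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.hbase.HBaseConfiguration;

            // Sketch only: enable distributedLogReplay explicitly, since it is not
            // the default; the property name is assumed from this feature's config.
            final class EnableLogReplay {
              static Configuration create() {
                Configuration conf = HBaseConfiguration.create();
                conf.setBoolean("hbase.master.distributed.log.replay", true);
                return conf;
              }
            }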

          stack added a comment -

          +1 from me. It is relatively safe +1'ing this when it is not the default. Himanshu Vashishtha You have any comments on this patch?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12619647/hbase-8701-tag-v2-update.patch
          against trunk revision .
          ATTACHMENT ID: 12619647

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8232//console

          This message is automatically generated.

          stack added a comment -

          Jeffrey Zhong Go ahead and commit. Himanshu is busy firefighting these days.

          Jeffrey Zhong added a comment -

          Thanks for the reviews! I've integrated the patch into 0.98 and trunk.

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4742 (See https://builds.apache.org/job/HBase-TRUNK/4742/)
          HBASE-8701: distributedLogReplay need to apply wal edits in the receiving order of those edits (jeffreyz: rev 1552828)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/Tag.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/TagType.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityUtils.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #14 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/14/)
          HBASE-8701: distributedLogReplay need to apply wal edits in the receiving order of those edits (jeffreyz: rev 1552828)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/Tag.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/TagType.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityUtils.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98 #28 (See https://builds.apache.org/job/HBase-0.98/28/)
          HBASE-8701: distributedLogReplay need to apply wal edits in the receiving order of those edits (jeffreyz: rev 1552831)

          • /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/Tag.java
          • /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/TagType.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityUtils.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #25 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/25/)
          HBASE-8701: distributedLogReplay need to apply wal edits in the receiving order of those edits (jeffreyz: rev 1552831)

          • /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/Tag.java
          • /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/TagType.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityUtils.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.


            People

            • Assignee:
              Jeffrey Zhong
              Reporter:
              Jeffrey Zhong
            • Votes:
              0
              Watchers:
              17
