HBASE-5699: Run with > 1 WAL in HRegionServer

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Performance
    • Labels: None
    • Attachment/s: PerfHbase.txt (40 kB, ramkrishna.s.vasudevan)

        Activity

        stack added a comment - - edited

        Please provide more detail on what this issue is about and correct the subject so it's properly spelled. Thanks.

        binlijin added a comment -

        @stack,
        There is only one HLog and one Writer per HRegionServer for the write-ahead log; at any time only one writer holds the HLog lock and the others have to wait. If there were multiple HLogs or Writers, the write-ahead log could run in parallel and write performance should improve.
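
        A minimal sketch of the contention being described, with made-up class names (not from any patch attached to this issue): N logs, each with its own lock, and regions hashed across them so that appends to different logs no longer serialize on a single monitor.

          import java.util.ArrayList;
          import java.util.Arrays;
          import java.util.List;

          class ParallelWalSketch {
            private final List<Object> logLocks = new ArrayList<>();

            ParallelWalSketch(int numLogs) {
              for (int i = 0; i < numLogs; i++) {
                logLocks.add(new Object());           // one lock (and one writer) per log
              }
            }

            void append(byte[] encodedRegionName, byte[] edit) {
              // Route by region so a region's edits stay ordered within a single log.
              int idx = (Arrays.hashCode(encodedRegionName) & Integer.MAX_VALUE) % logLocks.size();
              synchronized (logLocks.get(idx)) {
                // write 'edit' to log #idx; only appends routed to the same log contend here
              }
            }
          }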

        stack added a comment -

        Yes. This topic comes up from time to time. Would be nice to try it out. It is possible to stand up the WAL subsystem on its own, so you could experiment with having HLog output to > 1 WAL. A bunch of us would be interested in what you learn.

        Juhani Connolly added a comment -

        Since we have had some similar experience, posting it here:
        We are finding most of the IPC threads in our region servers locked in HLog.append (42 out of 50; of those, 20 are in sync, and one is actually working... as is to be expected).
        We presumed the problem was the WAL synchronisation mechanism holding things up and decided to try running multiple RS per node, since we had a significant amount of free CPU and memory as well as many barely active hard disks.
        By running 3 RS per node, we saw our application-specific throughput go from 7k events to 18k. Each event is made up of roughly 2 writes and 2 increments, plus some reads/scans which shouldn't be touching the WAL.
        This is partially also due to a very high spec per node. I don't think it would be necessary on more "commodity" type servers, but the option to use multiple WALs on each region server may well give significant throughput gains for some hardware setups.

        binlijin added a comment -

        @stack,
        I ran a test with the 0.90 version using 10 writers and 3 nodes; sometimes it showed double the write performance, though maybe not consistently.

        chunhui shen added a comment -

        I think the number of datanodes in the test is a little low. Using double the hlog writers in the RS, write performance should nearly double unless limited by HDFS.

        stack added a comment -

        @binlijin What Chunhui says. I'd think that if it were a bigger cluster you'd see a more marked improvement. What about recovery? How does log splitting work with all the extra WALs?

        binlijin added a comment -

        I just ran a test and didn't test recovery or the other aspects.

        Li Pi added a comment -

        This seems interesting. I'll take a look at doing this.

        stack added a comment -

        @Ted Why delete a comment, especially someone else's?

        Ted Yu added a comment -

        It was a duplicate message.

        stack added a comment -

        @Ted Would suggest you just leave it. When you delete, we all get a message in our mailbox about the delete transaction. Then we start to wonder...

        Ted Yu added a comment -

        Playing with a prototype of this feature using ycsb (half insert, half update) on a 5-node cluster where usertable has 13 regions on each region server.
        Without this feature:

         10 sec: 99965 operations; 9996.5 current ops/sec; [UPDATE AverageLatency(us)=258.68] [INSERT AverageLatency(us)=610.28]
         20 sec: 99965 operations; 0 current ops/sec;
         25 sec: 99990 operations; 4.3 current ops/sec; [UPDATE AverageLatency(us)=2594303.62] [INSERT AverageLatency(us)=1240495.41]
        [OVERALL], RunTime(ms), 25844.0
        [OVERALL], Throughput(ops/sec), 3868.9831295465096
        [UPDATE], Operations, 49935
        [UPDATE], AverageLatency(us), 674.2635626314209
        

        with this feature:

         10 sec: 99952 operations; 9994.2 current ops/sec; [UPDATE AverageLatency(us)=178.7] [INSERT AverageLatency(us)=584.76]
         20 sec: 99990 operations; 3.8 current ops/sec; [UPDATE AverageLatency(us)=10.88] [INSERT AverageLatency(us)=679174.27]
         20 sec: 99990 operations; 0 current ops/sec;
        [OVERALL], RunTime(ms), 20867.0
        [OVERALL], Throughput(ops/sec), 4791.776489193463
        [UPDATE], Operations, 49992
        [UPDATE], AverageLatency(us), 178.6439030244839
        
        Elliott Clark added a comment -

        Intuitively it seems like the number of WALs used should be related to the number of spindles available to HBase. So maybe this should be either a configurable number or something derived from the number of mount points HDFS is hosted on?

        Ted Yu added a comment -

        Currently I use the following knob for the maximum number of WALs on an individual region server:

        +    int totalInstances = conf.getInt("hbase.regionserver.hlog.total", DEFAULT_MAX_HLOG_INSTANCES);
        
        Jean-Daniel Cryans added a comment -

        Intuitively it seems like the number of WALs used should be related to the number of spindles available to HBase.

        I disagree; considering that most deployments have rep=3, you're using three spindles, not one. The multiplying effect could generate a lot of disk seeks since the WALs would be competing like that (plus flushing, compacting, etc).

        Todd Lipcon added a comment -

        I disagree, considering that most of the deployments have rep=3 you're using three spindles not one

        That said, most of our customers are deploying 6 disks if not 12.

        IMO the other big gain we can get from multiple WALs is to automatically switch between WALs when one gets "slow". IMO we should maintain a count of outstanding requests (probably by size) for each WAL, and submit writes to whichever has fewer outstanding requests. That way, if one is faster, it will take more of the load. Then simultaneously measure trailing latency stats on each WAL, and if one is significantly slower than the others for some period of time, have it roll (to try to get a new set of disks/nodes).
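
        A rough sketch of that dispatch policy, with made-up type names (OutstandingWal is not an HBase class): submit each write to the WAL with the fewest outstanding bytes, and roll any WAL whose trailing sync latency is much worse than the best one.

          import java.util.List;
          import java.util.concurrent.atomic.AtomicLong;

          class OutstandingWal {
            final AtomicLong outstandingBytes = new AtomicLong();   // bumped on append, decremented on sync
            volatile double trailingSyncLatencyMs;                  // rolling average of recent sync times
            void roll() { /* close the current file and open a new one, hopefully on new disks/nodes */ }
          }

          class LeastLoadedWalPicker {
            OutstandingWal pick(List<OutstandingWal> wals) {
              OutstandingWal best = wals.get(0);
              for (OutstandingWal w : wals) {
                if (w.outstandingBytes.get() < best.outstandingBytes.get()) best = w;
              }
              return best;                                          // a faster WAL naturally takes more of the load
            }

            void maybeRollSlowWal(List<OutstandingWal> wals, double slowFactor) {
              double fastest = Double.MAX_VALUE;
              for (OutstandingWal w : wals) fastest = Math.min(fastest, w.trailingSyncLatencyMs);
              for (OutstandingWal w : wals) {
                if (w.trailingSyncLatencyMs > slowFactor * fastest) w.roll();
              }
            }
          }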

        Li Pi added a comment -

        Agree with Todd on the implementation details. The switching of logs should also serve to help balance our log writes.

        Ted Yu added a comment -

        Trying to understand the implications of Todd's suggestion above.
        Currently each HRegion has a reference to the HLog it uses. If requests can be freely redirected to whichever HLog instance has fewer outstanding requests, that reference effectively moves up to the region server.
        This means additional logic on the region server for dispatching write requests.

        Jonathan Hsieh added a comment -

        Part of the motivation for multiple WALs can be found in this tech talk (most relevant to HBase is backup requests, starting at slide 39):

        http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf

        Jonathan Hsieh added a comment -

        The argument here is mostly aimed at read latency, but a similar idea could be used for write latency as well.

        Todd Lipcon added a comment -

        Currently each HRegion has reference to the HLog it uses. If requests can be freely redirected to the HLog instance having fewer outstanding requests, the reference would be to that of the region server.

        Sorry, I should be less free-wheeling with my terminology. My thought was that there is still a single "HLog" class, but underneath it would be multiple "SequenceFileLogWriters", most likely. Though maybe the correct implementation is to make HLog an interface, and then have a MultiHLog which wraps N other HLogs or something. Either way, any region would only have a reference to one "HLog" object, which might have more than one underlying stream.
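
        One way that shape could look, purely as a sketch (HLogSketch/MultiHLogSketch are illustrative names, not the interfaces that actually came out of this work):

          interface HLogSketch {
            long append(byte[] encodedRegionName, byte[] walEdit) throws java.io.IOException;
            void sync() throws java.io.IOException;
          }

          // Wraps N underlying logs; a region still holds a single HLog-like reference.
          class MultiHLogSketch implements HLogSketch {
            private final HLogSketch[] delegates;

            MultiHLogSketch(HLogSketch[] delegates) { this.delegates = delegates; }

            public long append(byte[] encodedRegionName, byte[] walEdit) throws java.io.IOException {
              // route the edit to one underlying log (here by region hash; could be by load instead)
              return pick(encodedRegionName).append(encodedRegionName, walEdit);
            }

            public void sync() throws java.io.IOException {
              for (HLogSketch d : delegates) d.sync();  // conservative: sync every underlying stream
            }

            private HLogSketch pick(byte[] region) {
              return delegates[(java.util.Arrays.hashCode(region) & Integer.MAX_VALUE) % delegates.length];
            }
          }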

        Ted Yu added a comment -

        to one "HLog" object, which might have more than one underlying stream.

        The above can be a (sub-)task by itself.

        Ted Yu added a comment -

        Currently we maintain one sequence number per region per HLog. From append():

              this.lastSeqWritten.putIfAbsent(regionInfo.getEncodedNameAsBytes(),
                Long.valueOf(seqNum));
        

        If WALEdits from a particular region can be spread across multiple streams, the accounting would be more complex.
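
        One possible shape of that extra accounting, illustrative only: keep a lastSeqWritten-style map per stream and treat a region's oldest unflushed sequence id as the minimum across all streams.

          import java.util.List;
          import java.util.concurrent.ConcurrentMap;

          class PerStreamSeqTracker {
            // one map per underlying WAL stream: encoded region name -> first unflushed sequence id
            private final List<ConcurrentMap<String, Long>> lastSeqWrittenPerStream;

            PerStreamSeqTracker(List<ConcurrentMap<String, Long>> maps) {
              this.lastSeqWrittenPerStream = maps;
            }

            // The region cannot be considered flushed past the oldest id still pending on any stream.
            long oldestUnflushedSeqId(String encodedRegionName) {
              long oldest = Long.MAX_VALUE;
              for (ConcurrentMap<String, Long> perStream : lastSeqWrittenPerStream) {
                Long seq = perStream.get(encodedRegionName);
                if (seq != null) oldest = Math.min(oldest, seq);
              }
              return oldest;
            }
          }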

        ramkrishna.s.vasudevan added a comment -

        Do we need to guarantee HLog edit sequencing even with multiple WALs? Just referring to Stack's comment in
        https://issues.apache.org/jira/browse/HBASE-5782?focusedCommentId=13255344&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13255344

        Li Pi added a comment -

        I'm assuming we don't need to guarantee HLog edit sequencing. If we do, this becomes a bit harder.

        Ted Yu added a comment -

        Are replication-related unit tests passing?

        Since the review process would take at least a month, I think developing against a branch would be good practice.

        Li Pi added a comment -

        Replication is the failure point. I haven't really worked on that yet.

        Talked to Jon about the dev process. I'll create a separate JIRA for refactoring HLog into an interface. I'll probably continue to work within trunk.

        A separate JIRA should make things easier though.

        Ted Yu added a comment -

        Using trunk has the drawback that performance numbers (without this feature) gathered on day N may be obsolete by day N + 5, considering the amount of changes going into trunk.

        I would suggest tackling replication as first priority. Dictionary WAL compression brought unexpected complexities w.r.t. replication. We shouldn't make replication any harder.

        w.r.t. refactoring HLog into an interface, I tend to think that the interface should make different implementations possible.
        If we only have one implementation, it is not easy to evaluate the effectiveness of the refactoring.

        Ted Yu added a comment -

        Here're the key unit tests that must pass:

        TestDistributedLogSplitting, TestReplication, TestMasterReplication, TestMultiSlaveReplication, TestHLog, TestHLogSplit, TestLogRollAbort, TestLogRolling

        Li Pi added a comment -

        While performance numbers will change, you can simply test with multiple HLogs on and multiple HLogs off. I don't think we're going to move everyone over to multiple HLogs immediately.

        Will take a look at those tests.

        Ted Yu added a comment -

        This feature requires validation on a real cluster.

        @Jonathan:
        Are you able to help Li in this regard?

        From my experience over the past three weeks, development involves coding -> running the test suite -> discovering defects through failed unit tests -> bug fixing -> validation through ycsb -> ...

        stack added a comment -

        @Li You should be able to work wherever you like if you build a harness for running hlog implementations apart from hbase. This should be the first order of business (unless you are a masochist). Should be easy enough, if it's not possible already, especially after you make it pluggable.

        Regarding "I'm assuming we don't need to guarantee HLog edit sequencing. If we do, this becomes a bit harder." – well, that's the way it is currently, so the onus will be on you to come up with a reason why it could be otherwise. In-order makes it easier to reason about whether or not all edits up to a particular sequence id have been sync'd.

        And don't forget the other side of the moon, the (distributed) log splitting story. That needs to work too after you are done.

        Jonathan Hsieh added a comment -

        Li, if you want to undertake this I'll help. Let's chat, then write a one-to-two page summary of our goals, our assumptions, our intended mechanisms, and how we are going to test this, and then loop back here with a design/plan to get feedback.

        Another "feature" that may come into play is HLog compression.

        Li Pi added a comment -

        I thought about the compression bit already. I was going to compress each separate log individually.

        Yeah, I probably should have written up what I was going to do before hacking stuff up. Will switch gears and work on that a bit instead.

        Ted Yu added a comment -

        there is still a single "HLog" class, but underneath it would be multiple "SequenceFileLogWriters"

        My approach is different from the above.
        The new interface should be general enough that multi-HLog can be implemented without requiring HLog to have multiple writers.

        Jonathan Hsieh added a comment -

        Zhihong, I'm curious to learn about the approach you have taken in the prototype that you have. Is it on github somewhere perhaps?

        If you have multiple hlogs do you use a different hlog in different regions?
        Do you have a shim that looks like an hlog but has two hlogs inside it (as opposed to hdfs file handles)?

        Ted Yu added a comment -

        If you have multiple hlogs do you use a different hlog in different regions?

        Correct.

        I have to go through legal procedure at my employer before disclosing my patch.

        ramkrishna.s.vasudevan added a comment -

        We are also interested in this.
        Worked on a prototype with one HLog instance but multiple underlying writer instances. Each region is allocated to one of the writer instances and writes to the hlog using the instance associated with it.

        Even on log rolling, the instance associated with each region is updated and the region continues to use its mapping.
        Without patch
        ~53K puts/sec.

        With patch
        ~78-80k puts/sec

        It is a 3-node cluster and the size of each record was 1k. Number of regions: 2800.
        By default I used 3 writer instances. I was able to pass the test cases related to TestHLog and TestDistributedLogSplitting, but TestMasterReplication was not passing.
        Replication needs some changes based on this, which I have not worked on much.

        The pendingWrites list that we use is now converted into a map from each writer to its list of pending writes.
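
        A sketch of that structure with placeholder types (not the actual patch): each region is pinned to one writer, and pending WAL entries are queued per writer rather than in a single list.

          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          class MultiWriterQueuesSketch {
            private final List<Object> writers;                               // stand-ins for the log writer instances
            private final Map<String, Object> regionToWriter = new HashMap<>();
            private final Map<Object, List<byte[]>> pendingWrites = new HashMap<>();

            MultiWriterQueuesSketch(List<Object> writers) {
              this.writers = writers;
              for (Object w : writers) pendingWrites.put(w, new ArrayList<>());
            }

            synchronized void append(String encodedRegionName, byte[] entry) {
              // A region keeps its writer across log rolls; new regions are spread by hash.
              Object w = regionToWriter.computeIfAbsent(encodedRegionName,
                  r -> writers.get((r.hashCode() & Integer.MAX_VALUE) % writers.size()));
              pendingWrites.get(w).add(entry);                                // each writer's queue is flushed/synced on its own
            }
          }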

        Please provide your suggestions on this.
        BTW, Li Pi, any progress on this? I would love to help you with it.
        Maybe I can prepare a more formal patch and upload it here.

        Ted Yu added a comment -

        @Ramkrishna:
        Your numbers look better than mine, though the mix in my case was 50% updates and 50% puts.

        Can you publish latency numbers as well?

        Lars Hofhansl added a comment -

        Should we explore a WAL per region? It would be a lot of open files, but if it worked, we wouldn't need log splitting anymore.

        Ted Yu added a comment -

        There would be many regions in a cluster. They may not receive an even write load.

        We should add a configuration parameter that governs the maximum number of concurrent WALs on each region server.

        Li Pi added a comment -

        My design is a bit different. I'll upload a patch soon. I'm doing any region to any HLog. Currently distributed log splitting and replication do not work yet.

        Li Pi added a comment -

        BTW, I have finals and other stuff coming up, so it might be a while before I finish my implementation. If anyone else wants to take a go at it, that would be cool.

        Lars Hofhansl added a comment - - edited

        I suspect this will become more important when people eventually turn on HBASE-5954 (durable sync, if they don't run in data centers with backup power supplies).

        There would be many regions in a cluster. They may not receive even write load.

        Is that necessarily a problem? Just saying that while we are exploring this, we might as well explore this option as well. I for one would be happy if a region's edits were tied to that region and log splitting could just go away (well, almost; we would still need to split when the region is split).

        Todd Lipcon added a comment -

        I think with durable sync, having a WAL-per-region would be even less feasible than it is today – we currently depend on batching in order to get good throughput. If a server has 50 regions, then you'd get 50x less batching opportunity and write throughput would grind to a halt. Imagine a fan-out write to all of the regions – it would generate 50 disk seeks instead of just 1.

        Lars Hofhansl added a comment -

        Good point.

        Was referring to the general feature, not necessarily WAL/Region.
        It's a trade-off: batching vs. parallel writes (just to state the obvious).

        Do we batch beyond a region normally, though? Maybe during cache flush.

        Yeah, WAL/Region with sync is probably not a good idea, there just won't be enough spindles in the HDFS cluster to absorb that.

        So what's a good heuristic for the number of WALs? Maybe (assuming good block distribution and that HBase is the only user of the cluster) it should be around #spindles/#replicas...?
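
        As a purely illustrative reading of that heuristic (numbers not from this thread): a node with 12 data disks and replication factor 3 would come out to roughly 12 / 3 = 4 WALs per region server.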

        ramkrishna.s.vasudevan added a comment -

        Perf results.
        @Ted
        The attached file also has the latency results. Run using LoadTestTool. Sorry for being a little late. Will upload the patch later.

        Ted Yu added a comment -

        Can you run ycsb with a 50% insert and 50% update load?
        The performance numbers in the attachment match what I got with my implementation.

        Thanks

        Lars Hofhansl added a comment -

        This is related to HBASE-6116.
        HBASE-6116 would improve latency, whereas this issue would mostly improve throughput.

        Lars Hofhansl added a comment -

        @Ted or @Ram: If you have any chance to test HBASE-6116 as well, that'd be really cool (although it would be more effort, as it only works against Hadoop trunk - and soon Hadoop 2.0-alpha).
        Andy said he might test against EC2.

        stack added a comment -

        What's the high level on the perf numbers? Do more WALs help? How much? Thanks.

        ramkrishna.s.vasudevan added a comment -

        @Ted
        We will get the ycsb report tomorrow; the environment is busy today.
        @Lars
        We will try to check HBASE-6116 as well, though I'm not sure it will be within the next couple of days. Anyway, we will try.

        Lars Hofhansl added a comment -

        I think we should wait for the test results with HBASE-6116 before we invest more time in this.
        My gut feeling tells me that this is something better handled at the HDFS level.

        Ted Yu added a comment -

        As I mentioned in HBASE-6055 @ 04/Jun/12 17:47, one of the benefits of this feature is for each HLog file to receive edits for one single table.

        Todd Lipcon added a comment -

        I think we should wait for test result with HBASE-6116 before we invest more time in this.

        HBASE-6116 seems like it would improve latency but hurt throughput – on a typical gbit link, the parallel writes would limit us to 50M/sec for 3 replicas, whereas pipelined writes could give us 100M+.

        The other main advantage of this JIRA is that the speed of the WAL is currently limited to the minimum speed of the 3 disks chosen in the pipeline. Given that disks can be heavily loaded, the probability of getting even a full disk's worth of throughput is low – the likelihood is that at least one of those disks is also being written to or read from at least another client. So typically any single HDFS stream is limited to 35-40MB/sec in my experience.

        Given that gbit is much faster than this, we can get better throughput by adding parallel WALs, so as to stripe across disks and dynamically push writes to less-loaded disks.
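
        Back-of-envelope check of those figures (editorial, assuming roughly 125 MB/s usable on a gbit link): with fan-out writes the client NIC carries all three replica copies, so useful WAL throughput tops out on the order of 125 / 3 ≈ 40-50 MB/s, while a pipelined write sends only one copy from the client and can approach 100 MB/s or more.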

        Lars Hofhansl added a comment -

        Assuming DataNodes and RegionServers are colocated, no more bits will have to cross the (aggregate) "wires". Further assuming good load balancing within HBase, the net bandwidth is still spread over the cluster (but with lower latency at each RegionServer).
        So I do not believe that HBASE-6116 will actually hurt performance.

        The key question is whether WAL writing is mostly bound by latency or bandwidth (and I do not know).
        Do we get 35-40MB/sec throughput from writing the WAL? If not, it is likely bound by latency.

        Hudson added a comment -

        Integrated in HBase-TRUNK #3408 (See https://builds.apache.org/job/HBase-TRUNK/3408/)
        HBASE-5699 Refactor HLog into an interface (Revision 1393126)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example/LongTermArchivingHFileCleaner.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HLogInputFormat.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/LogCleaner.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogMetrics.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogPrettyPrinter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HFileArchiveUtil.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HMerge.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/MetaUtils.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example/TestZooKeeperTableArchiveClient.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCacheOnWriteInSchema.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtilsForTests.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogMethods.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALActionsListener.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java
        Nicolas Liochon added a comment -

        If we implement this, we should test the impact on MTTR as well. My fear is that we will have many more leases to recover, and the way it's written today (one after the other), it could make failure recovery much slower on a small cluster.

        Anoop Sam John added a comment -

        Maybe we need to combine efforts here with HBASE-7835.
        Jeffrey Zhong is working on HBASE-7835, where he tries to do WAL replay with HTable#put().

        I had added the comment below to HBASE-7835:

        I was thinking about this area. We have different JIRAs now related to HLog and its split and replay:
        this one + HBASE-6772 + multi WAL...
        Can we think about them all together?
        For multi WAL, if we have a fixed set of regions per WAL, then when one RS goes down the Master can assign those regions (as far as possible) to one other RS [region groups in RS]. If the corresponding HLog file is also assigned to that RS, then for replay it can do puts directly on the region rather than going over IPC.

        If we can do all of this, I think MTTR can also be improved.
        I will start working on this JIRA (along with Ram) from next week.

        Ted Yu added a comment -

        we will have many more leases to recover

        At the beginning of recovery, the master can send lease recovery requests for the outstanding WAL files using a thread pool.
        Each split worker would first check whether the WAL file it processes is closed.
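
        A minimal sketch of that idea, assuming a hypothetical recoverLease() helper (the real code would go through the HDFS DistributedFileSystem API):

          import java.util.List;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;

          class WalLeaseRecoverySketch {
            void recoverAll(List<String> walPaths) {
              ExecutorService pool = Executors.newFixedThreadPool(8);    // pool size would be a config knob
              for (String path : walPaths) {
                pool.submit(() -> recoverLease(path));                   // kick off recovery for every outstanding WAL
              }
              pool.shutdown();                                           // split workers later verify each file is closed
            }

            private void recoverLease(String path) {
              // hypothetical placeholder for the actual lease recovery call
            }
          }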

        stack added a comment -

        This issue adds switching between WALs


          People

          • Assignee: Li Pi
          • Reporter: binlijin
          • Votes: 0
          • Watchers: 50
