Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: regionserver
    • Labels:
      None

      Description

      I think we should open a ticket to track what needs to change to support appends once the coding is done on HADOOP-1700.

        Issue Links

          Activity

          Billy Pearson added a comment -

          The major item needing appends in HBase is the HLog, so we can handle crashed region servers without data loss. Currently we could lose up to 30K updates if a server fails.

          I was thinking we should maybe do something like MySQL does with InnoDB's
          innodb_flush_log_at_trx_commit setting.

          Options:
          1 = HLog buffer is flushed to disk with each update.
          2 = HLog buffer is flushed to disk once per second or every x seconds.
          3 = HLog buffer is flushed to disk when it reaches x size.

          Clearly option 1 would be slower per update, as we would have to append to the file with each change, but it would give us an option with zero data loss, and options 2-3 would let users choose how much data loss is acceptable in exchange for a performance increase.
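          The three options above amount to a small flush-policy decision in the append path. A toy Python sketch of that logic (not HBase code; the class and parameter names here are made up for illustration):

```python
import time

class HLogBuffer:
    """Toy sketch of the three proposed HLog flush policies (not HBase code)."""

    def __init__(self, policy, interval_secs=1.0, max_entries=100):
        self.policy = policy            # 1, 2, or 3, as in the options above
        self.interval = interval_secs   # policy 2: flush every N seconds
        self.max_entries = max_entries  # policy 3: flush at this buffer size
        self.buffer = []
        self.flushed = []               # stand-in for data durably on disk
        self.last_flush = time.monotonic()

    def append(self, edit):
        self.buffer.append(edit)
        if self.policy == 1:
            # Option 1: flush with each update -- zero data loss, slowest.
            self.flush()
        elif self.policy == 2 and time.monotonic() - self.last_flush >= self.interval:
            # Option 2: time-based flush -- lose at most `interval` worth of edits.
            self.flush()
        elif self.policy == 3 and len(self.buffer) >= self.max_entries:
            # Option 3: size-based flush -- lose at most `max_entries` edits.
            self.flush()

    def flush(self):
        # Stand-in for an HDFS append/sync of the buffered edits.
        self.flushed.extend(self.buffer)
        self.buffer.clear()
        self.last_flush = time.monotonic()
```

          The trade-off is visible directly: policy 1 pays a sync per edit, while policies 2 and 3 bound the window of lost edits by time or by count.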

          Billy Pearson added a comment -

          Upgraded to a blocker now that append is supported in hadoop 0.18.0.
          I think this is a #1 priority for 0.3.0, to make it so we can run without data loss from HLogs on failed region servers.

          stack added a comment -

          Do you think we should target hbase 0.3.0 for hadoop 0.19.0 altogether (since 0.2.0 seems to run on hadoop 0.18 – though we haven't tested it much)? 0.3.0 would be performance + implement appends?

          Jean-Daniel Cryans added a comment -

          Maybe we should add appends in the 0.2 branch since it is the one that is supposed to provide more reliability. Once 0.2.0 is out, this should be a priority.

          Billy Pearson added a comment -

          I think they added append in 0.18.0, and we only support 0.17.1 with 0.2.0; we could bump 0.2.0 to 0.18.0.
          Might be something to think about since we have not released 0.2.0 yet.

          stack added a comment -

          J-D:

          hbase can't fall too far behind hadoop releases. We need to always have an offering for current hadoops. If 0.2.0 hbase works on both 0.17 and 0.18 hadoop – we need to test a little more to be sure – then we can leapfrog 0.3.0 hbase back on to the hadoop TRUNK. More work on 0.2.0 I'm afraid will just have us falling further behind. Also, will exploiting the append additions to hadoop 0.18 in hbase 0.2.0 make it so we're not API compatible with hadoop 0.17? (I may be wrong on the latter but I seem to recall rework of the flush API.)

          Billy:

          Though 0.17 hadoop seems to be slower than 0.16, IMO, we should have an hbase for all hadoop versions.

          Jean-Daniel Cryans added a comment -

          stack, I totally agree. In fact, by saying 0.2 branch and not 0.2.0, I meant to release the append feature for a minor version. I also totally agree that there should be an HBase for all hadoop versions; I think we should even release as often as them even if it's for a small set of features. A strong argument for this is current migration to 0.2.0 which speaks for itself.

          Billy Pearson added a comment -

          If we plan on releasing for each version, then we need to be looking to do a feature freeze on 0.3.0 soon, so we can roll it out fast and start work on 0.4.0, which will be tagged for 0.19.0.

          Billy Pearson added a comment -

          Re-reading the Bigtable paper, I found this:

          To protect mutations from GFS latency spikes,
          each tablet server actually has two log writing threads,
          each writing to its own log file; only one of these two
          threads is actively in use at a time. If writes to the active
          log file are performing poorly, the log file writing is
          switched to the other thread, and mutations that are in
          the commit log queue are written by the newly active log
          writing thread. Log entries contain sequence numbers
          to allow the recovery process to elide duplicated entries
          resulting from this log switching process.
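          The scheme quoted above can be sketched as follows (a toy Python sketch of the dual-log idea, not Bigtable or HBase code; all names are made up). The key point is that duplicate sequence numbers make the log switch safe: queued mutations may be re-written to the new log, and recovery elides the duplicates.

```python
class DualLogWriter:
    """Toy sketch of Bigtable's two-commit-log scheme."""

    def __init__(self):
        self.logs = ([], [])  # two log files, modeled as lists of (seq, mutation)
        self.active = 0       # index of the log currently being written
        self.seq = 0          # monotonically increasing sequence number

    def write(self, mutation):
        self.seq += 1
        self.logs[self.active].append((self.seq, mutation))
        return self.seq

    def switch(self, pending):
        # On a latency spike, switch to the other log; mutations still in the
        # commit queue are written by the newly active writer, possibly
        # duplicating entries already in the slow log.
        self.active = 1 - self.active
        for seq, mutation in pending:
            self.logs[self.active].append((seq, mutation))

def recover(logs):
    # Merge both logs, eliding entries with duplicate sequence numbers.
    seen = {}
    for log in logs:
        for seq, mutation in log:
            seen.setdefault(seq, mutation)
    return [seen[s] for s in sorted(seen)]
```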
          
          Jim Kellerman added a comment -

          It is unlikely that we will implement dual log writing threads for the first release that uses hadoop append.

          We don't know anything about the performance of append at this point, and writing the HLog is not a bottleneck at this time.

          Billy Pearson added a comment -

          I know; just adding ideas on this so that when we start adding this option we have things we can think on and test, in case we run into bottlenecks in the append feature.

          Jim Kellerman added a comment -

          Good point, Billy, thanks for writing this down so we don't lose it.

          Later on, we may want to make this a separate issue.

          Jim Kellerman added a comment -

          Before resolving this issue, I will investigate HBASE-888. It may no longer be an issue.

          In the meantime, if you are on trunk, enjoy a reliable and faster HBase.

          stack added a comment -

          I made comments against the hbase-dev mail message.

          stack added a comment -

          Oh, ain't the fact that we now have appends a reason to go drinking?

          Jim Kellerman added a comment -

          > From: Michael Stack stack@duboce.net
          > Sent: Wednesday, October 22, 2008 8:53 PM
          > To: hbase-dev@hadoop.apache.org
          > Subject: Re: svn commit: r707247 - in /hadoop/hbase/trunk: ./ conf/
          > src/java/org/apache/hadoop/hbase/regionserver/
          >
          > How does the new feature affect hbase throughput? Does it make it slower?
          > Faster? Any measurement done?

          I measured PerformanceEvaluation random write 1 with one region server
          before and after the appends patch.

          I would say that throughput is either the same or a little faster.

          I only ran the code before appends once, and that test completed
          in 2 minutes 31 seconds.

          In fixing up a couple of bugs in appends, I have run this test 5 times.
          The slowest was 2 minutes 33 seconds, but the other times were all faster:
          2:24, 2:20, 2:21 and 2:21.

          > Do appends work for hbase? Did you try crashing an hbase server and
          > see if it comes back up with only a few edits lost?

          Yes, after fixing HBASE-952 and HBASE-954.

          > I was thinking that the size of the log file is a better measure of
          > when to rotate given that there can be a wide divergence in WAL log
          > file size but maybe not given that flush sequenceids are pegged
          > against a particular edit.

          This could be done either way and I have no preference. With the default
          settings, running PerformanceEvaluation random write 1 with one region
          server, the HLogs were about 160MB. It might be nice to use the file size
          so we can get closer to a multiple of HDFS block size. Doing so, might
          be better in the general case, which is any application except
          PerformanceEvaluation. In some cases, we might put more updates into a
          log (if keys and values are small), and in others we might put fewer
          (when keys and values are large). Being close to a multiple of HDFS block
          size is probably a good thing, so I am kind of leaning toward log size
          instead of number of updates. What do others think?

          > I like how Flusher has had a bunch of code purged.

          Without the time based cache flush and the time based log roll, yes things
          have gotten simpler and we don't end up creating small MapFiles or small
          log files.

          > We have convention naming threads. Its name of server –
          > master/regionserver host and port – followed by the what thread does
          > (This used to be hlog? Or log?). Makes it easy sorting them out in
          > thread dump.

          Currently the thread is named HLog. Would it be preferable to name it
          <servername>.Hlog ? Log entries only appear in one region server's log.
          Does it matter?

          > Should this Log thread inherit from Chore?

          Currently only the root, meta scanners and CleanOldTransactions (in
          regionserver.transactional) extend Chore. This change was made a while
          back, but I can't remember why. Should all the threads in HRS and HMaster
          extend Chore? We would need to add the "interrupt politely" method,
          but I can't think of a reason we shouldn't do this (as a separate Jira).

          > There is a place in HRS where all service threads are started. Now
          > HLog is a Thread, should it be moved in there? Into startServiceThreads?

          Currently, the HLog thread is started by HRS.setupHLog. Since it is called
          from multiple locations, moving the thread start to startServiceThreads
          would involve extra synchronization.

          However I note that the HLog thread is not set to be a daemon thread, which
          should probably be fixed.

          Jean-Daniel Cryans added a comment -

          Testing with hbase.regionserver.flushlogentries=1 on 1 client and one HRS doesn't give bigger or smaller numbers. Would need to be tested on a cluster.

          Jim Kellerman added a comment -

          HBASE-956 Master and region server threads should extend Chore

          Chore: Note that chores are repetitive tasks that do not wake up when there is work to be done.

          Leases: are not Chores, because they wait on a DelayQueue rather than doing the same thing.

          HMaster: is not a Chore because the run loop waits for an entry on the toDoQueue or delayedToDoQueue

          CompactSplitThread: is not a Chore because it waits for compaction requests

          HLog: neither a Thread nor a Chore. It is now just an object which manages the HLog

          LogRoller is a Thread and not a Chore because it waits for log roll requests.

          HBASE-955 Stop HLog thread before starting a new one

          No longer applies as HLog is neither a Thread nor a Chore. HLog has new method HLog.optionalSync.

          HRegionServer:

          • starts a new Chore, LogFlusher. The thread is named consistently with other threads; the region server tells the LogFlusher whenever the HLog object changes.
          • renamed Flusher to MemcacheFlusher so it would not be confused with LogFlusher

          Rename Flusher to MemcacheFlusher

          New Chore LogFlusher. Allows client (HRegionServer) to change the HLog instance being used, calls HLog.optionalSync() every threadWakeInterval.
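          The LogFlusher contract described above is small: wake every threadWakeInterval, and call optionalSync() on whatever HLog instance the region server has most recently handed over. A toy Python sketch of that shape (not the actual HBase Java code; FakeHLog is a made-up stand-in):

```python
class LogFlusher:
    """Toy sketch of the LogFlusher chore: periodically syncs the current HLog."""

    def __init__(self, thread_wake_interval):
        self.interval = thread_wake_interval  # seconds between wake-ups
        self.hlog = None

    def set_hlog(self, hlog):
        # The region server calls this whenever the HLog instance changes,
        # e.g. after a log roll replaces the object being written.
        self.hlog = hlog

    def chore(self):
        # One iteration of the Chore loop, run every `interval` seconds.
        if self.hlog is not None:
            self.hlog.optional_sync()

class FakeHLog:
    """Made-up stand-in that counts sync calls."""

    def __init__(self):
        self.syncs = 0

    def optional_sync(self):
        self.syncs += 1
```

          Keeping the flusher separate from the HLog object itself is what lets the HLog stop being a Thread: the chore survives log rolls, and only the reference it syncs changes.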

          Jim Kellerman added a comment -

          Moved HBASE-888 out to separate issue as it is not clear it still is applicable.

          Jim Kellerman added a comment -

          Resolving issue as all sub-issues have been resolved. If HBASE-888 is still an issue, it will be dealt with separately.


            People

            • Assignee:
              Jim Kellerman
              Reporter:
              Billy Pearson
            • Votes:
              1
            • Watchers:
              3
