> From: Michael Stack firstname.lastname@example.org
> Sent: Wednesday, October 22, 2008 8:53 PM
> To: email@example.com
> Subject: Re: svn commit: r707247 - in /hadoop/hbase/trunk: ./ conf/
> How does the new feature affect hbase throughput? Does it make it slower?
> Faster? Were any measurements done?
I measured PerformanceEvaluation random write 1 with one region server
before and after the appends patch.
I would say that throughput is either the same or a little faster.
I ran the test only once on the code from before appends, and that run
completed in 2 minutes 31 seconds.
While fixing a couple of bugs in appends, I have run this test five times.
The slowest was 2 minutes 33 seconds, but the other times were all faster:
2:24, 2:20, 2:21 and 2:21.
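Converting the run times above to seconds makes the comparison concrete. The sketch below averages the five post-appends runs and compares them with the single pre-appends run. With only one baseline run, the result is an indication, not a measured speedup. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative arithmetic only: averages the PerformanceEvaluation run times
// quoted above (in seconds) and compares them with the one baseline run.
final class RunTimes {
    static double mean(int[] seconds) {
        return Arrays.stream(seconds).average().getAsDouble();
    }

    /** Percentage by which 'after' is faster than 'before' (positive = faster). */
    static double percentFaster(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int before = 151;                          // 2:31, single run before appends
        int[] after = {153, 144, 140, 141, 141};   // 2:33, 2:24, 2:20, 2:21, 2:21
        double meanAfter = mean(after);            // 719 / 5 = 143.8 s
        // prints: mean after = 143.8 s, 4.8% faster
        System.out.printf(Locale.ROOT, "mean after = %.1f s, %.1f%% faster%n",
                meanAfter, percentFaster(before, meanAfter));
    }
}
```

Even counting the slowest run, the average is about 7 seconds below the baseline, which is consistent with "the same or a little faster".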
> Do appends work for hbase? Did you try crashing an hbase server to
> see if it comes back up with only a few edits lost?
Yes, after fixing HBASE-952 and HBASE-954.
> I was thinking that the size of the log file is a better measure of
> when to rotate, given that there can be a wide divergence in WAL log
> file size, but maybe not, given that flush sequence ids are pegged
> against a particular edit.
This could be done either way, and I have no strong preference. With the
default settings, running PerformanceEvaluation random write 1 with one
region server, the HLogs were about 160MB. It might be nice to use the file
size so we can get closer to a multiple of HDFS block size. Doing so might
be better in the general case, that is, any application other than
PerformanceEvaluation: in some cases we might put more updates into a
log (if keys and values are small), and in others we might put fewer
(if keys and values are large). Being close to a multiple of HDFS block
size is probably a good thing, so I am kind of leaning toward log size
instead of number of updates. What do others think?
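The difference between the two roll criteria can be sketched as two small policies. This is a hypothetical illustration, not HBase code: `RollPolicy`, `CountBasedRollPolicy`, `SizeBasedRollPolicy` and `RollDemo` are invented names, and the size-based policy assumes the target is a whole number of HDFS blocks.

```java
// Hypothetical sketch: none of these types are HBase APIs.
interface RollPolicy {
    /** Records one appended edit of the given size; returns true if the log should roll now. */
    boolean append(long editSizeBytes);
}

/** Rolls after a fixed number of edits, regardless of how big they are. */
final class CountBasedRollPolicy implements RollPolicy {
    private final int maxEdits;
    private int editsSinceRoll = 0;

    CountBasedRollPolicy(int maxEdits) { this.maxEdits = maxEdits; }

    @Override
    public boolean append(long editSizeBytes) {
        if (++editsSinceRoll >= maxEdits) {
            editsSinceRoll = 0;
            return true;
        }
        return false;
    }
}

/** Rolls once the bytes written reach a whole number of HDFS blocks. */
final class SizeBasedRollPolicy implements RollPolicy {
    private final long rollSizeBytes;
    private long bytesSinceRoll = 0;

    SizeBasedRollPolicy(long blockSizeBytes, int blocksPerLog) {
        this.rollSizeBytes = blockSizeBytes * blocksPerLog;
    }

    @Override
    public boolean append(long editSizeBytes) {
        bytesSinceRoll += editSizeBytes;
        if (bytesSinceRoll >= rollSizeBytes) {
            bytesSinceRoll = 0;
            return true;
        }
        return false;
    }
}

final class RollDemo {
    /** Counts how many same-sized edits are appended up to and including the one that triggers a roll. */
    static int editsUntilRoll(RollPolicy policy, long editSizeBytes) {
        int edits = 0;
        do {
            edits++;
        } while (!policy.append(editSizeBytes));
        return edits;
    }
}
```

Assuming a 64MB block and two blocks per log, 100-byte edits would roll after roughly 1.34 million edits and 10KB edits after roughly 13,100, while a count-based policy rolls after the same number of edits in both cases. The size-based log overshoots its target by less than one edit, so files land just above the block multiple.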
> I like how Flusher has had a bunch of code purged.
Without the time-based cache flush and the time-based log roll, yes, things
have gotten simpler, and we don't end up creating small MapFiles or small
log files.
> We have a convention for naming threads: the name of the server
> (master/regionserver host and port) followed by what the thread does
> (this used to be hlog? Or log?). It makes it easy to sort them out in a
> thread dump.
Currently the thread is named HLog. Would it be preferable to name it
<servername>.HLog? Log entries only appear in one region server's log.
Does it matter?
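Applying the convention is mostly a matter of passing the name when the thread is constructed, since that name is what a thread dump shows. A minimal sketch, assuming a `host:port` prefix followed by a role suffix such as `HLog`; the `ThreadNames` helper and the exact separator format are hypothetical, not an existing HBase utility.

```java
// Hypothetical helper: builds "<host>:<port>.<role>" thread names so that
// threads from different servers are easy to tell apart in a thread dump.
final class ThreadNames {
    static String name(String host, int port, String role) {
        return host + ":" + port + "." + role;
    }

    /** Creates (but does not start) a thread named after its server and role. */
    static Thread newNamedThread(Runnable task, String host, int port, String role) {
        return new Thread(task, name(host, port, role));
    }
}
```

For example, `ThreadNames.newNamedThread(task, "rs1.example.org", 60020, "HLog")` would appear as `rs1.example.org:60020.HLog` in a dump (host and port are example values).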
> Should this Log thread inherit from Chore?
Currently only the root and meta scanners and CleanOldTransactions (in
regionserver.transactional) extend Chore. This change was made a while
back, but I can't remember why. Should all the threads in HRS and HMaster
extend Chore? We would need to add the "interrupt politely" method,
but I can't think of a reason we shouldn't do this (as a separate Jira).
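One way to read "interrupt politely" is a periodic thread that is only woken while it sleeps, never interrupted in the middle of its work. The sketch below illustrates that idea under that assumption; `PeriodicChore` and `stopPolitely` are hypothetical names and this is not HBase's actual `Chore` class.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical Chore-style base class: runs chore() every periodMillis until
// stopPolitely() is called. It never calls Thread.interrupt(), so a chore()
// already in progress always finishes before the thread exits.
abstract class PeriodicChore extends Thread {
    private final long periodMillis;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final Object sleepLock = new Object();

    PeriodicChore(String name, long periodMillis) {
        super(name);
        this.periodMillis = periodMillis;
        setDaemon(true); // must be set before start()
    }

    /** The periodic work; implemented by subclasses. */
    protected abstract void chore();

    @Override
    public void run() {
        while (!stopRequested.get()) {
            chore();
            synchronized (sleepLock) {
                // Re-check under the lock so a stop request cannot be missed.
                if (stopRequested.get()) {
                    break;
                }
                try {
                    sleepLock.wait(periodMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
    }

    /** Asks the thread to exit; wakes it only if it is sleeping between chores. */
    void stopPolitely() {
        stopRequested.set(true);
        synchronized (sleepLock) {
            sleepLock.notifyAll();
        }
    }
}
```

Because the flag is re-checked while holding the same lock that `stopPolitely()` uses to notify, a stop request that arrives between two chores is not lost.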
> There is a place in HRS where all service threads are started. Now
> that HLog is a Thread, should it be moved in there? Into startServiceThreads?
Currently, the HLog thread is started by HRS.setupHLog. Since setupHLog is
called from multiple locations, moving the thread start to
startServiceThreads would involve extra synchronization.
However, I note that the HLog thread is not set to be a daemon thread, which
should probably be fixed.
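Both points can be illustrated together: a start-once guard is one form the "extra synchronization" could take if several setup paths may try to start the same thread, and the daemon flag has to be set before `start()`, otherwise `setDaemon` throws `IllegalThreadStateException`. A minimal sketch under those assumptions; `LogThreadStarter` and `startOnce` are hypothetical names, not the actual HRS code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: starts a log thread at most once, even if several
// setup paths call startOnce(), and marks it as a daemon so it cannot keep
// the JVM alive by itself.
final class LogThreadStarter {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Thread logThread;

    LogThreadStarter(Runnable logTask, String name) {
        this.logThread = new Thread(logTask, name);
        // Must happen before start(); calling setDaemon on a started thread
        // throws IllegalThreadStateException.
        this.logThread.setDaemon(true);
    }

    /** Returns true only for the call that actually started the thread. */
    boolean startOnce() {
        if (started.compareAndSet(false, true)) {
            logThread.start();
            return true;
        }
        return false;
    }

    Thread thread() {
        return logThread;
    }
}
```

The compare-and-set makes concurrent or repeated calls safe without a lock; only the caller that flips the flag from false to true starts the thread.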