A few more questions...
What's the persistence story? How many nodes is the log data stored on?
On performance, what about 100-200k ops/sec with entries of about 150 bytes?
That would be the aggregate load generated across 20 nodes.
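For scale, here is a rough sketch of the load this workload implies. It assumes decimal units and counts only entry payload. It ignores protocol overhead and the extra copies sent when each entry is replicated to several bookies.

```python
# Back-of-the-envelope load for the proposed workload (figures from the question).
# Assumes decimal units and payload only; protocol and replication overhead ignored.

ENTRY_BYTES = 150
WRITER_NODES = 20


def aggregate_bits_per_sec(ops_per_sec: int, entry_bytes: int) -> int:
    """Raw payload bandwidth in bits/s for a given aggregate write rate."""
    return ops_per_sec * entry_bytes * 8


for total_ops in (100_000, 200_000):
    bits = aggregate_bits_per_sec(total_ops, ENTRY_BYTES)
    per_node_ops = total_ops // WRITER_NODES  # even split across writers
    print(f"{total_ops} ops/s total: {bits / 1e6:.0f} Mbit/s payload, "
          f"{per_node_ops} ops/s per writer node")
```

At 200k ops/sec this comes to about 240 Mbit/s of payload in aggregate, or 10k ops/sec per writer node if the load is spread evenly.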
On Mar 12, 2010 2:18 PM, "Flavio Paiva Junqueira (JIRA)" <email@example.com> wrote:
Flavio Paiva Junqueira commented on HBASE-2315:
Those are good questions, Stack. BookKeeper scales throughput with the
number of servers, so adding more bookies should increase your current
throughput as long as your traffic is not already saturating the network.
Consequently, if you are not constrained in the number of bookies you can
use, your limit will be the network bandwidth available.
To give you some numbers: so far we have been able to saturate the network
when writing entries of around 1 KB, and with 3-5 bookies the throughput for
1 KB writes is on the order of tens of thousands of writes/s. Now, if I take
the largest numbers in your small example as the worst case (5 nodes, 1 MB
writes, 5k writes/s), we would need a 40 Gbit/s network, so I'm not sure
any distributed system could handle it unless each node writes locally, in
which case you can't guarantee the data will still be available after a
node crashes. Let me know if I'm misinterpreting your comment and you have
something else in mind.
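The 40 Gbit/s figure follows from the entry size and write rate alone (1 MB x 5,000 writes/s x 8 bits/byte). A minimal sketch of that arithmetic, assuming decimal units (1 MB = 10^6 bytes) and ignoring protocol overhead:

```python
# Worst-case figures quoted above: 1 MB entries at 5,000 writes/s.
# Assumes decimal units (1 MB = 10**6 bytes); protocol overhead is ignored.


def required_gbits_per_sec(entry_bytes: int, writes_per_sec: int) -> float:
    """Payload bandwidth in Gbit/s needed to sustain the given write rate."""
    return entry_bytes * writes_per_sec * 8 / 1e9


worst_case = required_gbits_per_sec(entry_bytes=1_000_000, writes_per_sec=5_000)
print(f"{worst_case:.0f} Gbit/s")  # 40 Gbit/s
```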
I should also mention that we added a feature that supports thousands of
concurrent ledgers with minimal penalty on write performance, so I don't
expect trouble from increasing the number of concurrent nodes as long as the
BookKeeper cluster is provisioned accordingly. Of course, it would be great
to measure this with HBase.