Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
In a high-performance API or a low-latency stream worker, you often do not want the first few requests to pay connection setup costs. In these cases, you want to warm connections before the process is added to the load balancer or processing group.
Upon first creating a Connection, there are two areas that can slow down the first few requests:
- Fetching region locations
- Creating the initial connection to each RegionServer, which sends connection headers, possibly does auth handshakes, etc.
A user can easily work around the first source of slowness by calling Table.getRegionLocator().getAllRegionLocations().
It's more challenging for a user to warm the actual RegionServer connections. One way we have done this is to use a RegionLocator to fetch all locations for a table, reduce that down to 1 region per server, and then issue a Get for a row in each of those regions. We end up repeating this for every table that a process may connect to, because at the level where we do this we can't easily tell which servers have already been warmed. We have also run into various bugs over time, for example one where an empty startkey causes a Get to fail.
We can make this easier for users by providing an API which uses Connection internals to warm these connections as cheaply as possible. I propose we add the following:
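The reduction step of this workaround can be sketched as follows. This is a minimal illustration using only the Java standard library, not HBase client code: `RegionLoc`, `WarmupTargets`, `onePerServer` and `probeRow` are hypothetical names, and `RegionLoc` stands in for a real region location. Replacing an empty start key with the single byte 0x00 is an assumption about one way to avoid the failing Get; it only targets the first region if that region's end key sorts after 0x00.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a region location: hosting server plus region start key.
final class RegionLoc {
    final String serverName;
    final byte[] startKey;

    RegionLoc(String serverName, byte[] startKey) {
        this.serverName = serverName;
        this.startKey = startKey;
    }
}

final class WarmupTargets {
    // Keep the first region seen for each server, in encounter order.
    static List<RegionLoc> onePerServer(List<RegionLoc> all) {
        Map<String, RegionLoc> byServer = new LinkedHashMap<>();
        for (RegionLoc loc : all) {
            byServer.putIfAbsent(loc.serverName, loc);
        }
        return new ArrayList<>(byServer.values());
    }

    // Row to use for the warm-up Get. A table's first region has an empty
    // start key, so substitute the single byte 0x00 in that case
    // (assumption: the first region's end key sorts after 0x00).
    static byte[] probeRow(RegionLoc loc) {
        return loc.startKey.length == 0 ? new byte[] {0} : loc.startKey;
    }
}
```

In real client code, each `probeRow` result would be the row of one Get per server, and the whole procedure would still have to be repeated per table, which is the cost the proposal aims to remove.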
A new Table/AsyncTable method, warmConnections(). This would do the following:
- use the region locator to fetch all locations (with caching)
- reduce the returned locations to unique ServerNames
- for each ServerName (with lock):
  - if already warmed, skip
  - otherwise, get a connection to that server and send an initial request to trigger socket creation, the connection header, etc.
With this API, if someone is connecting to multiple tables, they could warm each Table in parallel and we'd only create a connection to each server once.
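The "warm each server only once, even across tables" behaviour might look roughly like the sketch below. It uses only the Java standard library and is not the HBase implementation: `ConnectionWarmer`, `warm` and the `probe` callback are hypothetical names, server names are plain strings, and the probe stands in for whatever initial request the Connection internals would send. A per-server future in a concurrent map plays the role of the per-ServerName lock.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical connection-wide registry shared by all tables.
final class ConnectionWarmer {
    // One entry per server that has been, or is currently being, warmed.
    private final ConcurrentHashMap<String, CompletableFuture<Void>> warmed =
        new ConcurrentHashMap<>();
    // Stand-in for "send an initial request to this server".
    private final Consumer<String> probe;

    ConnectionWarmer(Consumer<String> probe) {
        this.probe = probe;
    }

    // Warms every distinct server in the collection.
    // Returns how many probes this particular call sent.
    int warm(Collection<String> regionServers) {
        int sent = 0;
        Set<String> unique = new LinkedHashSet<>(regionServers);
        for (String server : unique) {
            CompletableFuture<Void> mine = new CompletableFuture<>();
            CompletableFuture<Void> existing = warmed.putIfAbsent(server, mine);
            if (existing != null) {
                // Another caller owns this server: wait for it, then skip.
                // join() rethrows (wrapped) if that caller's probe failed.
                existing.join();
                continue;
            }
            try {
                probe.accept(server);
                mine.complete(null);
                sent++;
            } catch (RuntimeException e) {
                // Forget the failed attempt so a later call can retry.
                warmed.remove(server, mine);
                mine.completeExceptionally(e);
                throw e;
            }
        }
        return sent;
    }
}
```

Calling `warm` once per table, from any number of threads, with each table's list of server names would then probe each server a single time, because the registry lives at the connection level rather than the table level.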
Issue Links
- relates to HBASE-27764 scan table is slow when enter hbase shell first time (Open)