[HBASE-13090] Progress heartbeats for long running scanners - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0, 2.0.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed
Release Note:

Previously, there was no way to enforce a time limit on scan RPC requests. The server would receive a scan RPC request and take as much time as it needed to accumulate enough results to reach a limit or exhaust the region. The problem with this approach was that a very selective scan could take too long to process and cause client-side timeouts.

With this fix, the server now enforces a time limit on the execution of scan RPC requests. When a scan RPC request arrives at the server, the time limit is calculated as half of the more restrictive of two configured timeouts, "hbase.client.scanner.timeout.period" and "hbase.rpc.timeout". When the time limit is reached, the server returns whatever results it has accumulated up to that point. The results may be empty.
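The time limit rule above can be illustrated with a small sketch. The class and method names here are hypothetical and are not taken from the HBase source; the sketch only assumes that both timeouts are positive millisecond values and that the "more restrictive" one is the smaller one. It does not model how the server treats unset or non-positive timeouts.

```java
// Hypothetical illustration of the rule in the release note; not HBase code.
final class ScanTimeLimitSketch {

    /**
     * Returns half of the smaller (more restrictive) of the two timeouts.
     * Assumes both arguments are positive values in milliseconds.
     */
    static long timeLimitMillis(long scannerTimeoutMillis, long rpcTimeoutMillis) {
        long moreRestrictive = Math.min(scannerTimeoutMillis, rpcTimeoutMillis);
        return moreRestrictive / 2;
    }

    public static void main(String[] args) {
        // Scanner timeout 60 s, RPC timeout 30 s: limit is half of 30 s.
        System.out.println(timeLimitMillis(60_000L, 30_000L)); // prints 15000
        // Scanner timeout 20 s, RPC timeout 60 s: limit is half of 20 s.
        System.out.println(timeLimitMillis(20_000L, 60_000L)); // prints 10000
    }
}
```

Halving the smaller timeout leaves the other half of the budget for the response to travel back before the client gives up.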

To ensure that timeout checks do not occur too often (which would hurt scan performance), the configuration "hbase.cells.scanned.per.heartbeat.check" has been introduced. It controls how often System.currentTimeMillis() is called to update the progress towards the time limit. Currently, the default value of this configuration is 10000. Specifying a smaller value provides a tighter bound on the time limit, but may hurt scan performance due to the higher frequency of calls to System.currentTimeMillis().
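The trade-off described above can be sketched with a toy scan loop. Everything in this example is an assumption made for illustration: the class, the `scan` method, the `Outcome` type, and the injected clock are hypothetical, and the real server-side logic is more involved. The sketch only shows the idea of reading the clock once every N cells rather than once per cell.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration of periodic time-limit checks; not HBase code.
final class HeartbeatCheckSketch {

    /** What the toy scan did: cells visited, clock reads, and whether it hit the deadline. */
    static final class Outcome {
        final long cellsScanned;
        final long clockReads;
        final boolean timedOut;

        Outcome(long cellsScanned, long clockReads, boolean timedOut) {
            this.cellsScanned = cellsScanned;
            this.clockReads = clockReads;
            this.timedOut = timedOut;
        }
    }

    /**
     * Visits up to totalCells cells and reads the clock only once per
     * cellsPerCheck cells. Stops early when a clock read is at or past the deadline.
     */
    static Outcome scan(long totalCells, long cellsPerCheck, long deadlineMillis, LongSupplier clock) {
        long scanned = 0;
        long sinceLastCheck = 0;
        long clockReads = 0;
        while (scanned < totalCells) {
            scanned++;          // stand-in for processing one cell
            sinceLastCheck++;
            if (sinceLastCheck >= cellsPerCheck) {
                sinceLastCheck = 0;
                clockReads++;
                if (clock.getAsLong() >= deadlineMillis) {
                    return new Outcome(scanned, clockReads, true);
                }
            }
        }
        return new Outcome(scanned, clockReads, false);
    }

    public static void main(String[] args) {
        // Real clock, deadline far in the future: only the number of clock reads differs.
        LongSupplier wallClock = System::currentTimeMillis;
        System.out.println(scan(100_000, 10_000, Long.MAX_VALUE, wallClock).clockReads); // prints 10
        System.out.println(scan(100_000, 1_000, Long.MAX_VALUE, wallClock).clockReads);  // prints 100
    }
}
```

With a check interval of 10000, a scan that is already past its deadline is only noticed after up to 10000 further cells; an interval of 1000 notices it sooner at the cost of ten times as many clock reads.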

Protobuf models for ScanRequest and ScanResponse have been updated so that heartbeat support can be communicated. The client declares support for heartbeat messages in the request it sends to the server via ScanRequest.Builder#setClientHandlesHeartbeats. Only when the server sees that ScanRequest#getClientHandlesHeartbeats() is true will it send heartbeat messages back to the client. A response is marked as a heartbeat message via the boolean flag ScanResponse#getHeartbeatMessage.
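To show why the heartbeat flag matters to a client, here is a minimal sketch of a client-side loop. It does not use the generated protobuf classes: `Response` is a hypothetical stand-in that carries only a result list and a heartbeat flag, and the rule that an empty non-heartbeat response ends the scan is a simplifying assumption of this example, not a statement about the real client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of heartbeat-aware result handling; not HBase code.
final class HeartbeatClientSketch {

    /** Stand-in for a scan response: some rows plus a heartbeat marker. */
    static final class Response {
        final List<String> results;
        final boolean heartbeatMessage;

        Response(List<String> results, boolean heartbeatMessage) {
            this.results = results;
            this.heartbeatMessage = heartbeatMessage;
        }
    }

    /**
     * Collects rows from a sequence of responses. A heartbeat response, even an
     * empty one, means the server is still working, so the loop keeps going.
     * Assumption of this sketch: an empty response that is NOT a heartbeat
     * means there is nothing left to read.
     */
    static List<String> drain(Iterable<Response> responses) {
        List<String> rows = new ArrayList<>();
        for (Response response : responses) {
            rows.addAll(response.results);
            if (response.results.isEmpty() && !response.heartbeatMessage) {
                break;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Response> responses = Arrays.asList(
            new Response(Collections.<String>emptyList(), true),   // heartbeat, no rows yet
            new Response(Arrays.asList("row-a", "row-b"), false),  // ordinary batch
            new Response(Arrays.asList("row-c"), true),            // heartbeat with partial results
            new Response(Collections.<String>emptyList(), false)); // empty, not a heartbeat: stop
        System.out.println(drain(responses)); // prints [row-a, row-b, row-c]
    }
}
```

Without the flag, the first empty response in this sequence would be indistinguishable from the final one, which is the ambiguity the heartbeat marker removes.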


Description

It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst-case timeout to use until scans start failing intermittently in production, depending on variable scan criteria. It would be better if the client-server scan protocol could send back periodic progress heartbeats to clients as long as server scanners are alive and making progress.

This is related but orthogonal to streaming scan (HBASE-13071).

Attachments

- 13090-branch-1.addendum (17/Apr/15 23:41, 0.7 kB, Ted Yu)
- HBASE-13090-v1.patch (12/Mar/15 00:08, 98 kB, Jonathan Lawlor)
- HBASE-13090-v2.patch (13/Mar/15 21:24, 129 kB, Jonathan Lawlor)
- HBASE-13090-v3.patch (13/Mar/15 22:52, 129 kB, Jonathan Lawlor)
- HBASE-13090-v3.patch (13/Mar/15 22:21, 129 kB, Jonathan Lawlor)
- HBASE-13090-v4.patch (19/Mar/15 18:05, 126 kB, Jonathan Lawlor)
- HBASE-13090-v6.patch (19/Mar/15 22:08, 126 kB, Jonathan Lawlor)
- HBASE-13090-v7.patch (16/Apr/15 22:21, 97 kB, Jonathan Lawlor)

Issue Links

is related to

- HBASE-15378 Scanner cannot handle heartbeat message with no results (Resolved)
- HBASE-15358 canEnforceTimeLimitFromScope should use timeScope instead of sizeScope (Resolved)
- HBASE-11544 [Ergonomics] hbase.client.scanner.caching is dogged and will try to return batch even if it means OOME (Closed)
- HBASE-13082 Coarsen StoreScanner locks to RegionScanner (Closed)
- HBASE-12266 Slow Scan can cause dead loop in ClientScanner (Closed)
- HBASE-15593 Time limit of scanning should be offered by client (Closed)

relates to

- HBASE-13333 Renew Scanner Lease without advancing the RegionScanner (Closed)
- HBASE-12266 Slow Scan can cause dead loop in ClientScanner (Closed)
- HBASE-13071 Hbase Streaming Scan Feature (Closed)

links to

Reviewboard

Sub-Tasks

1. Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout (Closed; Jonathan Lawlor)
