[HBASE-9778] Add hint to ExplicitColumnTracker to avoid seeking - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.96.2, 0.98.1, 0.99.0, 0.94.18
Component/s: None
Labels: None

Release Note:

Introduces a new scan attribute to allow a scan operation with explicit columns (Scan.addColumn) to opportunistically look ahead a few KeyValues (columns/versions) before scheduling a seek operation to seek between columns.

A seek is efficient when it can seek past 5-10 KeyValues (columns) or 512-1024 bytes. With small rows and few versions, looking ahead is typically more efficient.

API:
{code}
    Scan s = new Scan(...);
    s.addColumn(...);
    // instructs the RegionServer to attempt two iterations of next before scheduling a seek
    s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
    table.getScanner(s);
{code}
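
The look-ahead behavior described in this note can be illustrated with a small, self-contained sketch. This is not the HBase implementation: the class `LookaheadSketch`, the method `advanceTo`, and the `Result` type are hypothetical names, a sorted list of column qualifiers stands in for the KeyValues of one row, and a binary search stands in for the seek.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: models "try a few next() calls, then fall back to a seek".
class LookaheadSketch {

    /** Where the scan ended up, how many next() calls it used, and whether it had to seek. */
    static final class Result {
        final int index;
        final int nextCalls;
        final boolean seeked;

        Result(int index, int nextCalls, boolean seeked) {
            this.index = index;
            this.nextCalls = nextCalls;
            this.seeked = seeked;
        }
    }

    /**
     * Advances from position 'from' to the first column >= target.
     * Up to 'lookahead' cheap next() steps are tried first; only if the
     * target has still not been reached is the (simulated) seek performed.
     */
    static Result advanceTo(List<String> sortedColumns, int from, String target, int lookahead) {
        int i = from;
        int nextCalls = 0;
        while (nextCalls < lookahead
                && i < sortedColumns.size()
                && sortedColumns.get(i).compareTo(target) < 0) {
            i++;          // one next() call
            nextCalls++;
        }
        if (i >= sortedColumns.size() || sortedColumns.get(i).compareTo(target) >= 0) {
            return new Result(i, nextCalls, false); // reached the target without seeking
        }
        // Look-ahead budget exhausted: simulate the seek with a binary search.
        int pos = Collections.binarySearch(sortedColumns, target);
        if (pos < 0) {
            pos = -pos - 1; // insertion point = first column greater than target
        }
        return new Result(pos, nextCalls, true);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        Result near = advanceTo(row, 0, "c2", 2);
        Result far = advanceTo(row, 0, "c5", 2);
        System.out.println("c2: index=" + near.index + " seeked=" + near.seeked);
        System.out.println("c5: index=" + far.index + " seeked=" + far.seeked);
    }
}
```

In this model a nearby column is reached by plain iteration, while a distant one still pays for the seek after the look-ahead budget is used up, which is the trade-off the hint is meant to tune.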

Description

The issue of slow seeking in ExplicitColumnTracker was brought up by vrodionov on the dev list.

My idea here is to avoid seeking when we know that there aren't many versions to skip.
How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10), we'll avoid the seek and call SKIP repeatedly instead.
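
A minimal sketch of that decision, assuming a cutoff of 10 taken from the "some value < 10" remark above. The class `VersionsHintSketch`, the enum `Action`, and the method `onUnwantedColumn` are hypothetical names for illustration and are not taken from the patch:

```java
// Illustrative only: picks SKIP or a seek based on the column family's VERSIONS setting.
class VersionsHintSketch {

    /** The two ways to move past a column the scan did not ask for. */
    enum Action { SKIP, SEEK_NEXT_COL }

    // Assumed cutoff; the description only suggests "1 (or maybe some value < 10)".
    static final int SKIP_THRESHOLD = 10;

    /** With few versions per column, stepping over them is assumed cheaper than seeking. */
    static Action onUnwantedColumn(int maxVersions) {
        return maxVersions < SKIP_THRESHOLD ? Action.SKIP : Action.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println("VERSIONS=1   -> " + onUnwantedColumn(1));
        System.out.println("VERSIONS=100 -> " + onUnwantedColumn(100));
    }
}
```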

~~HBASE-9769~~ has some initial numbers for this approach.
Interestingly, the result depends on which column(s) are selected.

Some numbers: 4M rows, 5 columns each, 1 column family, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. All times are in seconds.

Without patch:

Wildcard	Col 1	Col 2	Col 4	Col 5	Col 2+4
6.4	8.5	14.3	14.6	11.1	20.3

With patch:

Wildcard	Col 1	Col 2	Col 4	Col 5	Col 2+4
6.4	8.4	8.9	9.9	6.4	10.0

Variation here was ±0.2s.

So with this patch, scanning is up to 2x faster in some cases (Col 2+4: 20.3s without vs. 10.0s with) and never slower. No special hint is needed beyond declaring VERSIONS correctly.
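
The per-case speedups can be checked directly from the two tables above. The sketch below only redoes that arithmetic; the class and method names are illustrative, and the timings are copied from the tables.

```java
// Illustrative only: recomputes speedups from the timings reported in this issue.
class SpeedupSketch {

    static final String[] CASES = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3}; // seconds
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};       // seconds

    /** Ratio of unpatched to patched time for case i; values above 1 mean the patch is faster. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    /** True if no case takes longer with the patch than without it. */
    static boolean neverSlower() {
        for (int i = 0; i < CASES.length; i++) {
            if (WITH_PATCH[i] > WITHOUT_PATCH[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int i = 0; i < CASES.length; i++) {
            System.out.println(String.format("%-8s %.2fx", CASES[i], speedup(i)));
        }
        System.out.println("never slower: " + neverSlower());
    }
}
```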

Attachments

File	Date	Size	Author
9778-trunk-v9.txt	11/Mar/14 17:13	14 kB	Lars Hofhansl
9778-0.94-v9.txt	11/Mar/14 17:13	14 kB	Lars Hofhansl
9778-trunk-v8.txt	10/Mar/14 21:51	14 kB	Lars Hofhansl
9778-0.94-v8.txt	10/Mar/14 21:51	14 kB	Lars Hofhansl
9778-0.94-v7.txt	10/Mar/14 21:35	13 kB	Lars Hofhansl
9778-trunk-v7.txt	10/Mar/14 21:28	14 kB	Lars Hofhansl
9778-trunk-v6.txt	08/Mar/14 00:02	12 kB	Lars Hofhansl
9778-0.94-v6.txt	07/Mar/14 19:57	12 kB	Lars Hofhansl
9778-0.94-v5.txt	07/Mar/14 04:24	6 kB	Lars Hofhansl
9778-0.94-v4.txt	22/Oct/13 06:12	9 kB	Lars Hofhansl
9778-trunk-v3.txt	16/Oct/13 21:03	11 kB	Lars Hofhansl
9778-0.94-v3.txt	16/Oct/13 20:53	11 kB	Lars Hofhansl
9778-trunk-v2.txt	16/Oct/13 17:53	6 kB	Lars Hofhansl
9778-0.94-v2.txt	16/Oct/13 17:51	6 kB	Lars Hofhansl
9778-trunk.txt	16/Oct/13 04:54	0.9 kB	Lars Hofhansl
9778-0.94.txt	16/Oct/13 04:53	0.9 kB	Lars Hofhansl

Issue Links

is related to

HBASE-4433 avoid extra next (potentially a seek) if done with column/row (Closed)
HBASE-9000 Linear reseek in Memstore (Closed)

relates to

HBASE-13109 Make better SEEK vs SKIP decisions during scanning (Closed)

People

Assignee: Lars Hofhansl

Reporter: Lars Hofhansl

Votes: 0

Watchers: 12

Dates

Created: 16/Oct/13 04:50

Updated: 04/Mar/15 22:25

Resolved: 11/Mar/14 18:35