[HIVE-3420] Inefficiency in hbase handler when process query including rowkey range scan - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: HBase Handler
Labels: None
Environment: Hive-0.9.0 + HBase-0.94.1

Description

When a Hive query filters on an HBase rowkey range, the Hive map tasks do not use the startRow and endRow information already present in each TableSplit. For example, if the requested rowkeys span 5 HBase files, there will be 5 map tasks. Ideally, each task would process 1 file. In the current implementation, however, every task scans all 5 files. This behavior not only wastes network bandwidth but also worsens lock contention in the HBase block cache, because every task has to access the same blocks. The problematic code is in HiveHBaseTableInputFormat.convertFilter, shown below:
……
if (tableSplit != null) {
  tableSplit = new TableSplit(
      tableSplit.getTableName(), startRow, stopRow, tableSplit.getRegionLocation());
}
scan.setStartRow(startRow);
scan.setStopRow(stopRow);
……
Because tableSplit already includes the startRow and endRow of its file, a better implementation would be:

……
byte[] splitStart = startRow;
byte[] splitStop = stopRow;
if (tableSplit != null) {
  if (tableSplit.getStartRow() != null) {
    splitStart = startRow.length == 0 || Bytes.compareTo(tableSplit.getStartRow(), startRow) >= 0
        ? tableSplit.getStartRow() : startRow;
  }
  if (tableSplit.getEndRow() != null) {
    splitStop = (stopRow.length == 0 || Bytes.compareTo(tableSplit.getEndRow(), stopRow) <= 0)
        && tableSplit.getEndRow().length > 0
        ? tableSplit.getEndRow() : stopRow;
  }
  tableSplit = new TableSplit(
      tableSplit.getTableName(),
      splitStart,
      splitStop,
      tableSplit.getRegionLocation());
}
scan.setStartRow(splitStart);
scan.setStopRow(splitStop);
……
In my test, the changed code improved performance by more than 30%.
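
The range clamping proposed above can be tried in isolation from Hive and HBase. The following is a minimal sketch, not the committed patch: the class RowRangeSketch and its methods compare, clampStart, clampStop and key are hypothetical names introduced here for illustration. It assumes non-null keys, that an empty key means "unbounded", and that row keys are ordered as unsigned bytes, which is the ordering Bytes.compareTo is used for in the snippet above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone helper; not part of Hive or HBase. */
final class RowRangeSketch {

    /** Lexicographic comparison of unsigned bytes (assumed row key ordering). */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Returns the later of the two start keys; an empty query start means "from the first row". */
    static byte[] clampStart(byte[] splitStart, byte[] queryStart) {
        if (queryStart.length == 0 || compare(splitStart, queryStart) >= 0) {
            return splitStart;
        }
        return queryStart;
    }

    /** Returns the earlier of the two stop keys; an empty key means "to the last row". */
    static byte[] clampStop(byte[] splitEnd, byte[] queryStop) {
        if (splitEnd.length == 0) {
            return queryStop; // the split is open-ended, so only the query bound applies
        }
        if (queryStop.length == 0 || compare(splitEnd, queryStop) <= 0) {
            return splitEnd;
        }
        return queryStop;
    }

    /** Convenience conversion for the examples; real row keys are arbitrary bytes. */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Query range [row100, row450) against a split covering [row200, row300).
        byte[] start = clampStart(key("row200"), key("row100"));
        byte[] stop = clampStop(key("row300"), key("row450"));
        System.out.println(new String(start, StandardCharsets.UTF_8)
                + " .. " + new String(stop, StandardCharsets.UTF_8));
    }
}
```

With this clamping, each map task scans only the intersection of the query range and its own split, instead of every task scanning the whole query range.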

Attachments

HIVE-3420.D7311.1.patch (12/Dec/12 07:05, 5 kB, Phabricator)

Issue Links

is duplicated by: HIVE-4247 Filtering on a hbase row key duplicates results across multiple mappers (Resolved)
is related to: HIVE-11609 Capability to add a filter to hbase scan via composite key doesn't work (Closed)

People

Assignee: Navis Ryu
Reporter: Gang Deng
Votes: 0
Watchers: 9

Dates

Created: 31/Aug/12 07:10
Updated: 11/Oct/15 07:22
Resolved: 22/Sep/13 08:17
