[HBASE-14340] Add second bulk load option to Spark Bulk Load to send puts as the value - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, connector-1.0.0
Component/s: hbase-connectors, spark
Labels: None
Hadoop Flags: Reviewed

Description

The initial bulk load option for Spark bulk load sends values through the shuffle one at a time. This is similar to how the original MR bulk load worked.

However, the MR bulk loader has more than one bulk load option. A second option allows all the column families, qualifiers, and values of a row to be combined on the map side.

This only works if the row is not super wide, since the entire row is combined into a single record.

When rows are not super wide, though, sending one combined record per row through the shuffle reduces the amount of data and work the shuffle has to handle.
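To make the difference between the two options concrete, here is a minimal, framework-free Python sketch that simulates what each one would send through the shuffle. It is not the HBase-Spark API: the functions `per_cell_records` and `per_row_records` are hypothetical, cells are modeled as `(row, family, qualifier, value)` tuples, and timestamps and versions are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cell model: (row_key, family, qualifier, value). Timestamps are ignored.
Cell = Tuple[bytes, bytes, bytes, bytes]
Column = Tuple[bytes, bytes]  # (family, qualifier)


def per_cell_records(cells: Iterable[Cell]) -> List[Tuple[Tuple[bytes, bytes, bytes], bytes]]:
    """First option: one shuffle record per cell, keyed by (row, family, qualifier)."""
    return [((row, fam, qual), val) for row, fam, qual, val in cells]


def per_row_records(cells: Iterable[Cell]) -> List[Tuple[bytes, List[Tuple[Column, bytes]]]]:
    """Second option: combine all families, qualifiers, and values of a row on the
    map side, so only one shuffle record is emitted per row key."""
    rows: Dict[bytes, Dict[Column, bytes]] = defaultdict(dict)
    for row, fam, qual, val in cells:
        # In this sketch a repeated (family, qualifier) keeps the last value seen.
        rows[row][(fam, qual)] = val
    # Sort columns within each row so they can later be written in sorted order.
    return [(row, sorted(cols.items())) for row, cols in rows.items()]


cells = [
    (b"r1", b"cf", b"a", b"1"),
    (b"r1", b"cf", b"b", b"2"),
    (b"r2", b"cf", b"a", b"3"),
]
print(len(per_cell_records(cells)))  # 3 shuffle records
print(len(per_row_records(cells)))   # 2 shuffle records
```

With three cells spread over two rows, the per-cell option emits three records and the per-row option emits two; the saving grows with the number of columns per row. The trade-off is that each combined record carries an entire row, which is why this approach suits rows that are not super wide.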

Attachments

- HBASE-14340.2.patch (64 kB), uploaded 15/Nov/15 19:19 by Theodore michael Malaska
- HBASE-14340.1.patch (65 kB), uploaded 09/Sep/15 00:58 by Theodore michael Malaska

Issue Links

depends upon: HBASE-14150 Add BulkLoad functionality to HBase-Spark Module (Closed)
is depended upon by: HBASE-14217 Add Java access to Spark bulk load functionality (Closed)
links to: review board

People

Assignee: Theodore michael Malaska

Reporter: Theodore michael Malaska

Votes: 0

Watchers: 6

Dates

Created: 30/Aug/15 21:59

Updated: 24/Jun/22 19:30

Resolved: 17/Nov/15 21:55