Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
Build on the work done in HBASE-13992 by adding the ability to do a bulk load from a given RDD.
This will do the following:
1. Determine the number of regions, then sort and partition the data so that it is written out to HFiles correctly.
2. Unlike the MR bulk load, sort the columns in the shuffle stage rather than in the reducer's memory. This allows the design to support very wide records without running out of memory.
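The two steps above can be sketched in plain Python. This is only an illustration of the partition assignment and the sort order, not the implementation in the HBase-Spark module: the names `region_for` and `partition_and_sort`, the sample region start keys, and the sample cells are all hypothetical. It also sorts in local memory. In Spark, the same ordering could plausibly be pushed into the shuffle by keying each cell on (row, family, qualifier) and calling `repartitionAndSortWithinPartitions` with a region-aware partitioner, which is the point of step 2.

```python
from bisect import bisect_right
from collections import defaultdict


def region_for(row_key, start_keys):
    """Return the index of the region whose key range contains row_key.

    start_keys is the sorted list of region start keys. The first region's
    start key is the empty byte string, and start keys are inclusive.
    """
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by region, then sort each group by (row, family, qualifier).

    cells is an iterable of (row, family, qualifier, value) tuples of bytes.
    """
    by_region = defaultdict(list)
    for cell in cells:
        by_region[region_for(cell[0], start_keys)].append(cell)
    return {
        region: sorted(group, key=lambda c: (c[0], c[1], c[2]))
        for region, group in by_region.items()
    }


# Hypothetical table with three regions: [b"", b"g"), [b"g", b"p"), [b"p", end)
start_keys = [b"", b"g", b"p"]
cells = [
    (b"row9", b"cf", b"b", b"v4"),
    (b"apple", b"cf", b"z", b"v1"),
    (b"apple", b"cf", b"a", b"v2"),
    (b"kiwi", b"cf", b"q", b"v3"),
]
result = partition_and_sort(cells, start_keys)
```

Because the sort key includes the column family and qualifier, the cells of a single very wide row arrive already ordered and can be streamed to an HFile one at a time instead of being collected and sorted per row in memory.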
Attachments
Issue Links
- depends upon
  - HBASE-13992 Integrate SparkOnHBase into HBase (Closed)
- is depended upon by
  - HBASE-14340 Add second bulk load option to Spark Bulk Load to send puts as the value (Closed)
  - HBASE-14158 Add documentation for Initial Release for HBase-Spark Module integration (Closed)
  - HBASE-14216 Consolidate MR and Spark BulkLoad shared functions and string consts (Closed)
  - HBASE-14217 Add Java access to Spark bulk load functionality (Closed)
- links to