SPARK-2447

Add common solution for sending upsert actions to HBase (put, deletes, and increment)

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core, Streaming
    • Labels: None

      Description

      Going to review the design with Tdas today.

      But my first thought is to have an extension of VoidFunction that handles the connection to HBase and allows for options such as turning auto-flush off for higher throughput.
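      To make the idea concrete, here is a minimal Scala sketch, assuming a pair RDD of (rowKey, value) byte arrays and a placeholder "cf"/"col" column; it illustrates the pattern only and is not the proposed implementation:

      import org.apache.hadoop.hbase.HBaseConfiguration
      import org.apache.hadoop.hbase.client.{HTable, Put}
      import org.apache.hadoop.hbase.util.Bytes
      import org.apache.spark.rdd.RDD

      // Sketch: one HTable per partition, puts buffered client-side,
      // flushed once at the end for higher throughput.
      def bulkPutSketch(rdd: RDD[(Array[Byte], Array[Byte])], tableName: String): Unit = {
        rdd.foreachPartition { iter =>
          val table = new HTable(HBaseConfiguration.create(), tableName)
          table.setAutoFlush(false) // auto-flush off for higher throughput
          iter.foreach { case (rowKey, value) =>
            val put = new Put(rowKey)
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), value)
            table.put(put)
          }
          table.flushCommits() // send the buffered puts in one batch
          table.close()
        }
      }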

      Need to answer the following questions first.

      • Can it be written in Java or should it be written in Scala?
      • What is the best way to add the HBase dependency? (will review how Flume does this as the first option)
      • What is the best way to do testing? (will review how Flume does this as the first option)
      • How do we support Python? (Python may be a separate Jira; that is unknown at this time)

      Goals:

      • Simple to use
      • Stable
      • Supports high load
      • Documented (may be a separate Jira; need to ask Tdas)
      • Supports Java, Scala, and hopefully Python
      • Supports Streaming and normal Spark

        Issue Links

          Activity

          Ted Malaska added a comment -

          tdas, when you have time can you assign this to me? Thanks.

          Ted Malaska added a comment -

          OK, here is the first beta batch of code. It is not ready for a pull request because I need to add unit tests; I will get those in the coming days.

          But I have tested all the examples on CDH 5.0.2 and found no issues.

          https://github.com/tmalaska/SparkOnHBase

          Ted Malaska added a comment -

          Code review on Thu, July 17.

          At least 14 action items before the next review:
          1. Convert vars to vals
          2. Rename bulkGets to bulkGet, and repeat for the others
          3. Rename the private map method to mapPartition
          4. Add comments for every method
          5. Fix indentation (it isn't correct on all lines)
          6. Close all HTables (I forgot one)
          7. Unit tests for everything
          8. Change the sending of the Configuration to a broadcast, to reduce IO to the workers and reduce start-up time (see the sketch after this list)
          9. Store the HConnection in a static place so that every partition on a worker doesn't have to create its own HConnection
          10. Map of connections (we need to support being able to connect to more than one cluster)
          11. BulkGet needs comments about what is read in and out
          12. The SparkContext should be given to the HBaseContext constructor
          13. Remove the default constructor
          14. Use SerializableWritable in Spark (HadoopRDD as an example)
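          A minimal sketch of item 8, assuming Spark's SerializableWritable wrapper (Hadoop's Configuration is a Writable but not Serializable, so it cannot be broadcast directly):

          import org.apache.hadoop.conf.Configuration
          import org.apache.hadoop.hbase.HBaseConfiguration
          import org.apache.spark.{SerializableWritable, SparkContext}

          def broadcastHBaseConf(sc: SparkContext) = {
            val hbaseConf: Configuration = HBaseConfiguration.create()
            // Wrap the non-serializable Configuration so it is shipped once
            // per worker instead of with every task closure.
            val confBroadcast = sc.broadcast(new SerializableWritable(hbaseConf))
            confBroadcast // on a worker, unwrap with confBroadcast.value.value
          }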

          Ted Malaska added a comment -

          Over the weekend I got the following done:
          1. Converted most vars to vals
          2. Renamed bulkGets to bulkGet, and repeated for the others
          3. Renamed the private map method to mapPartition
          5. Fixed indentation where it wasn't correct
          6. Closed all HTables (I had forgotten one)
          8. Changed the sending of the Configuration to a broadcast, to reduce IO to the workers and reduce start-up time
          9. Stored the HConnection in a static place so that every partition on a worker doesn't have to create its own HConnection (sketched below)
          10. Map of connections (we need to support being able to connect to more than one cluster)
          11. Commented BulkGet about what is read in and out
          12. The SparkContext is now given to the HBaseContext constructor
          13. Removed the default constructor
          14. Used SerializableWritable in Spark (HadoopRDD as an example)

          Extra:
          1. Finished the first cut of the design doc https://github.com/tmalaska/SparkOnHBase/blob/master/SparkOnHBase.Design.Doc.docx
          2. Built support for Spark Streaming
          3. Built a put example for Spark Streaming
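          A rough sketch of the static connection cache from items 9 and 10; the names, the cache key, and the missing eviction logic are illustrative, not the actual HConnectionStaticCache code:

          import java.util.concurrent.ConcurrentHashMap
          import org.apache.hadoop.conf.Configuration
          import org.apache.hadoop.hbase.client.{HConnection, HConnectionManager}

          object ConnectionCacheSketch {
            // One cached HConnection per distinct cluster, shared by every
            // partition running in this worker JVM.
            private val cache = new ConcurrentHashMap[String, HConnection]()

            def getConnection(conf: Configuration): HConnection = synchronized {
              val key = conf.get("hbase.zookeeper.quorum") // illustrative cache key
              var conn = cache.get(key)
              if (conn == null) {
                conn = HConnectionManager.createConnection(conf)
                cache.put(key, conn)
              }
              conn
            }
          }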

          Ted Malaska added a comment -

          Added JavaDoc and the method rename changes.

          This is the list of tasks I have to do next:
          1. Unit testing
          2. Logging
          3. Change the POM and packages to fit into the external folder of Spark

          https://github.com/tmalaska/SparkOnHBase

          Ted Malaska added a comment -

          Getting closer to the first pull request.

          1. Added Apache license headers
          2. Added unit tests for all Spark RDD functions
          3. Updated the packages

          Things to do:
          1. There were some major classpath issues when adding the HBaseTestingUtility class; need to clean up the POM (a mini-cluster test sketch follows this comment)
          2. Need to convert the POM to not use CDH dependencies
          3. Need to add unit tests for Spark Streaming
          4. Need to add unit tests for HConnectionStaticCache

          Latest code and documentation:
          https://github.com/tmalaska/SparkOnHBase
          https://github.com/tmalaska/SparkOnHBase/blob/master/SparkOnHBase.Design.Doc.docx
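          For reference, a bare-bones sketch of the mini-cluster test pattern with HBaseTestingUtility; the table and family names are placeholders:

          import org.apache.hadoop.hbase.HBaseTestingUtility
          import org.apache.hadoop.hbase.util.Bytes

          object MiniClusterSketch {
            def main(args: Array[String]): Unit = {
              val htu = new HBaseTestingUtility()
              htu.startMiniCluster() // spins up in-process ZooKeeper + HBase
              try {
                val table = htu.createTable(Bytes.toBytes("t1"), Bytes.toBytes("c"))
                // ... run bulkPut/bulkGet assertions against the mini cluster here ...
              } finally {
                htu.shutdownMiniCluster()
              }
            }
          }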

          Ted Malaska added a comment -

          Added the first of many pull requests. Please feel free to review.

          https://github.com/apache/spark/pull/1608

          This is the first pull request: mainly to test the review process, but there are still a number of things that I plan to add this week.

          1. Clean up the pom file
          2. Add unit tests for the HConnectionStaticCache

          If I have time I will also add the following:
          1. Support for Java
          2. Additional unit tests for Java
          3. Additional unit tests for Spark Streaming

          Ted Malaska added a comment -

          Making good progress. Just FYI, it may take a little longer because the version of HBase in Spark is 0.94.1, which has a couple of different APIs.

          Ted Malaska added a comment -

          So Spark has HBase 0.94.6 as the default HBase. That API is very different from later versions.

          So the question is: do we still want to code for 0.94.6?

          I will get that working and update the pull request to work for 0.94.6, but on most distributions today we would be using deprecated methods.

          Ted Malaska added a comment -

          Just talked to JMS from HBase, and we don't want to go down this road. The APIs for 0.94.6 won't work with the newer versions. I'm looking into upping the version of HBase in Spark now.

          Ted Malaska added a comment -

          The build is fixed and the pull request is updated.

          Ted Malaska added a comment -

          OK, had a status meeting with TD.

          1. 2447 will be pushed past 1.1
          2. Focus on these tasks:
          2.1. Java
          2.2. More unit testing
          2.3. Partitioned Put
          2.4. Partitioned Sorted Get
          2.5. BulkCheckPut
          2.6. BulkLoad (see the sketch after this list)
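          For 2.6, a sketch of the standard HBase bulk-load pattern such a helper would likely wrap, assuming HFileOutputFormat and pre-sorted (rowKey, KeyValue) pairs; a real helper would also call HFileOutputFormat.configureIncrementalLoad for region-aware partitioning:

          import org.apache.hadoop.conf.Configuration
          import org.apache.hadoop.hbase.KeyValue
          import org.apache.hadoop.hbase.io.ImmutableBytesWritable
          import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
          import org.apache.spark.SparkContext._
          import org.apache.spark.rdd.RDD

          // Write sorted (rowKey, KeyValue) pairs out as HFiles; the resulting
          // directory can then be handed to HBase's completebulkload tool.
          def bulkLoadSketch(rdd: RDD[(ImmutableBytesWritable, KeyValue)],
                             hfileDir: String,
                             conf: Configuration): Unit = {
            rdd.saveAsNewAPIHadoopFile(
              hfileDir,
              classOf[ImmutableBytesWritable],
              classOf[KeyValue],
              classOf[HFileOutputFormat],
              conf)
          }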

          Matei Zaharia added a comment -

          Hey Ted, thanks for putting this together. Apart from Java, can you look into supporting Python too? It seems that the main datatypes in HBase are arrays of bytes, so we can at least expose those in Python.

          Similarly, it might be good to make the Scala and Java API return just those, and let the user convert them after. In general for data sources like this we'd like to keep the code as simple and low-level as possible, so that it has a high chance of continuing to work with future versions of the data source (specifically future versions of HBase here).

          Finally, I'm curious how stable the HBase APIs used here are. What is the lowest version of HBase we can support with this, and are they all promised to be in future versions?

          Ted Malaska added a comment -

          Hey Matei,

          Let's do a WebEx or something in the near future. I would love to get more of your input.

          Here are my answers to your questions above:
          1. Yes, I can do Python.
          2. Yes, I can do that. So to be clear, bulkGet and scan will return a fixed (Array[Byte], Array[(Array[Byte], Array[Byte], Array[Byte], Long)]) for (rowKey, Array[(columnFamily, column, value, timestamp)]) (see the type sketch after this comment).
          2.1. As for bulkPut/Increment/Delete/CheckPut, I think we need to give the user the freedom to interact with the raw API. I have no problem building a simpler interface for the 80% use case, but I don't want to fail the 20%.
          3. The lowest version is 0.96. The reason is there was a major API change from 0.94 to 0.96+. So if we need to support 0.94 and below, we need a different code base.

          Let me know if this answers your questions, and let me know if there is anything else I can do. I have learned so much from TD and I have grown so much from this process.

          Ted Malaska
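          In Scala types, the fixed shape from point 2 would look roughly like this (the aliases are illustrative, not from the code):

          object ResultShapeSketch {
            type RowKey = Array[Byte]
            // (columnFamily, column, value, timestamp)
            type CellResult = (Array[Byte], Array[Byte], Array[Byte], Long)
            // what bulkGet and scan would return per row
            type BulkGetResult = (RowKey, Array[CellResult])
          }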

          Patrick Wendell added a comment -

          This is not entirely a duplicate, but it's similar to SPARK-1127.

          Tathagata Das added a comment - edited

          I took a brief look at SPARK-1127 as well. I think both the PRs have their merits. We should consider consolidating the functionalities that they provide.

          The relevant PR is this https://github.com/apache/spark/pull/194/files

          Ted Malaska, can you take a look at this PR as well? I think saveAsHBaseFile is a simpler interface that may be worth supporting if there is enough use of it (it assumes that all rows have the same column structure).

          Ted Malaska added a comment -

          Tell me if I'm wrong, but the core offering of 1127 is also provided by 2447.

          All I would have to do is provide the RDD functions that call bulkPut or a future bulkPartitionPut.

          Tathagata Das added a comment -

          Exactly!! That's why I feel that both have their merits. 2447 provides lower-level, all-inclusive interfaces with which slightly advanced users can do arbitrary things, but it requires programming against HBase types like Put. However, 1127 provides a simple interface that allows not-so-advanced users to do a set of simple things without requiring much HBase knowledge. They are complementary, and the latter should be implemented on top of the former.
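          To illustrate the layering, a hypothetical sketch of a fixed-schema, 1127-style put helper built on top of a lower-level bulkPut (the bulkPut signature is approximated from the SparkOnHBase project):

          import org.apache.hadoop.hbase.client.Put
          import org.apache.hadoop.hbase.util.Bytes
          import org.apache.spark.rdd.RDD

          // Assumed lower-level interface, approximating SparkOnHBase's HBaseContext.
          trait LowLevelHBase {
            def bulkPut[T](rdd: RDD[T], tableName: String, f: T => Put, autoFlush: Boolean): Unit
          }

          // The simple interface: every row has the same column structure, so the
          // user never touches HBase's Put type directly.
          def simpleBulkPut(hc: LowLevelHBase,
                            rdd: RDD[(Array[Byte], Array[Byte])],
                            tableName: String,
                            family: String,
                            qualifier: String): Unit = {
            hc.bulkPut[(Array[Byte], Array[Byte])](rdd, tableName, { case (rowKey, value) =>
              val put = new Put(rowKey)
              put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), value)
              put
            }, autoFlush = false)
          }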

          Norman He added a comment -

          Hi Ted,

          I am very glad to see the HBase RDD work. I am probably going to use it in its current form.

          I like the idea of the worker node managing the HBase connection, but somehow I have not seen any code related to HConnectionStaticCache?

          Ted Malaska added a comment -

          Hey Norman,

          Yes, the GitHub project has been used by a couple of clients now. It should be pretty hardened. Let me know if you find any issues.

          I will hopefully run into TD at Hadoop World and work out how to get this into Spark.

          Thanks for the comment.

          Norman He added a comment -

          Hi Ted and Tathagata Das,

          Would Spark/Spark Streaming consider making HBaseContext a facade to enclose all the simple HBase Get/Put methods?

          Ted Malaska added a comment -

          Hey Norman,

          Totally agree. TD and I talked about SparkOnHBase at Hadoop World. Times were crazy leading up to Hadoop World.

          So I'm doing the following things:
          1. I'm writing up a blog post for SparkOnHBase
          2. TD is working on directions for how this code should be integrated with Spark
          3. I have been working out little bugs with the Java integration
          4. I want to build a couple more examples
          5. I'm having a problem with Maven where the Java JUnit tests are not executing
          6. I'm adding support for Kerberos

          But yes, the facade is coming.

          Let me know if you want to help. Just do a pull request on https://github.com/tmalaska/SparkOnHBase

          Norman He added a comment -

          Yes, I would like to help. Let me start with the facade work in Scala first.

          Ted Malaska added a comment -

          Cool thanks.

          Contact me at ted.malaska@cloudera.com and I'll set up a WebEx for some time in the future so we can get this going.

          Thanks

          Patrick Wendell added a comment -

          Hey All,

          I have a question about this: is there any reason this can't exist as a user library instead of being merged into Spark itself? For utility libraries like this, I could see ones coming for Cassandra, Mongo, etc. I don't see it scaling to put and maintain all of these in the Spark code base. At the same time, however, they are super useful.

          As an alternative, what if it lived in HBase, similar to e.g. the Hadoop InputFormat implementation?

          Ted Malaska added a comment -

          I totally understand. I also don't know the answer.

          That is why I made the GitHub project. Thankfully, my employer also sees value in this project, and it will be moving to Cloudera Labs in the coming weeks. All that means is that I will have more help supporting it.

          A blog post and more improvements will be coming in the next few weeks.

          Norman He added a comment -

          Hi Ted,

          I have already made some changes in Scala for the facade and added some tests. Let us discuss early next week. How should I send you the code for review?

          -Norman

          Ted Malaska added a comment -

          Hey guys,

          Just wanted to update this Jira. In summary, the Spark committers are still deciding how this will or will not be included in the external part of Spark.

          For now, because the demand is there and because the solution works, I'm going to host the solution on Cloudera Labs.
          Here is the blog post that walks through the solution:

          http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/


            People

            • Assignee: Ted Malaska
            • Reporter: Ted Malaska
            • Votes: 1
            • Watchers: 17
