[SPARK-1054] Get Cassandra support in Spark Core/Spark Cassandra Module - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Spark Core
Labels:
- calliope
- cassandra

Description

Calliope is a library that provides an interface for reading data from Cassandra into Spark and for storing Spark RDDs in Cassandra.

Built as a wrapper over Cassandra's Hadoop I/O, it provides a simplified and very generic API for reading data from and writing data to Cassandra. It lets you consume data from legacy as well as CQL3 Cassandra storage. It can also use C* to speed up your jobs by fetching only the relevant data, relying on CQL3 and C*'s secondary indexes. Although it currently uses only the Hadoop I/O formats for Cassandra, in the near future we expect the same API to support other ways of consuming Cassandra data, such as using the StorageProxy or even reading from SSTables directly.

Beyond the basic data fetch functionality, the Calliope API uses Scala's implicit parameters and conversions so that you can work at a higher level of abstraction, dealing with tuples and objects instead of Cassandra's rows and columns in your MapReduce jobs.
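To illustrate the idea of implicit parameters mapping raw rows to objects, here is a minimal, self-contained sketch in plain Scala. It does not use Calliope or Spark, and it is not Calliope's actual API: `RawRow`, `RowReader`, `readAll`, and `Employee` are hypothetical names invented for this example, and a raw row is assumed to be a map from column name to a serialized `ByteBuffer`.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object ImplicitRowMapping {
  // Hypothetical stand-in for a raw Cassandra row: column name -> serialized value.
  type RawRow = Map[String, ByteBuffer]

  // Hypothetical type class describing how to build a T from a raw row.
  trait RowReader[T] {
    def read(row: RawRow): T
  }

  // The domain object the job actually wants to work with.
  final case class Employee(name: String, age: Int)

  // Decode a UTF-8 string without disturbing the original buffer's position.
  private def utf8(buf: ByteBuffer): String =
    UTF_8.decode(buf.duplicate()).toString

  // The conversion is declared once as an implicit value...
  implicit val employeeReader: RowReader[Employee] = new RowReader[Employee] {
    def read(row: RawRow): Employee =
      Employee(utf8(row("name")), row("age").duplicate().getInt())
  }

  // ...and the generic read path never mentions Employee:
  // the compiler supplies the matching reader for the requested type.
  def readAll[T](rows: Seq[RawRow])(implicit reader: RowReader[T]): Seq[T] =
    rows.map(reader.read)

  def main(args: Array[String]): Unit = {
    val raw: Seq[RawRow] = Seq(
      Map(
        "name" -> UTF_8.encode("Ada"),
        "age"  -> ByteBuffer.allocate(4).putInt(0, 36)
      )
    )
    // Only the target type is named at the call site.
    val employees = readAll[Employee](raw)
    employees.foreach(println)
  }
}
```

The design point is that the byte-level decoding lives in one implicit definition per type, so job code can ask for `Employee` values and never touch column buffers directly.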

Over the past few months, we have seen the combination of Spark and Cassandra gain a lot of traction, and we feel Calliope provides the path of least friction for developers who want to start working with this combination.

We have been using this in production for over a year now, and the Calliope early access repository has 30+ users. I am opening this issue to start a discussion about whether we want Calliope to be a part of Spark and, if so, what would be involved in doing so.

You can read more about Calliope here:
http://tuplejump.github.io/calliope

People

Assignee: Unassigned
Reporter: Rohit Rai
Votes: 3
Watchers: 3

Dates

Created: 05/Feb/14 00:31
Updated: 01/Mar/15 11:53
Resolved: 01/Mar/15 11:53