Spark / SPARK-1054

Get Cassandra support in Spark Core/Spark Cassandra Module



    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core


Calliope is a library providing an interface to consume data from Cassandra into Spark and to store RDDs from Spark back to Cassandra.

Built as a wrapper over Cassandra's Hadoop I/O, it provides a simplified and very generic API to consume data from, and produce data to, Cassandra. It allows you to consume data from legacy as well as CQL3 Cassandra storage. It can also speed up your jobs by fetching only the relevant data from C*, harnessing CQL3 and C*'s secondary indexes. Though it currently uses only the Hadoop I/O formats for Cassandra, in the near future we see the same API harnessing other means of consuming Cassandra data, such as the StorageProxy or even reading SSTables directly.

Beyond the basic data-fetch functionality, the Calliope API harnesses Scala and its implicit parameters and conversions, letting you work at a higher level of abstraction, dealing with tuples/objects instead of Cassandra's rows/columns in your MapReduce jobs.
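To make the implicit-conversion idea concrete, here is a minimal, hypothetical sketch (not Calliope's actual API; the names `CassandraRow`, `RowReads`, and `fetchAs` are invented for illustration) of how Scala implicits can let a job deal in tuples while row unmarshalling stays out of sight:

```scala
// Hypothetical sketch: implicit conversions from Cassandra-style rows to tuples.
// None of these names come from Calliope; they only illustrate the pattern.

// A minimal stand-in for a Cassandra row: column name -> value.
case class CassandraRow(columns: Map[String, String])

// Type class describing how to unmarshal a row into a T.
trait RowReads[T] {
  def read(row: CassandraRow): T
}

object RowReads {
  // Implicit instance turning a row into a (key, value) tuple.
  implicit val pairReads: RowReads[(String, String)] =
    (row: CassandraRow) => (row.columns("key"), row.columns("value"))
}

// The fetch API picks up the transformer implicitly, so call sites
// never mention CassandraRow at all -- they just ask for tuples.
def fetchAs[T](rows: Seq[CassandraRow])(implicit r: RowReads[T]): Seq[T] =
  rows.map(r.read)

object Demo extends App {
  import RowReads._
  val rows  = Seq(CassandraRow(Map("key" -> "user1", "value" -> "42")))
  val pairs = fetchAs[(String, String)](rows)
  println(pairs) // List((user1,42))
}
```

Users supply (or import) a `RowReads` instance once per target type, and every fetch after that works on plain Scala tuples or case classes.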

Over the past few months we have seen the combination of Spark + Cassandra gaining a lot of traction, and we feel Calliope provides the path of least friction for developers to start working with this combination.

We have been using this in production for over a year now, and the Calliope early-access repository has 30+ users. I am filing this issue to start a discussion around whether we would want Calliope to be a part of Spark and, if yes, what would be involved in doing so.

      You can read more about Calliope here -




    • Assignee: rohitbrai Rohit Rai
    • Votes: 3
    • Watchers: 4