[CASSANDRA-15354] Cassandra CQLSSTableWriter and sstableloader support HDFS - ASF JIRA

Details

Type: New Feature
Status: Triage Needed
Priority: Normal
Resolution: Unresolved
Fix Version/s: None
Component/s: Legacy/Local Write-Read Paths, Local/SSTable, Tool/sstable
Labels: None
Platform: All
Impacts: None

Description

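import java.util
import org.apache.cassandra.dht.Murmur3Partitioner
import org.apache.cassandra.io.sstable.CQLSSTableWriter

// each Spark partition writes its own SSTables to a local directory
rdd.foreachPartition { msgIterator =>
  val writer = CQLSSTableWriter.builder()
    .inDirectory(outputDir)
    // set target schema
    .forTable(SCHEMA)
    // set CQL statement to put data
    .using(INSERT_STMT)
    // set partitioner if needed;
    // the default is Murmur3Partitioner, so set it only if you use a different one
    .withPartitioner(new Murmur3Partitioner())
    .build()
  try {
    msgIterator.foreach { msg =>
      // bind the comma-separated fields, in order, to the INSERT statement
      val javaList = new util.ArrayList[Object]()
      msg.toString.split(",").foreach(t => javaList.add(t))
      writer.addRow(javaList)
    }
  } finally {
    // close the writer even if a row fails, so buffered data is flushed
    writer.close()
  }
}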

Cassandra supports bulk export and import of data through SSTables, which is very convenient for users. In some cases we need to move TB-level data from HDFS into Cassandra, and we can use Spark to generate the SSTable files in a distributed job with code like the above. Unfortunately, CQLSSTableWriter can only write to a local path, and sstableloader can only load from a local path. So if we use CQLSSTableWriter in a Spark or Hadoop MR program, we have to write extra code to upload the SSTables from the local disks of the worker nodes to HDFS, and then download all of them from HDFS to the machine that runs sstableloader. Storing and transferring this much data between physical machines causes many reliability problems.

So it would be better if CQLSSTableWriter could write data to HDFS directly (or if there were another writer that supports HDFS), and if sstableloader could load from an HDFS path.
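
For reference, the extra upload step that the current workaround requires might look roughly like the sketch below. It is not part of Cassandra. It assumes the Hadoop client libraries are on the classpath and uses the standard Hadoop FileSystem API; the object name SSTableUpload, the helper uploadSSTables, and the example target URI are hypothetical names introduced only for illustration.

```scala
import java.io.File
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SSTableUpload {
  /** Copies every regular file in a local SSTable directory to `targetUri`
    * and returns the number of files copied.
    * `targetUri` is a hypothetical location such as
    * "hdfs://namenode:8020/bulk/part-0".
    */
  def uploadSSTables(localDir: File, targetUri: String, conf: Configuration): Int = {
    val target = new Path(targetUri)
    // FileSystem.get returns a shared, cached instance, so it is not closed here
    val fs = FileSystem.get(URI.create(targetUri), conf)
    fs.mkdirs(target)
    // listFiles() returns null if localDir is not a readable directory
    val files = Option(localDir.listFiles()).getOrElse(Array.empty[File]).filter(_.isFile)
    files.foreach { f =>
      // delSrc = false keeps the local copy; overwrite = true lets a retried task re-upload
      fs.copyFromLocalFile(false, true, new Path(f.getAbsolutePath), new Path(target, f.getName))
    }
    files.length
  }
}
```

In the Spark job above, such a helper would be called after writer.close(), probably with a separate target directory per partition, since SSTable file names generated independently on different workers may collide. The files still have to be downloaded to a local directory before sstableloader can read them, which is the overhead this issue asks to remove.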

People

Assignee: Unassigned

Reporter: YAN Bo

Votes: 0

Watchers: 3

Dates

Created: 11/Oct/19 07:10

Updated: 11/Oct/19 07:10