Details
-
Improvement
-
Status: Resolved
-
Normal
-
Resolution: Incomplete
-
None
-
None
-
None
Description
When using a IPartitioner that does not sort data in byte order (RandomPartitioner for example) with Cassandra's Hadoop integration, Hadoop is unaware of the output order of the data.
We can make Hadoop aware of the proper order of the output data by implementing Hadoop's mapreduce.Partitioner interface: then Hadoop will handle sorting all of the data according to Cassandra's IPartitioner, and the writing clients will be able to connect to smaller numbers of Cassandra nodes.