Description
CqlInputFormat used the number of rows to define the split size in C* versions < 2.2.
The default split size was 64K rows:
private static final int DEFAULT_SPLIT_SIZE = 64 * 1024;
The doc:
* You can also configure the number of rows per InputSplit with
* ConfigHelper.setInputSplitSize. The default split size is 64k rows.
The new split algorithm assumes the split size is in bytes, so by default (or with old configs) it creates really small Hadoop map tasks.
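A minimal sketch of the impact, using illustrative numbers (a hypothetical 10 GiB table with ~200-byte rows; these figures are assumptions, not from the report). It shows how many splits the same default value produces under rows-per-split versus bytes-per-split semantics:

```java
public class SplitSizeDemo {
    // Default from CqlInputFormat; pre-2.2 semantics counted rows per split
    static final long DEFAULT_SPLIT_SIZE = 64 * 1024;

    public static void main(String[] args) {
        // Hypothetical table: 10 GiB of data, ~200 bytes per row (illustrative)
        long tableBytes = 10L * 1024 * 1024 * 1024;
        long avgRowBytes = 200;
        long totalRows = tableBytes / avgRowBytes;

        // Old semantics: the value counts rows per split
        long splitsAsRows = totalRows / DEFAULT_SPLIT_SIZE;
        // New semantics: the same value interpreted as bytes per split
        long splitsAsBytes = tableBytes / DEFAULT_SPLIT_SIZE;

        System.out.println("splits (rows semantics):  " + splitsAsRows);  // ~819
        System.out.println("splits (bytes semantics): " + splitsAsBytes); // 163840
    }
}
```

With the bytes interpretation, the job fans out into hundreds of thousands of 64 KB map tasks, which is where the "really small" tasks come from.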
There are two ways to fix it:
1. Update the doc and increase the default value to something like 16MB.
2. Make C* compatible with the older versions.
I prefer the second option, as it will not surprise people who upgrade from old versions. I do not expect many new users to pick up the Hadoop integration.
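If option 1 were taken, jobs would need to set the split size explicitly via the `ConfigHelper.setInputSplitSize` call already mentioned in the doc. A hedged configuration fragment, assuming bytes semantics and a standard Hadoop `Job` setup (the 16 MB value mirrors the suggestion above and is illustrative):

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Job job = Job.getInstance();
Configuration conf = job.getConfiguration();
// Under the new bytes-based semantics, request ~16 MB per InputSplit
ConfigHelper.setInputSplitSize(conf, 16 * 1024 * 1024);
```

Under the old rows-based semantics the same call would mean ~16M rows per split, which is exactly the ambiguity this issue describes.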