Description
RowCounter allows column names to be specified on the command line:
Usage: RowCounter [options] <tablename> [--range=[startKey],[endKey]] [<column1> <column2>...]
For performance consider the following options:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
However, the column names are parsed on the assumption that a colon splits each name into exactly two parts. In other words, the code assumes family:qualifier, where the qualifier itself never contains a colon.
This came up when I was trying to do a row count on a Kiji table, where qualifiers typically have multiple colon-delimited components (e.g. B:C could be a qualifier in the B family, giving the column name B:B:C).
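A quick way to see the problem is to compare `String.split` with and without a limit on the column name from the example above. `B:B:C` (family `B`, qualifier `B:C`) is used as a hypothetical input, and `SplitDemo`, `splitAll` and `splitFamilyQualifier` are illustrative names, not part of RowCounter:

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical column name: family "B", qualifier "B:C".
    static final String COLUMN = "B:B:C";

    // Unlimited split, as RowCounter does today: breaks on every colon.
    static String[] splitAll(String column) {
        return column.split(":");
    }

    // Limit of 2: only the first colon separates family from qualifier.
    static String[] splitFamilyQualifier(String column) {
        return column.split(":", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAll(COLUMN)));             // [B, B, C]
        System.out.println(Arrays.toString(splitFamilyQualifier(COLUMN))); // [B, B:C]
    }
}
```

With the unlimited split, `fields[1]` is only `B`, so the `:C` part of the qualifier is silently dropped.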
The flaw is in this code:
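String[] fields = columnName.split(":");
// split(":") breaks on every colon: for "B:B:C" fields is [B, B, C],
// so the qualifier below becomes just "B" and ":C" is lost.
if (fields.length == 1) {
  scan.addFamily(Bytes.toBytes(fields[0]));
} else {
  byte[] qualifier = Bytes.toBytes(fields[1]);
  qualifiers.add(qualifier);
  scan.addColumn(Bytes.toBytes(fields[0]), qualifier);
}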
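One possible fix, sketched here without the HBase dependency so it runs standalone, is to split only on the first colon (`split(":", 2)`) and keep everything after it as the qualifier. `ColumnSpec` and `parse` are hypothetical names for illustration; in RowCounter the resulting family and qualifier would feed `scan.addFamily` or `scan.addColumn` just as in the code above. Treating `family:` (empty qualifier) as a whole family is an assumption chosen to preserve the current behavior, since `"f:".split(":")` already yields a single element while `"f:".split(":", 2)` yields `["f", ""]`.

```java
public class ColumnSpec {
    final String family;
    final String qualifier; // null means "scan the whole family"

    private ColumnSpec(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Split on the first colon only, so a qualifier such as "B:C" stays intact.
    static ColumnSpec parse(String columnName) {
        String[] fields = columnName.split(":", 2);
        if (fields.length == 1 || fields[1].isEmpty()) {
            // "family" or "family:" -> would map to scan.addFamily(...) in RowCounter
            return new ColumnSpec(fields[0], null);
        }
        // "family:qualifier" (qualifier may contain colons)
        // -> would map to scan.addColumn(family, qualifier) in RowCounter
        return new ColumnSpec(fields[0], fields[1]);
    }

    public static void main(String[] args) {
        for (String col : new String[] {"B", "B:", "B:q", "B:B:C"}) {
            ColumnSpec spec = parse(col);
            System.out.println(col + " -> family=" + spec.family
                + ", qualifier=" + spec.qualifier);
        }
    }
}
```

Splitting with a limit keeps the change local to the parsing line; the `addFamily`/`addColumn` branches stay as they are.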