Affects Version/s: None
Fix Version/s: None
I have a table that stores its records with big-endian long (8-byte integer) row keys. I'd like to access this data via the hbase-rest API, but I've come across an issue where I can't access every row that exists. For example:
$ curl -v -H "Accept: application/json" "http://hbase-rest:8080/emps/%00%00%00%00%00%00%04%00/"
This returns the expected row without issue. However, the following request:
$ curl -v -H "Accept: application/json" "http://hbase-rest:8080/emps/%00%00%00%00%00%00%03%FF/"
This returns a 404 Not Found, even though I'm certain the record exists. The failing request also generates a log message like this on the REST server:
WARN [qtp1473981203-37561] util.URIUtil: /emps/%00%00%00%00%00%00%03%FF/ org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte Ff in state 0
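To make the two requests above concrete, here is a minimal Java sketch of how rows 1024 and 1023 map to those paths. It assumes the keys are plain 8-byte big-endian longs (the same layout HBase's `Bytes.toBytes(long)` produces); the class and method names are illustrative only. 1024 is 0x400, so every byte stays at or below 0x7F, while 1023 is 0x3FF, so its last byte is 0xFF.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch. Assumption: row keys are 8-byte big-endian longs,
 * as described in this issue.
 */
public class RowKeyUrl {

    /** Encodes a long as 8 bytes; ByteBuffer's default byte order is big-endian. */
    static byte[] toBigEndian(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Percent-encodes every byte as %XX, using uppercase hex digits. */
    static String percentEncodeAll(byte[] key) {
        StringBuilder sb = new StringBuilder(key.length * 3);
        for (byte b : key) {
            // Mask to 0-255 so negative Java bytes (e.g. 0xFF == -1) print correctly.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 1024 = 0x400: every byte is <= 0x7F (the request that works).
        System.out.println("/emps/" + percentEncodeAll(toBigEndian(1024L)) + "/");
        // prints /emps/%00%00%00%00%00%00%04%00/

        // 1023 = 0x3FF: the last byte is 0xFF (the request that returns 404).
        System.out.println("/emps/" + percentEncodeAll(toBigEndian(1023L)) + "/");
        // prints /emps/%00%00%00%00%00%00%03%FF/
    }
}
```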
Further troubleshooting and testing suggest that the error occurs whenever the request path contains a percent-encoded byte above 0x7F.
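This lines up with how UTF-8 works: a byte in the range 0x00-0x7F is a complete character on its own, whereas 0xFF can never appear anywhere in valid UTF-8 (other bytes above 0x7F are valid only inside a correctly formed multi-byte sequence, which arbitrary binary keys rarely contain). The sketch below uses the JDK's strict `CharsetDecoder` as a stand-in for Jetty's `Utf8Appendable`; it is not the code path the REST server actually runs, only an illustration of why the second key is rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Illustrative stand-in for a strict UTF-8 check on a decoded URL path. */
public class StrictUtf8Check {

    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] row1024 = {0, 0, 0, 0, 0, 0, 0x04, 0x00};
        byte[] row1023 = {0, 0, 0, 0, 0, 0, 0x03, (byte) 0xFF};
        System.out.println(isValidUtf8(row1024)); // true
        System.out.println(isValidUtf8(row1023)); // false
    }
}
```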
I've read that hbase-rest supports a hex-escaped representation of row keys, like the shell, but that hasn't worked for me. Looking through RowSpec.java, I don't see any indication that the parseRowKeys() method attempts to parse the hex-escaped representation. Am I missing something here? Is the REST server supposed to support the hex-escaped representation, and am I just querying incorrectly?
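For reference, the shell's escaped form writes each non-printable byte as `\xNN`, and HBase's `Bytes.toBytesBinary` parses that format. The standalone sketch below is a hypothetical illustration of what such parsing in `parseRowKeys()` could look like, not what RowSpec.java does today; `parseHexEscaped` is an invented name. Because the escaped form is pure ASCII, it would survive any UTF-8 decoding step on the way in.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of shell-style "\xNN" row key parsing. */
public class HexEscapeParser {

    /** Returns 0-15 for an ASCII hex digit, or -1 for anything else. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * A backslash followed by 'x' and two hex digits becomes one byte; every other
     * character must be ASCII and is copied as a single byte.
     */
    static byte[] parseHexEscaped(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x'
                    && hexValue(s.charAt(i + 2)) >= 0 && hexValue(s.charAt(i + 3)) >= 0) {
                out.write((hexValue(s.charAt(i + 2)) << 4) | hexValue(s.charAt(i + 3)));
                i += 4; // consumed "\xNN"
            } else if (c > 0x7F) {
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            } else {
                out.write(c);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Row key 1023 as the shell prints it (backslashes doubled for Java source).
        byte[] key = parseHexEscaped("\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\xFF");
        System.out.println(key.length);    // 8
        System.out.println(key[7] & 0xFF); // 255
    }
}
```

In a URL the backslash itself would have to be percent-encoded as `%5C`, e.g. `/emps/%5Cx00%5Cx00%5Cx00%5Cx00%5Cx00%5Cx00%5Cx03%5CxFF/`; after decoding, the server would see only ASCII text.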
I've looked at version 0.98 and the current master branch, and RowSpec.java looks largely the same in both, so I don't believe this is even a regression.
I believe the error is caused by java.net.URLDecoder. I can only speculate, but would it be more appropriate to have a generic function that converts %XX sequences directly to bytes without relying on a specific Charset? Alternatively, some logic could be added to the parser to truly support the hex-escaped representation, perhaps with a URL parameter to request that parsing, much like the shell requires double quotes to indicate byte parsing.
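A minimal sketch of the charset-independent decoder suggested above: each `%XX` becomes exactly one byte, and nothing is ever interpreted as text. `percentDecodeToBytes` is a hypothetical helper, not an existing HBase or Jetty API, and it would only help if the still-encoded path reaches RowSpec; the Jetty `URIUtil` warning suggests decoding may already be attempted earlier, inside the servlet container. For contrast, `main` also runs the same segment through `java.net.URLDecoder`, which must produce a `String` and therefore substitutes the invalid 0xFF byte.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of decoding %XX escapes straight to bytes. */
public class PercentDecoder {

    /** Returns 0-15 for an ASCII hex digit, or -1 for anything else. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Decodes %XX escapes without any charset. Unescaped characters must be ASCII
     * and are copied as single bytes; '+' stays a literal '+' (URLDecoder, which
     * implements form decoding, would turn it into a space).
     */
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = hexValue(s.charAt(i + 1));
                int lo = hexValue(s.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c > 0x7F) {
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String segment = "%00%00%00%00%00%00%03%FF";
        byte[] exact = percentDecodeToBytes(segment);
        // URLDecoder builds a String, so the invalid 0xFF becomes U+FFFD;
        // encoding that String back to UTF-8 does not reproduce the original key.
        byte[] viaString = URLDecoder.decode(segment, "UTF-8").getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(exact));
        System.out.println(Arrays.equals(exact, viaString)); // false
    }
}
```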