Details
Improvement
Status: Open
Major
Resolution: Unresolved
4.1.0
Description
Since a VARBINARY value can be of any length and contain any bytes, we cannot know where it ends. Thus we only allow it at the end of the row key. With a BINARY, you tell Phoenix how big it is, so it can occur anywhere in the PK constraint.
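As a minimal illustration (not Phoenix's actual row-key code), a hypothetical `concat` helper that places two variable-length values back to back, with no delimiter and no length, shows the ambiguity: ("ab", "c") and ("a", "bc") produce identical bytes, so a reader cannot tell where the first value ends.

```java
import java.util.Arrays;

/** Illustration only: without a delimiter or length, two different key pairs can yield the same bytes. */
public final class ConcatAmbiguity {

    /** Hypothetical helper: appends second to first with nothing in between. */
    public static byte[] concat(byte[] first, byte[] second) {
        byte[] out = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, out, first.length, second.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key1 = concat(new byte[] {0x61, 0x62}, new byte[] {0x63}); // ("ab", "c")
        byte[] key2 = concat(new byte[] {0x61}, new byte[] {0x62, 0x63}); // ("a", "bc")
        System.out.println(Arrays.equals(key1, key2)); // prints "true"
    }
}
```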
One way to solve this would be to encode it the same way Orderly encodes a variable-length blob [1]:
Each encoded byte thereafter consists of a header bit followed by 7 bits of payload. A header bit of '1' indicates continuation of the encoding. A header bit of '0' indicates this byte contains the last of the payload.
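A minimal sketch of that continuation-bit rule, assuming a hypothetical `ContinuationBitCodec` class: it packs the input bits into 7-bit groups, sets the header bit on every output byte except the last, and zero-pads the final group. It implements only the rule quoted above and omits anything else Orderly's implementation may do (e.g. NULL or descending-order handling), so it should not be read as Orderly's actual code.

```java
import java.util.Arrays;

/** Hypothetical codec: each output byte = 1 header bit + 7 payload bits; header 0 marks the last byte. */
public final class ContinuationBitCodec {

    /** One decoded value plus the number of encoded bytes it occupied. */
    public static final class Decoded {
        public final byte[] value;
        public final int encodedLength;

        Decoded(byte[] value, int encodedLength) {
            this.value = value;
            this.encodedLength = encodedLength;
        }
    }

    public static byte[] encode(byte[] raw) {
        int totalBits = raw.length * 8;
        // Always emit at least one byte so an empty value still has a terminator.
        int groups = Math.max(1, (totalBits + 6) / 7);
        byte[] out = new byte[groups];
        for (int g = 0; g < groups; g++) {
            int payload = 0;
            for (int b = 0; b < 7; b++) {
                int bitIndex = g * 7 + b;
                // Bits past the end of the input are zero padding.
                int bit = bitIndex < totalBits ? (raw[bitIndex >>> 3] >>> (7 - (bitIndex & 7))) & 1 : 0;
                payload = (payload << 1) | bit;
            }
            int header = (g < groups - 1) ? 0x80 : 0x00; // 1 = continue, 0 = last byte
            out[g] = (byte) (header | payload);
        }
        return out;
    }

    /** Decodes one value starting at offset; no external length is needed. */
    public static Decoded decode(byte[] buf, int offset) {
        int groups = 0;
        boolean more = true;
        while (more) {
            if (offset + groups >= buf.length) {
                throw new IllegalArgumentException("unterminated value");
            }
            more = (buf[offset + groups] & 0x80) != 0;
            groups++;
        }
        // n groups carry 7n bits; the minimal encoding of L bytes yields floor(7n / 8) == L.
        byte[] raw = new byte[(groups * 7) / 8];
        for (int bitIndex = 0; bitIndex < raw.length * 8; bitIndex++) {
            int bit = (buf[offset + bitIndex / 7] >>> (6 - bitIndex % 7)) & 1;
            raw[bitIndex >>> 3] |= (byte) (bit << (7 - (bitIndex & 7)));
        }
        return new Decoded(raw, groups);
    }

    public static void main(String[] args) {
        byte[] one = {0, 0, 0, 1}; // same layout as HBase Bytes.toBytes(1)
        System.out.println(Arrays.toString(encode(one))); // [-128, -128, -128, -128, 8]

        // Two encoded values back to back: the decoder finds the boundary by itself.
        byte[] first = encode(one);
        byte[] second = encode(new byte[] {(byte) 0xFF});
        byte[] row = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, row, first.length, second.length);

        Decoded a = decode(row, 0);
        Decoded b = decode(row, a.encodedLength);
        System.out.println(Arrays.toString(a.value) + " " + Arrays.toString(b.value)); // [0, 0, 0, 1] [-1]
    }
}
```

The decoder stops at the first byte whose header bit is 0, so embedded 0x00 payload bytes are harmless. The cost is one header bit per 7 payload bits, roughly one extra byte per seven input bytes, plus padding in the last byte.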
When encoding arrays of byte[]s, Phoenix doesn't correctly handle the null byte (0x00): it treats that byte as the element terminator. Something like org.apache.hadoop.hbase.util.Bytes.toBytes(int) creates a byte[4] and fills it from right to left (so 1 becomes [0,0,0,1]); Phoenix then sees the leading 0x00 as the end of the element and just returns a null element.
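The failure mode can be reproduced with a hypothetical, simplified decoder that treats every 0x00 as an element terminator; it is not Phoenix's actual array codec. `ByteBuffer.putInt` stands in for HBase's `Bytes.toBytes(int)` here because both write an int big-endian, giving [0,0,0,1] for 1.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical 0x00-terminated array decoder, used only to show the failure described above. */
public final class SeparatorDecodeDemo {

    /** Splits buf at every 0x00 byte, treating each one as the end of an element. */
    public static List<byte[]> splitOnZero(byte[] buf) {
        List<byte[]> elements = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == 0) {
                elements.add(Arrays.copyOfRange(buf, start, i));
                start = i + 1;
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        byte[] one = ByteBuffer.allocate(4).putInt(1).array(); // [0, 0, 0, 1]
        // The writer appends a 0x00 terminator after the single element.
        byte[] encoded = Arrays.copyOf(one, one.length + 1);
        List<byte[]> decoded = splitOnZero(encoded);
        // One element comes back as four, and the first one is empty.
        System.out.println(decoded.size() + " " + decoded.get(0).length); // prints "4 0"
    }
}
```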
Instead, arrays of byte[]s need to store each element's length (probably as a prefix) so the decoder knows how many bytes to read. This has more overhead than any other encoding type, but that may be the price of supporting arbitrary byte arrays.
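A minimal sketch of the length-prefix approach, assuming a fixed 4-byte big-endian length before each element; the `LengthPrefixedArray` class is hypothetical and not Phoenix's format. Elements containing 0x00, including empty elements, round-trip intact.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical layout: [int length][bytes] repeated for every element. */
public final class LengthPrefixedArray {

    public static byte[] encode(List<byte[]> elements) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] e : elements) {
                out.writeInt(e.length); // 4-byte big-endian length prefix
                out.write(e);           // raw element bytes, 0x00 allowed
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static List<byte[]> decode(byte[] buf) {
        ByteBuffer in = ByteBuffer.wrap(buf);
        List<byte[]> elements = new ArrayList<>();
        while (in.hasRemaining()) {
            byte[] e = new byte[in.getInt()]; // read the length, then exactly that many bytes
            in.get(e);
            elements.add(e);
        }
        return elements;
    }

    public static void main(String[] args) {
        List<byte[]> input = List.of(new byte[] {0, 0, 0, 1}, new byte[0], new byte[] {0});
        for (byte[] e : decode(encode(input))) {
            System.out.println(Arrays.toString(e)); // [0, 0, 0, 1], then [], then [0]
        }
    }
}
```

A fixed 4-byte prefix is the simplest choice; a variable-length integer prefix would reduce the overhead for short elements. Because the length is compared before the content, the encoded bytes no longer sort in the same order as the raw elements.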