Details
Description
When loading data into HBase with HBaseStorage, there is unexpected behavior regarding record schema and casting.
Here is the relevant code snippet:
B = group A by (time_tuple, some_scalar); C = foreach B { -- UDF to generate id (bytearray) generate id, flatten(group.$0), COUNT(A); }
At this point the schema for C is unknown, so I declare a schema with a foreach statement
D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, $4 as count:int;
Even though I've declared C.$4 as an int, it is still a long (from the COUNT). When I go to insert into HBase I get a ClassCastException since the schema (int) does not match the actual tuple value (long). I can fix this by explicitly casting when I declare the schema.
D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, (int)$4 as count:int;
Is this expected behavior? If not, is this an HBaseStorage issue - not honoring the schema before going off casting things?
Cheers,
David