Kudu's wire format is already very close to Impala's, and we should probably take it the rest of the way before we release. After a release, any change to the format will impact "released" clients.
The potential performance upside for the Kudu-Impala integration is large. We can copy whole rows instead of doing tuple-by-tuple transformations, and eventually Impala could adopt the data as it arrives from Kudu, with no copying or transformation at all.
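To make the difference between the two copy strategies concrete, here is a minimal C++ sketch. It assumes fixed-width columns only and ignores null bitmaps and variable-length data. `CopyRowsWholesale`, `CopyRowsTupleByTuple` and `ColumnMapping` are hypothetical names, not Kudu or Impala APIs. When both sides agree on the row layout, a batch moves with a single `memcpy`. Otherwise every value of every row has to be copied to its new position individually.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fast path: source and destination use the same row layout, so a whole
// batch of rows moves with a single memcpy.
void CopyRowsWholesale(const uint8_t* src, uint8_t* dst, size_t num_rows,
                       size_t row_size) {
  std::memcpy(dst, src, num_rows * row_size);
}

// Hypothetical description of where one fixed-width column lives in each layout.
struct ColumnMapping {
  size_t src_offset;  // byte offset of the value within a source row
  size_t dst_offset;  // byte offset of the value within a destination row
  size_t size;        // width of the value in bytes
};

// Slow path: the layouts differ, so each value of each row is copied to its
// new position one at a time. Null bitmaps are ignored in this sketch.
void CopyRowsTupleByTuple(const uint8_t* src, size_t src_row_size,
                          uint8_t* dst, size_t dst_row_size, size_t num_rows,
                          const std::vector<ColumnMapping>& cols) {
  // Every mapped value must fit inside both row layouts.
  for (const auto& c : cols) {
    assert(c.src_offset + c.size <= src_row_size);
    assert(c.dst_offset + c.size <= dst_row_size);
  }
  for (size_t r = 0; r < num_rows; ++r) {
    const uint8_t* s = src + r * src_row_size;
    uint8_t* d = dst + r * dst_row_size;
    for (const auto& c : cols) {
      std::memcpy(d + c.dst_offset, s + c.src_offset, c.size);
    }
  }
}
```

The per-value loop is the cost that matching layouts would eliminate. It runs once per column per row, whereas the fast path is one bulk copy regardless of schema width.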
Here is the list of things that need addressing:
- The null bitmaps are on opposite sides of the row (Kudu's are at the end, Impala's are at the beginning).
- Kudu's bitmaps have one bit per column in the schema and contain garbage for non-nullable columns. Impala's bitmaps only cover the nullable columns (and thus contain no garbage).
- Impala's row layout pads to 8-byte alignment. We should mimic that, though it should be optional since the padding can be costly space-wise.
- Impala's timestamps have a different size and format from Kudu's. We should create rowwise row blocks with enough space for Impala to do the transformation in place, instead of having to memcpy the whole thing.
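The layout differences listed above can be sketched in C++ under a few stated assumptions. Only fixed-width columns are modeled. The "Kudu-style" layout places the values first, followed by a bitmap with one bit per column. The "Impala-style" layout places a bitmap covering only the nullable columns first, followed by the values, optionally padding the row to a multiple of 8 bytes. That padding is a simplification, since Impala's actual per-slot alignment rules are not modeled. `ColumnSpec`, `RowLayout`, `KuduStyleLayout`, `ImpalaStyleLayout` and `InPlaceSlotSize` are hypothetical names. `InPlaceSlotSize` illustrates the last item: reserve the wider of the two encodings so a timestamp could be rewritten in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width column description (not Kudu's or Impala's schema types).
struct ColumnSpec {
  size_t size;    // bytes reserved for the value in the row
  bool nullable;  // whether the column can hold NULL
};

struct RowLayout {
  std::vector<size_t> offsets;  // byte offset of each column's value
  size_t bitmap_offset = 0;     // byte offset of the null bitmap
  size_t bitmap_bytes = 0;      // size of the null bitmap in bytes
  size_t row_size = 0;          // total bytes per row, including any padding
};

static size_t RoundUp(size_t n, size_t align) {
  assert(align > 0);
  return (n + align - 1) / align * align;
}

// Reserve room for the wider of two encodings so a value can later be
// rewritten in place (e.g. a timestamp converted to the reader's format).
size_t InPlaceSlotSize(size_t wire_size, size_t target_size) {
  return std::max(wire_size, target_size);
}

// Kudu-style, as described in the list: values first, then a bitmap with one
// bit per column, nullable or not, at the end of the row.
RowLayout KuduStyleLayout(const std::vector<ColumnSpec>& cols) {
  RowLayout l;
  size_t off = 0;
  for (const auto& c : cols) {
    l.offsets.push_back(off);
    off += c.size;
  }
  l.bitmap_offset = off;
  l.bitmap_bytes = (cols.size() + 7) / 8;
  l.row_size = off + l.bitmap_bytes;
  return l;
}

// Impala-style, as described in the list: a bitmap covering only the nullable
// columns at the start of the row, then the values, optionally padded so the
// row size is a multiple of 8 bytes.
RowLayout ImpalaStyleLayout(const std::vector<ColumnSpec>& cols, bool pad_to_8) {
  RowLayout l;
  size_t nullable = static_cast<size_t>(std::count_if(
      cols.begin(), cols.end(), [](const ColumnSpec& c) { return c.nullable; }));
  l.bitmap_offset = 0;
  l.bitmap_bytes = (nullable + 7) / 8;
  size_t off = l.bitmap_bytes;
  for (const auto& c : cols) {
    l.offsets.push_back(off);
    off += c.size;
  }
  l.row_size = pad_to_8 ? RoundUp(off, 8) : off;
  return l;
}
```

Consider a schema with a non-nullable 4-byte column, a nullable 8-byte column, and a nullable timestamp slot of `InPlaceSlotSize(8, 16)` bytes. The 8-byte and 16-byte timestamp sizes are assumptions for illustration. With that schema, the Kudu-style row is 29 bytes with its bitmap at offset 28. The Impala-style row starts with a 1-byte bitmap and grows to 32 bytes when padding is enabled. This illustrates why padding should stay optional.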