Resolution: Not A Problem
Affects Version/s: None
Fix Version/s: None
Hive tables have a schema, which is copied into the partition storage descriptor when adding a partition. Currently, only the columns stored in the table storage descriptor are copied; columns reported by the serde are not. Instead of copying only the table storage descriptor columns, the full table schema should be copied into the partition storage descriptor.
This description is a little long, but that is necessary to show three things: the current behavior when columns are listed explicitly, the behavior with HIVE-2941 patched in and serde-reported columns, and finally the behavior with this patch (full table schema copied into the partition storage descriptor).
Here's an example of what currently happens. Note the following:
- The two manually defined fields of the table are listed in the table storage descriptor.
- Both fields are present in the partition storage descriptor.
This works great because users who query for a partition can look at its storage descriptor and get the schema.
CURRENT BEHAVIOR WITH HIVE-2941 PATCHED IN
Now let's examine what happens when a table is created whose schema is reported by the serde. Notice the following:
- The table storage descriptor contains an empty list of columns. However, the table schema is available from the serde reflecting on the serialization class.
- The partition storage descriptor contains only a single "part_dt" column, which was copied from the table partition keys. The actual data columns are not present.
I believe the correct thing to do is copy the full table schema (serde-reported columns + partition keys) into the partition storage descriptor. Notice the following:
- The table storage descriptor does not contain any columns, because they are reported by the serde.
- The partition storage descriptor now contains the full table schema: both the serde-reported columns and the partition keys.
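To make the difference concrete, here is a minimal Python sketch of the two copy strategies. It is only a toy model, not Hive metastore code: the `Table` class, the function names, and the `id`/`name` columns are hypothetical, and only the `part_dt` partition key comes from the example above. It also leaves out how the current code treats partition keys and compares only where the data columns come from.

```python
from dataclasses import dataclass
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, column type)


@dataclass
class Table:
    """Toy stand-in for a Hive table; not the real metastore object."""
    sd_cols: List[Column]         # columns stored in the table storage descriptor
    serde_cols: List[Column]      # columns reported by the serde (may be empty)
    partition_keys: List[Column]  # partition key columns

    def data_columns(self) -> List[Column]:
        # Assumption: explicitly listed columns are used when present,
        # otherwise the serde-reported columns are the table's data columns.
        return self.sd_cols if self.sd_cols else self.serde_cols


def copy_sd_columns(table: Table) -> List[Column]:
    """Copy only what is stored in the table storage descriptor."""
    return list(table.sd_cols)


def copy_full_schema(table: Table) -> List[Column]:
    """Proposed behavior: data columns plus partition keys."""
    return list(table.data_columns()) + list(table.partition_keys)


# A serde-backed table: the storage descriptor holds no columns itself.
serde_table = Table(
    sd_cols=[],
    serde_cols=[("id", "int"), ("name", "string")],  # hypothetical columns
    partition_keys=[("part_dt", "string")],
)

print("sd-only copy:    ", copy_sd_columns(serde_table))
print("full-schema copy:", copy_full_schema(serde_table))
```

In this model, copying only the storage descriptor columns loses the data columns of a serde-backed table, while the full-schema copy keeps them next to the partition key.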