This will need documentation, as discussed in some Dec. 12 messages on the dev@orc email thread, archived here:
It's strange that the messages don't appear here in the JIRA.
Dain Sundstrom – Is "ORC's schema evolution uses the column names when they are available" documented somewhere?
Owen O'Malley – No, unfortunately, but it needs to be. The basic rules from SchemaEvolution.java look like:
structs (including the top row):
    if field names are available (post HIVE-4243), use name matching
    otherwise use positional matching
lists, maps, unions:
    children must match
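
As a rough illustration of the struct rule above (not the actual SchemaEvolution.java logic), the sketch below maps each reader field to a file field index, by name when the file carries real field names and by position otherwise. The function name `map_struct_fields`, the exact-string name comparison, and the use of `None` for "no file column found" are assumptions made for this example only.

```python
from typing import List, Optional


def map_struct_fields(
    file_fields: List[str],
    reader_fields: List[str],
    file_has_names: bool,
) -> List[Optional[int]]:
    """Return, for each reader field, the index of the matching file field.

    Hypothetical helper: None means no file field was found for that
    reader field. Name comparison is exact-match, which is an assumption.
    """
    if file_has_names:
        # Name matching: look each reader field up by its name.
        index_by_name = {name: i for i, name in enumerate(file_fields)}
        return [index_by_name.get(name) for name in reader_fields]
    # Positional matching: the i-th reader field reads the i-th file field.
    return [i if i < len(file_fields) else None
            for i in range(len(reader_fields))]


# A file written with fields (a, b), read after column b was renamed to c.
print(map_struct_fields(["a", "b"], ["a", "c"], file_has_names=True))
print(map_struct_fields(["_col0", "_col1"], ["a", "c"], file_has_names=False))
```

With name matching the renamed field `c` finds no counterpart in the file, while positional matching still pairs it with the second file column. That difference is what the rename-column question further down the thread is about.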
Many primitives can convert to each other, but this list needs to be cleaned up:
    boolean, byte, short, int, long, float, double, decimal -> boolean, byte, short, int, long, float, double, decimal, string, char, varchar, timestamp
    string, char, varchar -> all
    timestamp -> boolean, byte, short, int, long, float, double, decimal, string, char, varchar, date
    date -> string, char, varchar, timestamp
    binary -> string, char, varchar, date
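
The conversion list above can be transcribed into a lookup table, which makes it easier to check a given pair of types. This is only a sketch of the list as quoted in the thread, not the ORC implementation. Two things are assumptions: "all" is taken to mean every primitive type named in the list, and a type is treated as trivially convertible to itself. The names `CONVERSIONS` and `can_convert` are hypothetical.

```python
# Type groups as they appear in the quoted list.
NUMERIC_LIKE = {"boolean", "byte", "short", "int", "long",
                "float", "double", "decimal"}
STRING_LIKE = {"string", "char", "varchar"}
# Assumption: "all" means every primitive type named in the list.
ALL_TYPES = NUMERIC_LIKE | STRING_LIKE | {"timestamp", "date", "binary"}

# file type -> set of reader types it can be converted to
CONVERSIONS = {}
for t in NUMERIC_LIKE:
    CONVERSIONS[t] = NUMERIC_LIKE | STRING_LIKE | {"timestamp"}
for t in STRING_LIKE:
    CONVERSIONS[t] = set(ALL_TYPES)
CONVERSIONS["timestamp"] = NUMERIC_LIKE | STRING_LIKE | {"date"}
CONVERSIONS["date"] = STRING_LIKE | {"timestamp"}
CONVERSIONS["binary"] = STRING_LIKE | {"date"}


def can_convert(file_type: str, reader_type: str) -> bool:
    """Check whether the quoted list allows file_type -> reader_type."""
    if file_type == reader_type:
        # Assumption: identical types need no conversion.
        return True
    return reader_type in CONVERSIONS.get(file_type, set())


print(can_convert("int", "string"))
print(can_convert("date", "int"))
```

Writing the list out this way also shows its asymmetries, for example that date converts to timestamp but not to any numeric type, which fits the remark that the list needs to be cleaned up.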
Dain Sundstrom – So, rename column is not expected to work anymore?
Owen O'Malley – ORC-120 will add an option to force positional mapping.
Dain Sundstrom – Oh, I see, this is an ORC feature like the Parquet schema evolution stuff. We implemented support for ordering by the top-level struct names in Presto a while back.