As discussed in FLIP-152:
To support multiple Hive versions, we keep our own copy of the Hive code that generates the ASTNode and performs semantic analysis. As a result, a SQL statement is processed by the same code regardless of which Hive version is in use. The rationale behind this decision is:
1. HiveQL syntax is generally backward compatible, so code copied from a newer Hive version can also handle statements written for older versions.
2. The process of generating the RelNode plan is tightly coupled with the ASTNode and semantic analysis. While it is theoretically possible to make HiveParserCalcitePlanner support different versions, doing so would make the logic much more complicated and error-prone.
3. The copied code gives us more flexibility to support new features in the future. For example, we can adapt the code to support writing HiveQL for generic tables, or support querying tables across multiple catalogs.
Since most Hive users are still on Hive 2.x or 1.x, we'll copy the Hive code from 2.x, which reduces the effort required to cover these versions. For features introduced in Hive 3.x, e.g. table constraints, we'll extend the copied code to support them.