Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21769

Add a table option for Hive-serde tables to make Spark always respect schemas inferred by Spark SQL

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      For Hive-serde tables, we always respect the schema stored in Hive metastore, because the schema could be altered by the other engines that share the same metastore. Thus, we always trust the metastore-controlled schema for Hive-serde tables when the schemas are different (without considering the nullability and cases). However, in some scenarios, Hive metastore also could INCORRECTLY overwrite the schemas when the serde and Hive metastore built-in serde are different.

      The proposed solution is to introduce a table property for such scenarios. For a specific Hive-serde table, users can manually setting such table property for asking Spark for always respect Spark-inferred schema instead of trusting metastore-controlled schema. By default, it is off.

        Attachments

          Activity

            People

            • Assignee:
              smilegator Xiao Li
              Reporter:
              smilegator Xiao Li
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: