  Spark / SPARK-21769

Add a table option for Hive-serde tables to make Spark always respect schemas inferred by Spark SQL


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      For Hive-serde tables, we always respect the schema stored in the Hive metastore, because the schema could be altered by other engines that share the same metastore. Thus, when the schemas differ (ignoring nullability and case), we always trust the metastore-controlled schema for Hive-serde tables. However, in some scenarios the Hive metastore can also incorrectly overwrite the schema, for example when the table's serde differs from the Hive built-in serde.
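
      A minimal way to see which schema Spark is currently using for such a table (a sketch; `spark` is assumed to be an existing Hive-enabled SparkSession and `db.hive_serde_table` is a placeholder name):

        // "spark" is an existing SparkSession built with .enableHiveSupport();
        // "db.hive_serde_table" is a placeholder table name.
        // For a Hive-serde table, the schema printed here is the one read back
        // from the Hive metastore, not a schema Spark re-infers on its own.
        spark.table("db.hive_serde_table").printSchema()
        spark.sql("DESCRIBE EXTENDED db.hive_serde_table").show(truncate = false)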

      The proposed solution is to introduce a table property for such scenarios. For a specific Hive-serde table, users can manually set this property to ask Spark to always respect the Spark-inferred schema instead of the metastore-controlled schema. By default, the option is off.
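
      A minimal sketch of how such a per-table option could be set. The property name 'spark.sql.schema.respectSparkInferred' is hypothetical, since this description does not fix the final name; only SparkSession and the standard ALTER TABLE ... SET TBLPROPERTIES syntax are taken from Spark itself:

        import org.apache.spark.sql.SparkSession

        object RespectSparkSchemaSketch {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("respect-spark-inferred-schema-sketch")
              .enableHiveSupport()
              .getOrCreate()

            // Hypothetical property name, used here for illustration only.
            // Setting it on a specific Hive-serde table would ask Spark to keep
            // using the schema it inferred itself instead of the
            // metastore-controlled schema.
            spark.sql(
              """ALTER TABLE db.hive_serde_table
                |SET TBLPROPERTIES ('spark.sql.schema.respectSparkInferred' = 'true')""".stripMargin)

            // With the property unset (the default), Spark continues to trust the
            // schema stored in the Hive metastore, as described above.
            spark.stop()
          }
        }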

      Attachments

        Activity

          People

            Assignee: Xiao Li (smilegator)
            Reporter: Xiao Li (smilegator)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: