  1. Spark
  2. SPARK-16552

Store the Inferred Schemas into External Catalog Tables when Creating Tables

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: SQL

      Description

      Currently, in Spark SQL, the ways a table's initial schema is created fall into two groups. This classification applies to both Hive tables and Data Source tables:

      Group A. Users specify the schema.

      Case 1 CREATE TABLE AS SELECT: the schema is determined by the result schema of the SELECT clause. For example,

      CREATE TABLE tab STORED AS TEXTFILE
      AS SELECT * from input
      

      Case 2 CREATE TABLE: users explicitly specify the schema. For example,

      CREATE TABLE jsonTable (_1 string, _2 string)
      USING org.apache.spark.sql.json
      

      Group B. Spark SQL infers the schema at runtime.

      Case 3 CREATE TABLE: users do not specify the schema, only the path to the file location. For example,

      CREATE TABLE jsonTable 
      USING org.apache.spark.sql.json
      OPTIONS (path '${tempDir.getCanonicalPath}')
      

      Currently, Spark SQL does not store the inferred schema in the external catalog for the cases in Group B. When users refresh the metadata cache, or access the table for the first time after (re-)starting Spark, Spark SQL infers the schema and stores it in the metadata cache to speed up subsequent metadata requests. However, because inference is repeated at runtime, the table schema can change unexpectedly after each restart of Spark.
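The schema drift described above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark's inference code: `infer_schema` is a hypothetical helper that maps each JSON field to a Python type name, and the two lists stand in for the files found at the table location on two successive starts.

```python
import json


def infer_schema(json_lines):
    """Toy inference (hypothetical helper): field name -> Python type name."""
    schema = {}
    for line in json_lines:
        for field, value in json.loads(line).items():
            # Keep the first type seen for each field.
            schema.setdefault(field, type(value).__name__)
    return schema


# Contents of the table location when the table is first accessed.
files_at_first_start = ['{"_1": "a", "_2": "b"}']
# Before the next restart, another writer adds a record with an extra field.
files_at_second_start = files_at_first_start + ['{"_1": "c", "_2": "d", "_3": 1}']

# Nothing is persisted, so each start re-infers from whatever is on disk.
schema_first_start = infer_schema(files_at_first_start)
schema_second_start = infer_schema(files_at_second_start)
schema_changed = schema_first_start != schema_second_start

print(schema_first_start)
print(schema_second_start)
print(schema_changed)
```

In this model the user never asked for a schema change, yet the table's schema differs between the two starts simply because the underlying files changed.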

      It is desirable to store the inferred schema in the external catalog when creating the table. When users intend to refresh the schema, they issue `REFRESH TABLE`. Spark SQL then infers the schema again from the previously specified table location and updates the schema in both the external catalog and the metadata cache.
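The proposed behavior can be sketched the same way. The class below is a hypothetical stand-in for an external catalog, written in plain Python for illustration; `ToyCatalog`, `infer_schema`, the `storage` dictionary, and the `/data/json` location are all assumptions and do not correspond to Spark's actual classes or APIs.

```python
import json


def infer_schema(json_lines):
    """Toy inference (hypothetical helper): field name -> Python type name."""
    schema = {}
    for line in json_lines:
        for field, value in json.loads(line).items():
            schema.setdefault(field, type(value).__name__)
    return schema


class ToyCatalog:
    """Hypothetical external catalog: table name -> (location, schema)."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, location, storage):
        # Infer once at creation time and persist the result.
        self._tables[name] = (location, infer_schema(storage[location]))

    def schema_of(self, name):
        # Lookups return the stored schema; no inference happens here.
        return self._tables[name][1]

    def refresh_table(self, name, storage):
        # Re-infer from the previously recorded location, only on request.
        location, _ = self._tables[name]
        self._tables[name] = (location, infer_schema(storage[location]))


# `storage` stands in for the file system: location -> JSON lines.
storage = {"/data/json": ['{"_1": "a", "_2": "b"}']}
catalog = ToyCatalog()
catalog.create_table("jsonTable", "/data/json", storage)
schema_at_creation = catalog.schema_of("jsonTable")

# New data arrives; the stored schema stays the same until a refresh.
storage["/data/json"].append('{"_1": "c", "_2": "d", "_3": 1}')
schema_before_refresh = catalog.schema_of("jsonTable")

catalog.refresh_table("jsonTable", storage)
schema_after_refresh = catalog.schema_of("jsonTable")
```

The design point is that the schema changes only when the user explicitly asks for it, which corresponds to issuing `REFRESH TABLE` in the proposal.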

            People

            • Assignee:
              smilegator Xiao Li
              Reporter:
              smilegator Xiao Li
            • Votes:
              0
              Watchers:
              5
