
SPARK-15705: Spark won't read ORC schema from metastore for partitioned tables


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None
    • Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)

    Description

      Spark does not appear to read the schema from the Hive metastore for partitioned tables stored as ORC files. Instead, it reads the schema from the files themselves, which, if they were created with Hive, does not match the metastore schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:

      In Hive:

      hive> create table default.test (id BIGINT, name STRING) partitioned by (state STRING) stored as orc;
      hive> insert into table default.test partition (state="CA") values (1, "mike"), (2, "steve"), (3, "bill");
      

      In Spark:

      scala> spark.table("default.test").printSchema
      

      Expected result: Spark should preserve the column names that were defined in Hive.

      Actual result:

      root
       |-- _col0: long (nullable = true)
       |-- _col1: string (nullable = true)
       |-- state: string (nullable = true)
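
      The _colN names come from the ORC data files themselves: versions of Hive affected by HIVE-4243 write files whose internal column names are _col0, _col1, and so on. This can be confirmed by pointing Spark's ORC reader at a partition directory directly. A minimal sketch, assuming the HDP default warehouse location (the path below is an assumption, not taken from this report):

      scala> // Read the raw ORC files for one partition; the path is assumed
      scala> spark.read.orc("/apps/hive/warehouse/test/state=CA").printSchema
      root
       |-- _col0: long (nullable = true)
       |-- _col1: string (nullable = true)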
      

      Possibly related to SPARK-14959?
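
      Until this is fixed, a possible stopgap is to reapply the metastore column names by position. A minimal sketch, assuming the positional order of the file columns matches the Hive DDL above:

      scala> // Rename _col0/_col1 back to the DDL names; the partition column is unaffected
      scala> val fixed = spark.table("default.test").toDF("id", "name", "state")
      scala> fixed.printSchema
      root
       |-- id: long (nullable = true)
       |-- name: string (nullable = true)
       |-- state: string (nullable = true)

      Alternatively, setting spark.sql.hive.convertMetastoreOrc to false should route reads through the Hive SerDe, which takes its schema from the metastore; whether that setting covers this exact code path in 2.0.0 is an assumption.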

    People

    • Assignee: Yin Huai (yhuai)
    • Reporter: Nic Eggert (nseggert)
    • Votes: 0
    • Watchers: 13
