Spark / SPARK-25367

The column attributes obtained by Spark SQL are inconsistent with Hive


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1
    • Fix Version/s: None
    • Component/s: Spark Shell, SQL
    • Environment:

      spark-2.2.1, hadoop-2.6.0-cdh-5.4.2

      hive-1.2.1

    • Flags:
      Important

      Description

      We saved a DataFrame as a Hive table in ORC/Parquet format from the spark-shell.
      After we changed the column type of this table (int to double) through Hive JDBC, the column type queried in the spark-shell did not change, although it did change in Hive JDBC. Even after restarting the spark-shell, the column type shown in the spark-shell still disagreed with the one shown in Hive JDBC.

      The reproduction steps are as follows:

      spark-shell:

      val df = spark.read.json("examples/src/main/resources/people.json");
      df.write.format("orc").saveAsTable("people_test");
      spark.sql("desc people_test").show()
      
      +--------+---------+-------+
      |col_name|data_type|comment|
      +--------+---------+-------+
      |     age|   bigint|   null|
      |    name|   string|   null|
      +--------+---------+-------+
      

      hive:

      hive> desc people_test;
      OK
      age bigint 
      name string 
      Time taken: 0.454 seconds, Fetched: 2 row(s)
      hive> alter table people_test change column age age double;
      OK
      Time taken: 0.68 seconds
      hive> desc people_test;
      OK
      age double 
      name string 
      Time taken: 0.358 seconds, Fetched: 2 row(s)

      spark-shell:

      spark.catalog.refreshTable("people_test")
      spark.sql("desc people_test").show()
      +--------+---------+-------+
      |col_name|data_type|comment|
      +--------+---------+-------+
      |     age|   bigint|   null|
      |    name|   string|   null|
      +--------+---------+-------+
      


      We also tested in the spark-shell by creating a table with spark.sql("create table XXX()"); for tables created that way, the modified column types are consistent between Spark and Hive.
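A plausible cause (our assumption, not confirmed in this report): for tables created through the data-source path (df.write.format(...).saveAsTable), Spark keeps its own serialized copy of the schema in the table's properties (the spark.sql.sources.schema.* keys) and answers DESC from that copy, so a Hive-side ALTER TABLE updates only the metastore column list. Tables created with a plain CREATE TABLE statement have no such cached copy, which would explain why they stay consistent. The toy model below (plain Scala, no Spark; all names hypothetical) sketches that "two copies of the schema" behavior:

```scala
// Hypothetical in-memory model of a metastore entry: the Hive-visible
// column list plus a properties map where Spark caches its own schema.
case class TableMeta(hiveColumns: Map[String, String],
                     properties: Map[String, String])

object SchemaShadowing {
  // Toy "serialized schema": comma-separated name:type pairs.
  def parse(s: String): Map[String, String] =
    s.split(",").map { kv =>
      val Array(k, v) = kv.split(":")
      k -> v
    }.toMap

  // Spark-style lookup: prefer the schema cached in table properties,
  // fall back to the Hive column list only when no cached copy exists.
  def sparkSchema(t: TableMeta): Map[String, String] =
    t.properties.get("spark.sql.sources.schema")
      .map(parse)
      .getOrElse(t.hiveColumns)

  def main(args: Array[String]): Unit = {
    var t = TableMeta(
      hiveColumns = Map("age" -> "bigint", "name" -> "string"),
      properties  = Map("spark.sql.sources.schema" -> "age:bigint,name:string"))

    // Hive's ALTER TABLE ... CHANGE COLUMN touches only the column list.
    t = t.copy(hiveColumns = t.hiveColumns + ("age" -> "double"))

    // Spark still answers from the cached copy: the two views now disagree.
    println(sparkSchema(t)("age"))   // bigint
    println(t.hiveColumns("age"))    // double
  }
}
```

If this is indeed the cause, recreating the table through Spark (for example, rewriting it with the column cast to the new type) would keep both copies in agreement, whereas a Hive-only ALTER TABLE cannot.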

            People

            • Assignee: Unassigned
            • Reporter: ppyy22 yy
            • Votes: 0
            • Watchers: 2
