Spark / SPARK-31751

Spark serde property "path" overrides the Hive table location property


    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.1, 2.4.5
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      This issue has caused us many data errors.

      1) Using Spark (with Hive context enabled), from pyspark:

      # Create a one-row DataFrame and save it as an ORC-backed table
      df = spark.createDataFrame([{"a": "x", "b": "y", "c": "3"}])
      df.write.format("orc").option("compression", "ZLIB").mode("overwrite").saveAsTable("test_spark")
      

       

      2) From Hive:

      ALTER TABLE test_spark RENAME TO test_spark2;
      

       

      3) From the spark-sql command line (note: not pyspark or spark-shell):

      SELECT * FROM test_spark2;
      

       

      This gives the output:

      NULL NULL NULL
      Time taken: 0.334 seconds, Fetched 1 row(s)
      

       

      The query returns NULL because the pyspark write API adds a serde property called path to the Hive metastore. When Hive renames the table, it does not recognize this serde property and leaves it unchanged, so the property still points to the old location. When spark-sql then reads the table, it honors the serde property first and tries to read from the now non-existent HDFS location. An error at this point would have been acceptable, but silently returning NULL causes applications to fail badly. Spark claims to support Hive tables, so it should respect the Hive metastore location property rather than the Spark serde property when reading a table. This cannot be classified as expected behaviour.
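
The mechanism described above can be sketched as a small toy model in plain Python. This is only an illustration under stated assumptions: it does not call Spark or Hive, and the names TableEntry, spark_save_as_table, hive_rename, spark_sql_read_path, has_stale_path, and the warehouse root are all hypothetical. It assumes a managed table whose directory Hive moves on rename, as the report implies.

```python
# Toy model of the metastore entries involved. All names here are
# hypothetical and only illustrate the behaviour described in the report.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TableEntry:
    name: str
    location: str  # models the Hive table location property
    serde_properties: Dict[str, str] = field(default_factory=dict)


def spark_save_as_table(name: str, warehouse: str) -> TableEntry:
    """Model of the Spark write: location and serde 'path' are both set."""
    location = f"{warehouse}/{name}"
    return TableEntry(name, location, {"path": location})


def hive_rename(table: TableEntry, new_name: str, warehouse: str) -> TableEntry:
    """Model of the Hive rename: the location moves, serde properties are kept as-is."""
    return TableEntry(new_name, f"{warehouse}/{new_name}", dict(table.serde_properties))


def spark_sql_read_path(table: TableEntry) -> str:
    """Model of the reported read: the serde 'path' wins over the location."""
    return table.serde_properties.get("path", table.location)


def has_stale_path(table: TableEntry) -> bool:
    """True when the serde 'path' no longer matches the table location."""
    path = table.serde_properties.get("path")
    return path is not None and path != table.location


warehouse = "hdfs:///warehouse"  # hypothetical warehouse root
original = spark_save_as_table("test_spark", warehouse)
renamed = hive_rename(original, "test_spark2", warehouse)

print(spark_sql_read_path(renamed))  # the old directory, not the renamed one
print(has_stale_path(renamed))       # the mismatch the report describes
```

In this model, a check like has_stale_path is the kind of comparison that could turn the silent NULL result into an explicit error; whether Spark should do so is the question the report raises.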


            People

            • Assignee: Unassigned
            • Reporter: niths Nithin
            • Votes: 0
            • Watchers: 5
