Description
Support for writing to a Hive table whose Avro schema is referenced through avro.schema.url is missing.
I have a Hive table in Avro data format. The table is created with a query like this:
CREATE TABLE some_table
PARTITIONED BY (YEAR int, MONTH int, DAY int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/some_user/some_table'
TBLPROPERTIES (
  'avro.schema.url'='hdfs:///user/some_user/some_table.avsc'
)
Note that the table uses the `avro.schema.url` property rather than `avro.schema.literal`, because we have to keep the schemas in separate files.
Trying to write to such a table results in a NullPointerException (NPE).
I tried to find a workaround, but nothing helped. I tried:
- setting df.write.option("avroSchema", avroSchema) with the explicit schema as a string
- changing TBLPROPERTIES to SERDEPROPERTIES
- replacing the explicit, detailed SERDE specification with STORED AS AVRO
I found that this can be solved by adding a couple of lines in `org.apache.spark.sql.hive.HiveShim`, next to where `AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL` is referenced.
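To make the idea behind that change concrete, here is a minimal, language-neutral sketch of the lookup order it implies: use the inline schema if `avro.schema.literal` is set, otherwise fall back to loading the file named by `avro.schema.url`. This is not Spark's or Hive's actual code. The names `resolve_avro_schema`, `read_text`, and `read_local` are hypothetical, and the demo reads a local temporary file where the real table would point at an HDFS path.

```python
import json
import os
import tempfile

# Property names as they appear in the table definition.
SCHEMA_LITERAL = "avro.schema.literal"
SCHEMA_URL = "avro.schema.url"


def resolve_avro_schema(props, read_text):
    """Return the parsed Avro schema described by the table properties, or None.

    props: dict of table/SerDe properties.
    read_text: hypothetical hook that takes a URL or path and returns its contents.
    """
    literal = props.get(SCHEMA_LITERAL)
    if literal:
        # Inline schema: parse it directly.
        return json.loads(literal)
    url = props.get(SCHEMA_URL)
    if url:
        # Fallback: load the schema from the referenced file.
        return json.loads(read_text(url))
    # Neither property is set, so there is no schema to hand to the writer.
    return None


def read_local(path):
    """Assumption for the demo: schemas live on the local filesystem, not HDFS."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    demo_schema = {
        "type": "record",
        "name": "SomeRecord",
        "fields": [{"name": "id", "type": "long"}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        avsc = os.path.join(tmp, "some_table.avsc")
        with open(avsc, "w", encoding="utf-8") as f:
            json.dump(demo_schema, f)
        resolved = resolve_avro_schema({SCHEMA_URL: avsc}, read_local)
        print(resolved["name"])
```

Checking the literal first leaves existing behaviour unchanged for tables that already define `avro.schema.literal`. Only tables that set just the URL take the new path.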
Issue Links
- is related to SPARK-19878 Add hive configuration when initialize hive serde in InsertIntoHiveTable.scala (Resolved)
- relates to SPARK-17920 HiveWriterContainer passes null configuration to serde.initialize, causing NullPointerException in AvroSerde when using avro.schema.url (Resolved)