Support for writing to a Hive table whose Avro schema is referenced by `avro.schema.url` is missing.
I have a Hive table stored in the Avro data format. The table is created with a query like this:
CREATE TABLE some_table
PARTITIONED BY (YEAR int, MONTH int, DAY int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT '' OUTPUTFORMAT ''
LOCATION 'hdfs:///user/some_user/some_table'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/some_user/some_table.avsc')
Note that the table uses the `avro.schema.url` property rather than `avro.schema.literal`, because we have to keep the schemas in separate files.
Trying to write to such a table results in a NullPointerException (NPE).
I tried to find a workaround, but nothing helped. I tried:
- setting `df.write.option("avroSchema", avroSchema)` with the schema passed explicitly as a string
- replacing the explicit, detailed SERDE specification with STORED AS AVRO
I found that this can be solved by adding a couple of lines in `org.apache.spark.sql.hive.HiveShim`, next to the place where `AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL` is referenced, so that `avro.schema.url` is handled as well.
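To illustrate the idea, here is a minimal, self-contained sketch of the lookup order I have in mind: use `avro.schema.literal` if it is set, otherwise fall back to `avro.schema.url`. This is not the actual `HiveShim` code. The class `AvroSchemaResolver`, the method `resolveSchemaText`, and the `fetchSchema` callback are hypothetical names made up for this example; only the two property names come from the table definition above. Reading the file from HDFS is left to the caller so that the snippet does not depend on Hadoop.

```java
import java.util.Optional;
import java.util.Properties;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Spark or Hive.
final class AvroSchemaResolver {
    // Property names as used in TBLPROPERTIES.
    static final String SCHEMA_LITERAL = "avro.schema.literal";
    static final String SCHEMA_URL = "avro.schema.url";

    private AvroSchemaResolver() {}

    /**
     * Returns the Avro schema JSON for a table, if one can be found.
     *
     * @param tableProps  the table (SerDe) properties
     * @param fetchSchema assumed callback that reads the schema text from a URL,
     *                    for example from HDFS; supplied by the caller
     */
    static Optional<String> resolveSchemaText(Properties tableProps,
                                              Function<String, String> fetchSchema) {
        // Prefer the inline schema when it is present.
        String literal = tableProps.getProperty(SCHEMA_LITERAL);
        if (literal != null && !literal.isEmpty()) {
            return Optional.of(literal);
        }
        // Otherwise fall back to the schema file referenced by URL.
        String url = tableProps.getProperty(SCHEMA_URL);
        if (url != null && !url.isEmpty()) {
            return Optional.ofNullable(fetchSchema.apply(url));
        }
        // Neither property is set: the caller has to handle the missing schema.
        return Optional.empty();
    }
}
```

With only `avro.schema.url` set, as in the table above, `resolveSchemaText` calls `fetchSchema` with `hdfs:///user/some_user/some_table.avsc` and returns whatever text that callback produces, instead of leaving the schema unset.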
