Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 2.5.0
Description
Steps to reproduce the problem:
Step1:
create external table sample (username string, tweet string, timewhen int) partitioned by (year string,month string) location '/tmp/test_avro/data/' TBLPROPERTIES ('avro.schema.url'='hdfs://host-10-17-80-187.coe.cloudera.com:8020/tmp/test_avro/schema/twitter.avsc');
Step2:
hadoop fs -mkdir /tmp/test_avro/data/year=2016/month=03
hadoop fs -put twitter.avro /tmp/test_avro/data/year=2016/month=03
Step3:
alter table sample add partition (year="2016",month="03") location '/tmp/test_avro/data/year=2016/month=03';
Step4:
alter table sample partition (year="2016",month="03") set fileformat avro;
Step5:
select * from sample;
The data and schema can be found here:
https://github.com/miguno/avro-cli-examples
Core dump:
0x00000000015c4c10 in impala::HdfsAvroScanner::ResolveSchemas (this=0x92bac60, table_root=..., file_root=0x8c742b8) at /home/bharath/Impala/be/src/exec/hdfs-avro-scanner.cc:186
(gdb) print table_root
$2 = (const impala::AvroSchemaElement &) @0x8c46250:
The schema is clearly null, so we dereference a null pointer at:
if (table_root.schema->type != AVRO_RECORD) return Status("Table schema is not a record");
The schema is NULL because hdfs_scan_node sees an empty Avro schema string:
// Parse Avro table schema if applicable
const string& avro_schema_str = hdfs_table_->avro_schema();  // <<<< Empty string.
if (!avro_schema_str.empty()) {
  avro_schema_t avro_schema;
  int error = avro_schema_from_json_length(
      avro_schema_str.c_str(), avro_schema_str.size(), &avro_schema);
  if (error != 0) { ........ }
  RETURN_IF_ERROR(AvroSchemaElement::ConvertSchema(avro_schema, avro_schema_.get()));
}
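To make the failure path concrete, here is a minimal, self-contained C++ sketch. It is not Impala code: `Schema`, `SchemaElement`, `Status`, `PrepareTableSchema` and `ResolveSchemasGuarded` are hypothetical stand-ins for the types and functions named above, and the null check is only one possible way to turn the crash into an error, not necessarily how this issue was fixed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the Impala types named in this report;
// these are NOT the real definitions.
enum class SchemaType { kRecord, kOther };

struct Schema {
  SchemaType type;
};

struct SchemaElement {
  std::shared_ptr<Schema> schema;  // stays null if no table schema was parsed
};

struct Status {
  bool ok;
  std::string msg;
};

// Mirrors the scan-node logic quoted above: an empty schema string skips
// parsing entirely, so the element's schema pointer is left null.
inline void PrepareTableSchema(const std::string& avro_schema_str,
                               SchemaElement* out) {
  assert(out != nullptr);
  if (!avro_schema_str.empty()) {
    // Real JSON parsing is out of scope; assume any non-empty string is a record.
    out->schema = std::make_shared<Schema>(Schema{SchemaType::kRecord});
  }
}

// Guarded variant of the failing check: test for null before reading schema->type.
inline Status ResolveSchemasGuarded(const SchemaElement& table_root) {
  if (table_root.schema == nullptr) {
    return {false, "Missing Avro table schema"};
  }
  if (table_root.schema->type != SchemaType::kRecord) {
    return {false, "Table schema is not a record"};
  }
  return {true, ""};
}
```

With an empty schema string, `PrepareTableSchema` leaves `schema` null and `ResolveSchemasGuarded` returns an error status instead of dereferencing the null pointer.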
This information is normally passed to the backend through the frontend table descriptor:
avroSchema_ = hdfsTable.isSetAvroSchema() ? hdfsTable.getAvroSchema() : null;
It is a per-table property, not a per-partition one. This means that the Avro schema is only passed on to the backend if the base table's file format is Avro:
if (HdfsFileFormat.fromJavaClassName(inputFormat) == HdfsFileFormat.AVRO) {
  ........
  avroSchema_ = AvroSchemaUtils.getAvroSchema(schemaSearchLocations);
  ........
}
Changing the base table's file format to Avro works fine.
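The per-table versus per-partition distinction can be modelled in a few lines. The sketch below is written in C++ for illustration only (the frontend itself is Java), and every name in it (`FileFormat`, `Partition`, `Table`, `LoadsSchemaTableOnly`, `LoadsSchemaAnyAvro`) is hypothetical. The second function shows one partition-aware alternative, which is an assumption and not necessarily the fix that was applied.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of the frontend decision; not Impala code.
enum class FileFormat { kText, kAvro };

struct Partition {
  FileFormat format;
};

struct Table {
  FileFormat base_format;
  std::vector<Partition> partitions;
};

// Behaviour described in this report: the schema is loaded only when the
// base table's format is Avro, regardless of the partitions' formats.
inline bool LoadsSchemaTableOnly(const Table& t) {
  return t.base_format == FileFormat::kAvro;
}

// Assumed alternative: also load the schema when any partition is Avro.
inline bool LoadsSchemaAnyAvro(const Table& t) {
  return t.base_format == FileFormat::kAvro ||
         std::any_of(t.partitions.begin(), t.partitions.end(),
                     [](const Partition& p) {
                       return p.format == FileFormat::kAvro;
                     });
}
```

For the table in the steps above (non-Avro base table with one Avro partition), the table-only check reports that no schema is loaded, which matches the empty string seen by the backend, while the partition-aware check would load it.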
Issue Links
- is duplicated by
  - IMPALA-3513 Impala crashes on altering file format to avro and on not invalidating the metadata. (Resolved)