  1. IMPALA
  2. IMPALA-3314

Altering a table partition's storage format does not work and crashes the daemon

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Frontend
    • Labels:

      Description

      Steps to reproduce the problem:
      Step 1:
      create external table sample (username string, tweet string, timewhen int) partitioned by (year string,month string) location '/tmp/test_avro/data/' TBLPROPERTIES ('avro.schema.url'='hdfs://host-10-17-80-187.coe.cloudera.com:8020/tmp/test_avro/schema/twitter.avsc');
      Step 2:
      hadoop fs -mkdir /tmp/test_avro/data/year=2016/month=03
      hadoop fs -put twitter.avro /tmp/test_avro/data/year=2016/month=03
      Step 3:
      alter table sample add partition (year="2016",month="03") location '/tmp/test_avro/data/year=2016/month=03';
      Step 4:
      alter table sample partition (year="2016",month="03") set fileformat avro;
      Step 5:
      select * from sample;

      Data and schema can be found here:
      https://github.com/miguno/avro-cli-examples

      Core dump:
      0x00000000015c4c10 in impala::HdfsAvroScanner::ResolveSchemas (this=0x92bac60, table_root=..., file_root=0x8c742b8) at /home/bharath/Impala/be/src/exec/hdfs-avro-scanner.cc:186
      (gdb) print table_root
      $2 = (const impala::AvroSchemaElement &) @0x8c46250: {schema = 0x0, children = std::vector of length 0, capacity 0, null_union_position = -1, slot_desc = 0x0, static LLVM_CLASS_NAME = 0x279ccf0 "struct.impala::AvroSchemaElement"}

      So the schema is clearly null, and we dereference a null pointer at:
      if (table_root.schema->type != AVRO_RECORD) return Status("Table schema is not a record");
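
      To illustrate the failure mode, here is a minimal sketch of a guarded version of that check. It uses simplified stand-in types (AvroSchema, AvroSchemaElement, Status) and a hypothetical helper name (CheckTableRoot); these are assumptions for illustration, not the real Impala definitions and not necessarily the fix that was committed.

```cpp
#include <string>

// Simplified, hypothetical stand-ins for the types seen in the backtrace.
enum AvroType { AVRO_RECORD, AVRO_STRING };

struct AvroSchema {
  AvroType type;
};

struct AvroSchemaElement {
  // Stays null when no table schema was ever parsed, as in the gdb output.
  const AvroSchema* schema = nullptr;
};

struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return Status{true, ""}; }
  static Status Error(const std::string& m) { return Status{false, m}; }
};

// Hypothetical guarded variant of the check at hdfs-avro-scanner.cc:186:
// report an error instead of dereferencing a null schema pointer.
Status CheckTableRoot(const AvroSchemaElement& table_root) {
  if (table_root.schema == nullptr) {
    return Status::Error("Missing Avro table schema");
  }
  if (table_root.schema->type != AVRO_RECORD) {
    return Status::Error("Table schema is not a record");
  }
  return Status::OK();
}
```

      With a guard like this, the query in Step 5 would fail with an error status rather than crash the daemon. It would not make the scan succeed, because the schema is still missing.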

      The schema is NULL because hdfs_scan_node sees an empty Avro schema:
      // Parse Avro table schema if applicable
      const string& avro_schema_str = hdfs_table_->avro_schema();  // <-- Empty string.
      if (!avro_schema_str.empty()) {
        avro_schema_t avro_schema;
        int error = avro_schema_from_json_length(
            avro_schema_str.c_str(), avro_schema_str.size(), &avro_schema);
        if (error != 0) {
          return Status(Substitute("Failed to parse table schema: $0", avro_strerror()));
        }
        RETURN_IF_ERROR(AvroSchemaElement::ConvertSchema(avro_schema, avro_schema_.get()));
      }

      This information is usually passed on to the backend from the frontend table descriptor.
      avroSchema_ = hdfsTable.isSetAvroSchema() ? hdfsTable.getAvroSchema() : null;

      It is a per-table property, not a per-partition one. This means that the Avro schema is only passed on to the backend if the base table is Avro:
      if (HdfsFileFormat.fromJavaClassName(inputFormat) == HdfsFileFormat.AVRO) {
        ........
        avroSchema_ = AvroSchemaUtils.getAvroSchema(schemaSearchLocations);
        .........
      }

      Changing the base table format to Avro works fine.
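
      The gap between the table-level check and the partition-level format can be sketched as follows. This is a simplified model written in C++ to match the backend snippets, although the frontend code is Java. The types (Table, Partition) and both function names are hypothetical, and the second function shows only one possible direction, not necessarily the fix that went into 2.8.0.

```cpp
#include <vector>

// Hypothetical, simplified model of the frontend's table metadata.
enum class HdfsFileFormat { TEXT, PARQUET, AVRO };

struct Partition {
  HdfsFileFormat format;
};

struct Table {
  HdfsFileFormat format;  // format of the base table
  std::vector<Partition> partitions;
};

// Behaviour described in the report: only the base table format decides
// whether the Avro schema is loaded and sent to the backend.
bool NeedsAvroSchemaTableOnly(const Table& t) {
  return t.format == HdfsFileFormat::AVRO;
}

// One possible alternative (an assumption): also load the schema when any
// partition is Avro, even if the base table is not.
bool NeedsAvroSchema(const Table& t) {
  if (t.format == HdfsFileFormat::AVRO) return true;
  for (const Partition& p : t.partitions) {
    if (p.format == HdfsFileFormat::AVRO) return true;
  }
  return false;
}
```

      For the table from the reproduction steps (text base table, one Avro partition), the first function returns false and the second returns true.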


              People

              • Assignee:
                bharathv bharath v
                Reporter:
                anujphadke Anuj Phadke
