When scanning a deeply nested Avro file, Impala enters an infinite loop (or unbounded recursion) and the query hangs. The query cannot be cancelled and keeps one CPU core at 100%. The only remedy is to restart the impalad.
Skye, this was with Patch Set 4 of your Avro scanner CR applied.
Steps to repro:
1. Copy the attached Parquet data file to a local dir
2. Copy the file into a new HDFS dir (the steps below assume /test-warehouse/max_depth/)
3. In Impala, create a Parquet table using that file:
4. In Hive, create an Avro table from that Parquet table:
During my initial investigation, I found the following:
The query hangs in AvroSchemaElement::ConvertSchema() called from HdfsScanNode::Prepare().
I added logging to AvroSchemaElement::ConvertSchema() that prints the pointer of each child element it traverses. After some number of recursive calls the same pointers start repeating, which suggests the traversal is following a cycle in the child elements.
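For illustration, the pointer-repetition check described above can be turned into a guard that fails instead of recursing forever. This is a minimal sketch of generic cycle detection during a recursive schema walk, not the actual fix and not Impala's code. `SchemaNode` and `ConvertSchemaChecked` are hypothetical names; they do not correspond to AvroSchemaElement or to the Avro C API. The guard tracks the nodes on the current recursion path, so a schema that legitimately shares a subtree is still accepted, while a true cycle is reported.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified schema node. NOT Impala's AvroSchemaElement
// and NOT the Avro C API's schema type.
struct SchemaNode {
  std::string name;
  std::vector<const SchemaNode*> children;
};

// Recursively walks the schema starting at `node`.
// `in_progress` holds the nodes on the current recursion path; seeing one of
// them again means the child pointers form a cycle, so we return false
// instead of recursing forever. `nodes_visited` counts visited nodes.
bool ConvertSchemaChecked(const SchemaNode* node,
                          std::unordered_set<const SchemaNode*>* in_progress,
                          int* nodes_visited) {
  // insert() returns {iterator, false} if the node is already on the path.
  if (!in_progress->insert(node).second) return false;  // cycle detected
  ++*nodes_visited;
  for (const SchemaNode* child : node->children) {
    if (!ConvertSchemaChecked(child, in_progress, nodes_visited)) return false;
  }
  // Leaving this node: it is no longer on the recursion path, so a shared
  // (but acyclic) subtree can be visited again from another parent.
  in_progress->erase(node);
  return true;
}
```

A real fix would more likely report an error status for the scan rather than return a bool, and might also enforce a maximum nesting depth; both are design choices left to the actual patch.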