IMPALA-2374

Infinite loop/recursion when scanning deeply nested Avro file.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: Impala 2.3.0
    • None

    Description

      When scanning a deeply nested Avro file, Impala enters an infinite loop/recursion and the query hangs. The query cannot be cancelled and continues to consume 100% of one CPU core. The only remedy is to restart the impalad.

      Skye, I had applied Patch Set 4 of your Avro scanner CR.

      Steps to repro:
      1. Copy the attached Parquet data file to a local dir
      2. Copy the file somewhere into a new HDFS dir (assuming /test-warehouse/max_depth/ below)
      3. In Impala, create a Parquet table using that file:

      create external table max_depth_parquet
      like parquet '/test-warehouse/max_depth/max_depth.parq'
      stored as parquet
      location '/test-warehouse/max_depth/';
      

      4. In Hive, create an Avro table from that Parquet table:

      create table max_depth_avro stored as avro as select * from max_depth_parquet;
      

      During my initial investigation I found the following:
      The query hangs in AvroSchemaElement::ConvertSchema() called from HdfsScanNode::Prepare().

      I added some logging in AvroSchemaElement::ConvertSchema() to print the pointers of traversed child elements, and there appears to be a cycle because the pointers traversed repeat after some number of recursive calls.
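      The pointer-repetition check described above can also be automated rather than read off the logs. The following is a minimal sketch only: SchemaNode, HasCycleImpl and HasCycle are hypothetical names for illustration, not Impala's AvroSchemaElement or any Avro library type. It assumes all child pointers are non-null and reports a cycle when the traversal reaches a node that is still on the current recursion path.

      ```cpp
      #include <cassert>
      #include <unordered_set>
      #include <vector>

      // Hypothetical stand-in for a schema element (not Impala's real type).
      struct SchemaNode {
        std::vector<const SchemaNode*> children;
      };

      // Returns true if following child pointers from 'node' leads back to a
      // node that is still on the current recursion path.
      bool HasCycleImpl(const SchemaNode* node,
                        std::unordered_set<const SchemaNode*>* on_path) {
        // insert() reports whether the pointer was newly added. If it was not,
        // this node is already being traversed further up the call stack.
        if (!on_path->insert(node).second) return true;
        for (const SchemaNode* child : node->children) {
          if (HasCycleImpl(child, on_path)) return true;
        }
        // Finished this subtree. Removing the node lets subtrees that are
        // merely shared (reachable twice, but acyclic) pass the check.
        on_path->erase(node);
        return false;
      }

      bool HasCycle(const SchemaNode* root) {
        assert(root != nullptr);
        std::unordered_set<const SchemaNode*> on_path;
        return HasCycleImpl(root, &on_path);
      }
      ```

      A guard of this kind would turn the hang into an error that can be reported. It does not by itself explain why the converted schema contains a cycle.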

      Attachments

        1. fix-create-table-like.diff
          0.7 kB
          Alexander Behm
        2. max_depth.parq
          3 kB
          Alexander Behm

          People

            Assignee: skye Skye Wanderman-Milne
            Reporter: alex.behm Alexander Behm