  1. Apache Avro
  2. AVRO-3480

Avro files with multiple "blocks" fail to deserialize when using a compression codec (throwing an error instead)

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.11.0
    • Fix Version/s: None
    • Component/s: php
    • Labels: None

    Description

      When deserializing, in PHP, a DEFLATE-compressed file that contains a large number of records (see the attached example file with 20,000 records), the `$decoder` instance advances through the file incorrectly. It eventually yields an empty string, which is passed to `gzinflate(...)` on this line: https://github.com/apache/avro/blob/a6f13b269a359d3839e55a75e0662d834d76992c/lang/php/lib/DataFile/AvroDataIOReader.php#L176

      This raises a PHP error. Notably, not all records have been deserialized when this happens, so the failure appears to be related to the file containing multiple "blocks".
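
      For readers unfamiliar with the container layout, here is a minimal sketch of how I understand the block framing from the Avro specification: each block is a record count (long), a byte size (long), the codec-compressed payload, and a 16-byte sync marker. The helpers `encodeLong`, `decodeLong`, `encodeBlock`, and `readBlocks` are hypothetical names made up for this illustration and are not the Avro PHP library's API. The "records" are raw strings rather than Avro-encoded data, and the sketch assumes 64-bit PHP with the zlib extension.

```php
<?php
// Illustrative only: hypothetical helpers, not the Avro PHP library's API.

/** Encode a small integer as an Avro long (zigzag, then 7-bit groups). */
function encodeLong(int $n): string
{
    $n = ($n << 1) ^ ($n >> 63); // zigzag; assumes 64-bit integers
    $out = '';
    while ($n > 0x7F) {
        $out .= chr(($n & 0x7F) | 0x80); // more bytes follow
        $n >>= 7;
    }
    return $out . chr($n);
}

/** Decode an Avro long starting at $pos and advance $pos past it. */
function decodeLong(string $buf, int &$pos): int
{
    $result = 0;
    $shift = 0;
    do {
        if ($pos >= strlen($buf)) {
            throw new RuntimeException('Unexpected end of data while reading a long');
        }
        $byte = ord($buf[$pos++]);
        $result |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return ($result >> 1) ^ -($result & 1); // undo zigzag
}

/** Frame pre-serialized records as one deflate-compressed block. */
function encodeBlock(array $records, string $sync): string
{
    $compressed = gzdeflate(implode('', $records)); // raw deflate (RFC 1951)
    return encodeLong(count($records)) . encodeLong(strlen($compressed)) . $compressed . $sync;
}

/** Walk every block in $body, checking sizes and sync markers along the way. */
function readBlocks(string $body, string $sync): array
{
    $pos = 0;
    $blocks = [];
    while ($pos < strlen($body)) {
        $count = decodeLong($body, $pos);
        $size = decodeLong($body, $pos);
        $compressed = substr($body, $pos, max($size, 0));
        // Even an empty payload deflates to a non-empty string, so a
        // zero-length block here means the reader is misaligned.
        if ($size <= 0 || strlen($compressed) !== $size) {
            throw new RuntimeException("Bad block size $size at offset $pos");
        }
        $pos += $size;
        $data = gzinflate($compressed);
        if ($data === false) {
            throw new RuntimeException('Block payload could not be inflated');
        }
        if (substr($body, $pos, 16) !== $sync) {
            throw new RuntimeException("Sync marker mismatch at offset $pos");
        }
        $pos += 16;
        $blocks[] = ['count' => $count, 'data' => $data];
    }
    return $blocks;
}

// Two blocks back to back, as in a file with many records.
$sync = str_repeat("\x2A", 16); // placeholder; real files use a random marker
$body = encodeBlock(['alpha', 'beta'], $sync) . encodeBlock(['gamma'], $sync);
$blocks = readBlocks($body, $sync);
foreach ($blocks as $i => $block) {
    echo "block $i: {$block['count']} record(s), " . strlen($block['data']) . " byte(s)\n";
}
```

      The point of the explicit size and sync checks is that a reader which loses its position between blocks fails with a descriptive message instead of handing an empty string to `gzinflate(...)`.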

      I've attached a file that meets this condition, and also a quick Kotlin project using the official Java library that I used to generate the file.

      The PHP code in question to reproduce this behavior is pretty standard, lifted directly from the provided examples/write_read.php file:

       

      <?php
      // Expect the path of the Avro file to read as the only argument.
      if (count($argv) < 2) {
          echo "USAGE: php main.php FILENAME";
          exit(1);
      }
      $filename = $argv[1];

      require_once __DIR__ . '/../vendor/avro-php-1.11.0/lib/autoload.php';

      use Apache\Avro\DataFile\AvroDataIO;

      // Open the file and dump every record it contains.
      $data_reader = AvroDataIO::openFile($filename);
      echo "Reading from $filename:\n";
      foreach ($data_reader->data() as $datum) {
          echo var_export($datum, true) . "\n";
      }
      $data_reader->close();
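
      When reproducing this, it may help to turn the PHP error into an exception so the failing call shows up with a stack trace. The sketch below is one way to do that. `withErrorsAsExceptions` is a hypothetical helper name, not part of the Avro library, and the `gzinflate('')` call is only a stand-in for the failing read described above.

```php
<?php
/**
 * Run $fn with PHP warnings and notices promoted to ErrorException.
 * Hypothetical helper for illustration; not part of the Avro library.
 */
function withErrorsAsExceptions(callable $fn)
{
    set_error_handler(static function (int $severity, string $message, string $file, int $line): bool {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });
    try {
        return $fn();
    } finally {
        // Always restore the previous handler, even when $fn throws.
        restore_error_handler();
    }
}

// Stand-in for the failing read: inflating an empty string.
try {
    withErrorsAsExceptions(static fn () => gzinflate(''));
    echo "no error raised\n";
} catch (ErrorException $e) {
    echo 'caught: ' . $e->getMessage() . "\n";
}
```

      In the reproduction script, the same wrapper could be applied to the read itself, for example `withErrorsAsExceptions(fn () => $data_reader->data())`, so that the trace points at the block being processed when the error occurs.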

       

      Attachments

        1. repro_java_create_problematic_avro_file.zip
          4 kB
          Spencer Williams
        2. test.avro
          38 kB
          Spencer Williams

          People

            Assignee: Unassigned
            Reporter: Spencer Williams (spencer.williams.salesforce)