  1. Spark
  2. SPARK-46633

Reading a non-empty Avro file with empty blocks returns 0 records

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0, 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      When an Avro file contains empty blocks, Spark returns 0 records while "fastavro" and "avro-python-3" both read the file correctly and return records.

       

      This is caused by the way Spark handles (or rather, fails to handle) empty blocks. A call to `hasNext` loads the next block and, if that block is empty, returns false. Instead of exiting the loop at that point, the reader needs to keep probing subsequent blocks until the sync point is reached.
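
As a rough illustration of the behaviour described above, the following Python sketch models a file as a list of blocks, where each block is a list of records. It is not Spark's or Avro's actual reader code: the `Block` alias and both function names are hypothetical, and split boundaries (the sync point) are left out so that only the empty-block handling is shown.

```python
from typing import Iterator, List

# Hypothetical stand-in for a decoded Avro data block: a list of records.
Block = List[str]


def read_stopping_at_empty_block(blocks: List[Block]) -> Iterator[str]:
    """Mimics the reported behaviour: the first empty block ends the read."""
    for block in blocks:
        if not block:
            # An empty block is mistaken for "no more records".
            return
        yield from block


def read_skipping_empty_blocks(blocks: List[Block]) -> Iterator[str]:
    """Keeps probing later blocks instead of stopping at an empty one."""
    for block in blocks:
        if not block:
            # Nothing to emit here, but later blocks may still hold records.
            continue
        yield from block


# A non-empty "file" whose first and third blocks are empty.
blocks = [[], ["a", "b"], [], ["c"]]
buggy = list(read_stopping_at_empty_block(blocks))
fixed = list(read_skipping_empty_blocks(blocks))
print(buggy)  # []
print(fixed)  # ['a', 'b', 'c']
```

With a leading empty block, the first reader yields nothing even though the file holds three records, which matches the symptom in the title. The second reader returns all of them.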

            People

              Assignee: Ivan Sadikov (ivan.sadikov)
              Reporter: Ivan Sadikov (ivan.sadikov)
              Votes: 0
              Watchers: 2
