Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-20407

ParquetQuerySuite 'Enabling/disabling ignoreCorruptFiles' flaky test

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.2.0
    • 2.1.1, 2.2.0
    • Tests
    • None

    Description

      ParquetQuerySuite test "Enabling/disabling ignoreCorruptFiles" can sometimes fail. This is caused by the fact that when one task fails, the driver call returns and test code continues, but there might still be tasks running that will be killed at the next killing point.

      There are 2 specific issues created by this:
      1. Files can be closed some time after the test finishes, so DebugFilesystem.assertNoOpenStreams fails. One solution for this is to change SharedSqlContext and call assertNoOpenStreams inside eventually {}

      2. ParquetFileReader constructor from apache parquet 1.8.2 can leak a stream at line 538. This happens when the next line throws an exception. So, the constructor fails and Spark doesn't have any way to close the file.
      This happens in this test because the test deletes the temporary directory at the end (but while tasks might still be running). Deleting the directory causes the constructor to fail.
      The solution for this could be to Thread.sleep at the end of the test or to somehow wait for all tasks to be definitely killed before finishing the test

      Attachments

        Activity

          People

            bograd Bogdan Raducanu
            bograd Bogdan Raducanu
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: