  Spark / SPARK-23457

Register task completion listeners first for ParquetFileFormat

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      ParquetFileFormat leaks open files in some cases. This issue aims to register the task completion listener first, before the reader is initialized, so that the file is still closed when initialization fails.

        test("SPARK-23390 Register task completion listeners first in ParquetFileFormat") {
          // A batch size of Int.MaxValue is intended to make the vectorized reader fail during setup.
          withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE.key -> s"${Int.MaxValue}") {
            withTempDir { dir =>
              val basePath = dir.getCanonicalPath
              // Write two single-row Parquet files, then read both back in one DataFrame.
              Seq(0).toDF("a").write.format("parquet").save(new Path(basePath, "first").toString)
              Seq(1).toDF("a").write.format("parquet").save(new Path(basePath, "second").toString)
              val df = spark.read.parquet(
                new Path(basePath, "first").toString,
                new Path(basePath, "second").toString)
              // The read is expected to fail, with an OutOfMemoryError as the cause.
              val e = intercept[SparkException] {
                df.collect()
              }
              assert(e.getCause.isInstanceOf[OutOfMemoryError])
            }
          }
        }
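
      The ordering problem described above can be illustrated with a minimal, Spark-free sketch. This is not the actual ParquetFileFormat code: MiniTaskContext, FakeReader, and ListenerOrdering are hypothetical names invented for this example, and the failure is simulated with an IllegalStateException rather than a real OutOfMemoryError. It only shows why registering the close callback before a step that can throw avoids the leak.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical stand-in for a task context: runs registered callbacks when the task ends.
final class MiniTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addCompletionListener(f: () => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(f => f())
}

// Hypothetical reader: counts as "open" once constructed; initialize() may fail.
final class FakeReader(failOnInit: Boolean) extends AutoCloseable {
  var isOpen: Boolean = true
  def initialize(): Unit =
    if (failOnInit) throw new IllegalStateException("simulated failure during initialize")
  override def close(): Unit = isOpen = false
}

object ListenerOrdering {
  // Leaky ordering: if initialize() throws, the listener is never registered.
  def openListenerLast(ctx: MiniTaskContext, reader: FakeReader): Unit = {
    reader.initialize()
    ctx.addCompletionListener(() => reader.close())
  }

  // Safe ordering: the listener is registered before anything that can throw.
  def openListenerFirst(ctx: MiniTaskContext, reader: FakeReader): Unit = {
    ctx.addCompletionListener(() => reader.close())
    reader.initialize()
  }

  // Runs one simulated failing task and reports whether the reader is still open afterwards.
  def leaksAfterFailure(open: (MiniTaskContext, FakeReader) => Unit): Boolean = {
    val ctx = new MiniTaskContext
    val reader = new FakeReader(failOnInit = true)
    Try(open(ctx, reader))   // the task body fails inside initialize()
    ctx.markTaskCompleted()  // completion listeners still fire when the task ends
    reader.isOpen
  }
}
```

      In this sketch, ListenerOrdering.leaksAfterFailure(ListenerOrdering.openListenerLast) returns true (the reader stays open), while ListenerOrdering.leaksAfterFailure(ListenerOrdering.openListenerFirst) returns false.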
      

          People

            Assignee: Dongjoon Hyun
            Reporter: Dongjoon Hyun
            Votes: 0
            Watchers: 3
