Spark / SPARK-23457

Register task completion listeners first for ParquetFileFormat


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: SQL
    • Labels: None

      Description

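      ParquetFileFormat leaks open files in some cases. This issue aims to register the task completion listener first, so that the reader is closed when the task completes even if a later step fails. The following test reproduces the failure case.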

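        // Imports used below; withSQLConf, withTempDir, spark and toDF are assumed to
        // come from the enclosing Spark SQL test suite.
        import org.apache.hadoop.fs.Path
        import org.apache.spark.SparkException
        import org.apache.spark.sql.internal.SQLConf

        test("SPARK-23390 Register task completion listeners first in ParquetFileFormat") {
          // Use the largest possible vectorized batch size so that the read is
          // expected to fail with an OutOfMemoryError, as asserted at the end.
          withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE.key -> s"${Int.MaxValue}") {
            withTempDir { dir =>
              val basePath = dir.getCanonicalPath
              // Write two small Parquet outputs and read them back as one DataFrame.
              Seq(0).toDF("a").write.format("parquet").save(new Path(basePath, "first").toString)
              Seq(1).toDF("a").write.format("parquet").save(new Path(basePath, "second").toString)
              val df = spark.read.parquet(
                new Path(basePath, "first").toString,
                new Path(basePath, "second").toString)
              // The job should fail, with the OutOfMemoryError as the cause.
              val e = intercept[SparkException] {
                df.collect()
              }
              assert(e.getCause.isInstanceOf[OutOfMemoryError])
            }
          }
        }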
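
      The ordering problem described above can be illustrated without Spark. The following is a minimal sketch, not the actual ParquetFileFormat change: SketchTaskContext, SketchReader and ListenerOrderSketch are hypothetical names invented for illustration, and the failure is simulated with an IllegalStateException rather than an OutOfMemoryError. It assumes only that completion listeners run when a task ends, whether it succeeded or failed.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context: it collects callbacks and
// runs them when the task finishes, whether it succeeded or failed.
final class SketchTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addCompletionListener(f: () => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(f => f())
}

// Hypothetical reader: initialize() opens a resource and may then fail.
final class SketchReader(failOnInit: Boolean) {
  var open: Boolean = false
  def initialize(): Unit = {
    open = true // the "file" is now open
    if (failOnInit) throw new IllegalStateException("simulated failure after opening")
  }
  def close(): Unit = { open = false }
}

object ListenerOrderSketch {
  // Runs one simulated failing task and reports whether the reader is
  // still open after the task has completed.
  def leaksAfterFailure(registerFirst: Boolean): Boolean = {
    val ctx = new SketchTaskContext
    val reader = new SketchReader(failOnInit = true)
    try {
      if (registerFirst) {
        // Register cleanup before doing anything that can fail.
        ctx.addCompletionListener(() => reader.close())
        reader.initialize()
      } else {
        reader.initialize()
        // Never reached when initialize() throws, so nothing closes the reader.
        ctx.addCompletionListener(() => reader.close())
      }
    } catch {
      case _: IllegalStateException => // the task fails; cleanup is left to the listeners
    } finally {
      ctx.markTaskCompleted()
    }
    reader.open
  }
}
```

      In this sketch, leaksAfterFailure(registerFirst = false) leaves the reader open, while leaksAfterFailure(registerFirst = true) closes it, because the listener already exists by the time the failure happens.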
      

            People

            • Assignee: Dongjoon Hyun (dongjoon)
            • Reporter: Dongjoon Hyun (dongjoon)
