Description
RECENT HISTORY
We're seeing multiple failures in FileBasedDataSourceSuite in spark-branch-2.3-test-sbt-hadoop-2.7:
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 15 times over 10.012158059999999 seconds. Last failure message: There are 1 possibly leaked file streams..
Here's the full history: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/history/
From a very quick look, these failures seem to be correlated with https://github.com/apache/spark/pull/20479 (cc dongjoon), as the following stack trace suggests (full logs here):
[info] - Enabling/disabling ignoreMissingFiles using orc (648 milliseconds)
15:55:58.673 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 61.0 (TID 85, localhost, executor driver): TaskKilled (Stage cancelled)
15:55:58.674 WARN org.apache.spark.DebugFilesystem: Leaked filesystem connection created at:
java.lang.Throwable
at org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36)
at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.open(RecordReaderUtils.java:173)
at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:254)
at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:633)
at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:138)
Also, although this might be just a false correlation, the frequency of these test failures has increased considerably in https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/ since https://github.com/apache/spark/pull/20562 (cc fengliu@databricks.com) was merged.
The following stack trace shows the same kind of leak on the Parquet read path.
Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null
at org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36)
at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:538)
at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:125)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:106)
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/322/ (May 3rd)
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/331/ (May 9th)
- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536 (May 11th)
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/342/ (May 16th)
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/347/ (May 19th)
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/367/ (June 2nd)
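Both traces above show a stream opened inside a reader's initialize() while the task is being killed. One way to picture how that can leak is the ordering problem named in the title of SPARK-23399 (listed under Issue Links): if the cleanup listener is registered only after initialization, a kill during initialization leaves nothing registered to close the stream. The following is a minimal, self-contained sketch of that ordering under stated assumptions. It is not Spark code: LeakSketch, TaskContext, Reader, openStreams and leakedAfterKill are hypothetical stand-ins invented for illustration, and this ticket does not establish that this is the actual root cause.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeakSketch {
    // Hypothetical stand-in for a leak-tracking filesystem: every stream that is still open.
    static final Set<Object> openStreams = new HashSet<>();

    // Hypothetical task context: runs its completion listeners when the task ends or is killed.
    static class TaskContext {
        private final List<Runnable> listeners = new ArrayList<>();

        void addCompletionListener(Runnable listener) {
            listeners.add(listener);
        }

        void markCompleted() {
            for (Runnable listener : listeners) {
                listener.run();
            }
        }
    }

    // Hypothetical reader: initialize() opens a stream and can then be interrupted by a kill.
    static class Reader implements AutoCloseable {
        private Object stream;

        void initialize(boolean killedDuringInit) {
            stream = new Object();
            openStreams.add(stream);
            if (killedDuringInit) {
                throw new IllegalStateException("task killed (stage cancelled)");
            }
        }

        @Override
        public void close() {
            if (stream != null) {
                openStreams.remove(stream);
                stream = null;
            }
        }
    }

    // Returns how many streams remain open after a task that is killed during initialize().
    static int leakedAfterKill(boolean registerListenerFirst) {
        openStreams.clear();
        TaskContext context = new TaskContext();
        Reader reader = new Reader();
        try {
            if (registerListenerFirst) {
                context.addCompletionListener(reader::close);
                reader.initialize(true);
            } else {
                reader.initialize(true);
                context.addCompletionListener(reader::close); // never reached: initialize() threw
            }
        } catch (IllegalStateException e) {
            // In this sketch the kill is swallowed; cleanup relies only on completion listeners.
        } finally {
            context.markCompleted();
        }
        return openStreams.size();
    }

    public static void main(String[] args) {
        System.out.println("listener after initialize:  " + leakedAfterKill(false) + " leaked");
        System.out.println("listener before initialize: " + leakedAfterKill(true) + " leaked");
    }
}
```

In this sketch, registering the listener after initialize() leaves one stream open when the kill arrives mid-initialization, while registering it first leaves none. A leak check that polls for open streams, like the one that produced the timeout message above, would keep failing in the first case because nothing ever closes the stream.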
Issue Links
- blocks
  - SPARK-24139 Fix Flaky Tests (Resolved)
- is duplicated by
  - SPARK-23606 Flakey FileBasedDataSourceSuite (Resolved)
- is related to
  - SPARK-25688 Potential resource leak in ORC (Resolved)
  - SPARK-23399 Register a task completion listener first for OrcColumnarBatchReader (Resolved)
- relates to
  - SPARK-23458 Flaky test: OrcQuerySuite (Resolved)
  - SPARK-23505 Flaky test: ParquetQuerySuite (Resolved)
  - ORC-419 Ensure to call `close` at RecordReaderImpl constructor exception (Closed)
  - ORC-416 Avoid opening data reader when there is no stripe (Closed)