  1. Spark
  2. SPARK-23308

ignoreCorruptFiles should not ignore retryable IOException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When `spark.sql.files.ignoreCorruptFiles` is enabled, Spark ignores every RuntimeException or IOException raised while reading a file, but some IOExceptions can occur even when the file is not corrupted.

      One example is SocketTimeoutException: it is transient, so retrying the read may fetch the data successfully, and the exception by itself does not mean the data is corrupted.

       

      See: 

      https://github.com/apache/spark/blob/e30e2698a2193f0bbdcd4edb884710819ab6397c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L163
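
The distinction described above could be sketched roughly as follows. This is a minimal illustration, not Spark's code and not a proposed patch: the class `RetryableRead`, the methods `isRetryable` and `read`, and the `maxAttempts` parameter are all hypothetical names, and treating only SocketTimeoutException as retryable is an assumption made for the example.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Spark): separates transient I/O failures from corruption. */
final class RetryableRead {

    private RetryableRead() {}

    /** Assumption for this sketch: only SocketTimeoutException counts as transient. */
    static boolean isRetryable(Throwable t) {
        return t instanceof SocketTimeoutException;
    }

    /**
     * Calls readFile up to maxAttempts times.
     * A retryable failure is retried, and rethrown once the attempts are used up.
     * Any other IOException or RuntimeException is swallowed (empty result)
     * only when ignoreCorruptFiles is true; otherwise it is rethrown.
     */
    static <T> Optional<T> read(Callable<T> readFile, int maxAttempts, boolean ignoreCorruptFiles)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return Optional.ofNullable(readFile.call());
            } catch (IOException | RuntimeException e) {
                if (isRetryable(e)) {
                    if (attempt >= maxAttempts) {
                        throw e; // still failing after all attempts: surface it, do not skip the file
                    }
                    continue; // transient failure: try the read again
                }
                if (ignoreCorruptFiles) {
                    return Optional.empty(); // treated as a corrupt file and skipped
                }
                throw e;
            }
        }
    }
}
```

With this shape, a read that times out once and then succeeds returns its data even when `ignoreCorruptFiles` is true, while a non-retryable IOException still results in the file being skipped.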

            People

              Assignee: Unassigned
              Reporter: Márcio Furlani Carmona (marciocarmona)