  1. Spark
  2. SPARK-28082

Add a note to DROPMALFORMED mode of CSV for column pruning


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Documentation, PySpark, SQL
    • Labels: None

    Description

      This is inspired by SPARK-28058.

      In DROPMALFORMED mode, corrupted records aren't dropped if the malformed columns aren't read. This is due to column pruning in the CSV parser: only the columns a query actually requires are parsed, so a malformed value in a pruned column is never detected. The current documentation of DROPMALFORMED doesn't mention this effect, so users will be confused when the mode doesn't work as expected.
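To illustrate why a malformed record can survive, here is a minimal toy model in plain Python. It is not Spark code and does not use Spark's API: the schema, the sample data, and the helper `read_drop_malformed` are all hypothetical, made up for this sketch. It only mimics the idea that a pruning parser converts just the requested columns, so a bad value in an unread column is never noticed.

```python
import csv
import io

# Hypothetical schema: column name -> converter. Illustrative only, not Spark's API.
SCHEMA = {"fruit": str, "price": int, "quantity": int}

# Hypothetical input: the "banana" row has a non-integer price.
CSV_TEXT = """fruit,price,quantity
apple,1,3
banana,oops,4
orange,3,5
"""


def read_drop_malformed(text, required, column_pruning=True):
    """Return tuples of the `required` columns, dropping rows that fail to parse.

    With column_pruning=True only the required columns are converted, so a
    bad value in a column that is not read goes unnoticed and the row is kept.
    """
    names = list(SCHEMA)
    to_check = required if column_pruning else names
    rows = []
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    for raw in reader:
        record = dict(zip(names, raw))
        try:
            parsed = {name: SCHEMA[name](record[name]) for name in to_check}
        except (ValueError, KeyError):
            continue  # malformed in a column we actually looked at: drop the row
        rows.append(tuple(parsed[name] for name in required))
    return rows


pruned = read_drop_malformed(CSV_TEXT, ["fruit"], column_pruning=True)
unpruned = read_drop_malformed(CSV_TEXT, ["fruit"], column_pruning=False)
print(pruned)    # the "banana" row is kept: its bad price was never parsed
print(unpruned)  # the "banana" row is dropped: every column was parsed
```

In Spark itself, the corresponding switch is the SQL configuration `spark.sql.csv.parser.columnPruning.enabled`. Setting it to `false` should make the CSV parser handle all columns, at the cost of parsing data the query does not need.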

            People

              Assignee: Unassigned
              Reporter: L. C. Hsieh (viirya)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: