SPARK-26016: Document that UTF-8 is required in text data source


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: Java API
    • Labels: None

Description

    Attached you will find a project with unit tests showing the issue at hand.

    If I read in an ISO-8859-1 encoded file and simply write out what was read, the contents of the part file match what was read, which is great.

    However, as soon as I use a map / mapPartitions function, the encoding no longer looks correct. In addition, a simple collectAsList followed by writing that list of strings to a file does not work either. I don't think I'm doing anything wrong; can someone please investigate? I think this is a bug.
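
    For reference, a minimal sketch of the behaviour described above, assuming a small ISO-8859-1 file at a hypothetical local path. This is not the attached project; the paths and the identity map are illustrative only.

{code:java}
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class TextEncodingDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("text-encoding-demo")
        .master("local[*]")
        .getOrCreate();

    // The text data source assumes UTF-8. A plain read-then-write round trip
    // can still appear to preserve the ISO-8859-1 bytes, because the rows are
    // passed through without being re-materialized as Java Strings.
    Dataset<String> lines = spark.read().textFile("/path/to/latin1.txt");
    lines.write().text("/tmp/roundtrip");   // output bytes look unchanged

    // As soon as the rows go through map(), each line is materialized as a
    // java.lang.String under the UTF-8 assumption, so non-ASCII ISO-8859-1
    // characters come out garbled in the result.
    Dataset<String> mapped = lines.map(
        (MapFunction<String, String>) line -> line,   // identity map is enough
        Encoders.STRING());
    mapped.write().text("/tmp/mapped");     // non-UTF-8 characters corrupted

    spark.stop();
  }
}
{code}

    The resolution here was to document that the text data source requires UTF-8; files in other encodings need to be transcoded before reading, or read through a source that exposes an encoding option (for example the CSV reader's "encoding" option).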

Attachments

    1. spark-sandbox.zip (118 kB, Chris Caspanello)

Issue Links

Activity

People

    Assignee: Sean R. Owen (srowen)
    Reporter: Chris Caspanello (ccaspanello)
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved: