SPARK-26016: Document that UTF-8 is required in text data source


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: Java API
    • Labels: None

Description

    Attached you will find a project with unit tests showing the issue at hand.

    If I read in an ISO-8859-1 encoded file and simply write out what was read, the contents of the part file match what was read, which is great.

    However, as soon as I use a map / mapPartitions function, the encoding no longer looks correct. In addition, a simple collectAsList followed by writing that list of strings to a file does not work either. I don't think I'm doing anything wrong; can someone please investigate? I think this is a bug.
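
    For reference, a minimal sketch of the behaviour described above, assuming a small ISO-8859-1 file at a hypothetical local path. This is not the attached project; the paths and the identity map are illustrative only.

{code:java}
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class TextEncodingDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("text-encoding-demo")
        .master("local[*]")
        .getOrCreate();

    // The text data source assumes UTF-8. A plain read-then-write round trip
    // can still appear to preserve the ISO-8859-1 bytes, because the rows are
    // passed through without being re-materialized as Java Strings.
    Dataset<String> lines = spark.read().textFile("/path/to/latin1.txt");
    lines.write().text("/tmp/roundtrip");   // output bytes look unchanged

    // As soon as the rows go through map(), each line is materialized as a
    // java.lang.String under the UTF-8 assumption, so non-ASCII ISO-8859-1
    // characters come out garbled in the result.
    Dataset<String> mapped = lines.map(
        (MapFunction<String, String>) line -> line,   // identity map is enough
        Encoders.STRING());
    mapped.write().text("/tmp/mapped");     // non-UTF-8 characters corrupted

    spark.stop();
  }
}
{code}

    The resolution here was to document that the text data source requires UTF-8; files in other encodings need to be transcoded before reading, or read through a source that exposes an encoding option (for example the CSV reader's "encoding" option).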

Attachments

    1. spark-sandbox.zip (118 kB, Chris Caspanello)

Issue Links

Activity

People

    Assignee: Sean R. Owen (srowen)
    Reporter: Chris Caspanello (ccaspanello)
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved: