[SPARK-23725] Improve Hadoop's LineReader to support charsets different from UTF-8 - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

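If no record delimiter is specified, Hadoop's LineReader splits lines (records) on '\n', '\r', or '\r\n', matching them as they are encoded in UTF-8, i.e. as the single bytes 0x0A and 0x0D (see https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L173-L177). In multi-byte encodings such as UTF-16, these characters are not encoded as single bytes, so records are not split correctly. The implementation should be improved to support any charset.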
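A minimal sketch of one possible approach, assuming the caller knows the input charset: decode the bytes first and match the terminators as characters rather than as raw bytes. `CharsetLineSplitter` and `splitLines` are hypothetical names, not part of Hadoop or Spark, and the sketch ignores Hadoop's buffer management, maximum line length and input-split boundaries. It illustrates why byte matching fails: in UTF-16LE, '\n' is encoded as 0x0A 0x00, and an unrelated character such as U+010A is encoded as 0x0A 0x01, so searching for a lone 0x0A byte leaves stray 0x00 bytes in the next record and can also split inside a character.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hadoop's LineReader API: splits a byte stream into
 * lines by decoding it with the caller's charset first, so '\n', '\r' and
 * "\r\n" are matched as characters instead of as single 0x0A/0x0D bytes.
 */
public final class CharsetLineSplitter {

    private CharsetLineSplitter() {}

    public static List<String> splitLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean prevWasCr = false; // true right after '\r', so "\r\n" counts as one break
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n' && prevWasCr) {
                    prevWasCr = false; // second half of "\r\n": the line was already emitted
                } else if (c == '\n' || c == '\r') {
                    lines.add(current.toString());
                    current.setLength(0);
                    prevWasCr = (c == '\r');
                } else {
                    current.append((char) c);
                    prevWasCr = false;
                }
            }
        }
        // A final line without a terminator is still a record.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    /** Convenience overload for in-memory data; rethrows IOException unchecked. */
    public static List<String> splitLines(byte[] data, Charset charset) {
        try {
            return splitLines(new ByteArrayInputStream(data), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset utf16 = StandardCharsets.UTF_16LE;
        // Each terminator below is two bytes in UTF-16LE (e.g. '\n' -> 0x0A 0x00).
        byte[] data = "first\r\nsecond\rthird\nfourth".getBytes(utf16);
        System.out.println(splitLines(data, utf16)); // prints [first, second, third, fourth]
    }
}
```

Decoding first keeps the terminator logic independent of the charset. The trade-off is that byte offsets, which Hadoop relies on for input splits, are no longer tracked directly, so an implementation inside LineReader would probably need to encode the delimiters in the target charset and search for them at the byte level while respecting character alignment.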

People

Assignee: Unassigned
Reporter: Max Gekk
Votes: 0
Watchers: 4

Dates

Created: 17/Mar/18 14:02
Updated: 25/May/21 01:55
Resolved: 25/May/21 01:39