Details
Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Incomplete
2.4.0
None
Description
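If no record delimiter is specified, Hadoop's LineReader splits lines/records on '\n', '\r', or '\r\n', matching these terminators as UTF-8 bytes: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L173-L177 . In multi-byte encodings such as UTF-16 or UTF-32 a line terminator is encoded as more than one byte, so byte-wise matching splits records in the wrong places. The implementation should be improved to support any charset.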
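
To illustrate the problem, here is a minimal sketch of charset-aware splitting. It is not Hadoop's implementation and not a proposed patch: the class `CharsetLineSplit` and the method `splitLines` are hypothetical names introduced for this example. It assumes a BOM-less, fixed-code-unit encoding (UTF-8, UTF-16LE/BE, UTF-32LE/BE) and, for brevity, handles only '\n'. The idea is to encode the delimiter in the target charset and scan in steps of the delimiter width, so that matches stay aligned to code-unit boundaries.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharsetLineSplit {

    // Hypothetical helper: splits raw bytes on the charset-encoded form of "\n".
    // Assumption: charset is BOM-less and "\n" encodes to exactly one code unit
    // (true for UTF-8, UTF-16LE/BE, UTF-32LE/BE).
    public static List<String> splitLines(byte[] data, Charset charset) {
        byte[] delim = "\n".getBytes(charset);
        int width = delim.length;
        List<String> lines = new ArrayList<>();
        int start = 0;
        // Step by the delimiter width so comparisons stay aligned to code units.
        for (int i = 0; i + width <= data.length; i += width) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + width), delim)) {
                lines.add(new String(data, start, i - start, charset));
                start = i + width;
            }
        }
        // Emit the trailing record if the input does not end with a delimiter.
        if (start < data.length) {
            lines.add(new String(data, start, data.length - start, charset));
        }
        return lines;
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_16LE;
        byte[] data = "a\nb".getBytes(cs); // bytes: 61 00 0A 00 62 00
        System.out.println(splitLines(data, cs)); // prints [a, b]
    }
}
```

In UTF-16LE, '\n' is the two-byte sequence 0A 00. A reader that splits on the single byte 0A leaves the trailing 00 attached to the next record, which is the kind of misalignment this issue describes.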