[SPARK-23554] Hive's textinputformat.record.delimiter equivalent in Spark


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      It would be great if Spark supported an option similar to Hive's textinputformat.record.delimiter in its CSV reader.

      We currently have to create Hive tables to work around this functionality missing natively in Spark.

      textinputformat.record.delimiter was introduced back in 2011, in the MapReduce era - see MAPREDUCE-2254.

      As an example, one of our most common use cases for textinputformat.record.delimiter is reading multiple lines of text that make up a single "record". The number of lines per "record" varies, so textinputformat.record.delimiter is a great way for us to process these files natively in Hadoop/Spark (a custom .map() function then does the actual processing of those records), after which we convert the result to a DataFrame.
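      As a sketch of the use case above: today this can be approximated at the RDD level in PySpark by passing the Hadoop setting to sc.newAPIHadoopFile with "org.apache.hadoop.mapreduce.lib.input.TextInputFormat" and conf={"textinputformat.record.delimiter": "\n\n"}. The minimal pure-Python sketch below only illustrates the splitting semantics that setting provides; split_records and DELIMITER are illustrative names, not Spark or Hadoop API:

```python
# Sketch of what textinputformat.record.delimiter does: the input is split
# into "records" on a caller-chosen delimiter instead of the default '\n'.
# Here blank-line-separated groups of lines each form one record.
DELIMITER = "\n\n"

def split_records(text, delimiter=DELIMITER):
    """Yield records the way TextInputFormat would hand them to a mapper:
    the delimiter itself is consumed and is not part of any record."""
    for record in text.split(delimiter):
        if record:  # skip the empty trailing record after a final delimiter
            yield record

sample = "name: a\nage: 1\n\nname: b\nage: 2\n\n"
records = list(split_records(sample))
print(records)  # -> ['name: a\nage: 1', 'name: b\nage: 2']
```

      Each yielded record is then a multi-line string that a custom .map() function can parse into columns before building a DataFrame.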

            People

              Assignee: Unassigned
              Reporter: Ruslan Dautkhanov (Tagar)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: