Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.2.1, 2.3.0
- Fix Version/s: None
Description
It would be great if Spark supported an option similar to Hive's textinputformat.record.delimiter in the spark-csv reader.
We currently have to create Hive tables to work around this functionality missing natively in Spark.
textinputformat.record.delimiter was introduced back in 2011, in the map-reduce era; see MAPREDUCE-2254.
As an example, one of our most common use cases for textinputformat.record.delimiter is reading multiple lines of text that together make up a single "record". The number of lines per record varies, so textinputformat.record.delimiter is a great solution for processing these files natively in Hadoop/Spark: a custom .map() function parses each record, and we then convert the result to a DataFrame.
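To make the use case concrete, here is a minimal sketch in plain Python (no Spark) of what a configurable record delimiter enables: splitting input into variable-length multi-line records instead of fixed single lines. The delimiter `|\n` and the key=value record layout are hypothetical examples, not from the issue.

```python
# Sketch: splitting text into records on a custom delimiter, mimicking
# what textinputformat.record.delimiter does in Hadoop/Spark.
# The "|\n" delimiter and key=value layout below are illustrative only.

def split_records(text: str, delimiter: str) -> list[str]:
    """Split raw text into records on `delimiter`, dropping empty chunks."""
    return [r for r in text.split(delimiter) if r.strip()]

# Each record spans a varying number of lines, terminated by "|\n".
raw = "id=1\nname=alice\n|\nid=2\nname=bob\nnote=vip\n|\n"

records = split_records(raw, "|\n")

# A custom map step then parses each multi-line record into a row,
# analogous to the .map() function mentioned above.
rows = [dict(line.split("=", 1) for line in rec.strip().splitlines())
        for rec in records]
# rows → [{'id': '1', 'name': 'alice'},
#         {'id': '2', 'name': 'bob', 'note': 'vip'}]
```

In Spark itself, a similar effect can already be obtained by going through the RDD API, e.g. `sc.newAPIHadoopFile(...)` with a Hadoop configuration that sets `textinputformat.record.delimiter`, which is essentially the Hive-table workaround described above expressed in code.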
Attachments
Issue Links
- relates to MAPREDUCE-2254 Allow setting of end-of-record delimiter for TextInputFormat (Closed)