[SPARK-23554] Hive's textinputformat.record.delimiter equivalent in Spark


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      It would be great if Spark supported an option similar to Hive's textinputformat.record.delimiter in its CSV reader.

      We currently have to create Hive tables to work around this functionality missing natively in Spark.

      textinputformat.record.delimiter was introduced back in 2011, in the MapReduce era - see MAPREDUCE-2254.

      As an example, one of our most common use cases for textinputformat.record.delimiter is reading multiple lines of text that make up a single "record". The number of lines per "record" varies, so textinputformat.record.delimiter is a great way for us to process these files natively in Hadoop/Spark (a custom .map() function then does the actual processing of those records), after which we convert the result to a DataFrame.
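      As a sketch of the use case above: today this can be approximated at the RDD level in PySpark by passing the Hadoop setting to sc.newAPIHadoopFile with "org.apache.hadoop.mapreduce.lib.input.TextInputFormat" and conf={"textinputformat.record.delimiter": "\n\n"}. The minimal pure-Python sketch below only illustrates the splitting semantics that setting provides; split_records and DELIMITER are illustrative names, not Spark or Hadoop API:

```python
# Sketch of what textinputformat.record.delimiter does: the input is split
# into "records" on a caller-chosen delimiter instead of the default '\n'.
# Here blank-line-separated groups of lines each form one record.
DELIMITER = "\n\n"

def split_records(text, delimiter=DELIMITER):
    """Yield records the way TextInputFormat would hand them to a mapper:
    the delimiter itself is consumed and is not part of any record."""
    for record in text.split(delimiter):
        if record:  # skip the empty trailing record after a final delimiter
            yield record

sample = "name: a\nage: 1\n\nname: b\nage: 2\n\n"
records = list(split_records(sample))
print(records)  # -> ['name: a\nage: 1', 'name: b\nage: 2']
```

      Each yielded record is then a multi-line string that a custom .map() function can parse into columns before building a DataFrame.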

            People

              Assignee: Unassigned
              Reporter: Ruslan Dautkhanov (Tagar)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: