  1. Spark
  2. SPARK-18352 Parse normal, multi-line JSON files (not just JSON Lines)
  3. SPARK-18658

Writing to a text DataSource buffers one or more lines in memory

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      The JSON and CSV writing paths buffer entire lines (or multiple lines) in memory prior to writing to disk. For large rows this is inefficient. It may make sense to skip the TextOutputFormat record writer and go directly to the underlying FSDataOutputStream, allowing the writers to append arbitrary byte arrays (fractions of a row) instead of a full row.
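
The difference between the two write paths can be illustrated with a minimal sketch. This is not the actual Spark change: the class `StreamingRowWriter` and its methods are hypothetical, a plain `java.io.OutputStream` stands in for the underlying `FSDataOutputStream`, and CSV quoting/escaping is deliberately omitted. The first method materializes the whole line before writing it; the second appends each fragment of the row to the stream as it is produced, so no full-row buffer is built.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names are not taken from Spark or Hadoop.
final class StreamingRowWriter {

    // Buffered path: the entire line is built in memory, then written once.
    static void writeBuffered(List<String> fields, OutputStream out) {
        try {
            String line = String.join(",", fields) + "\n";
            out.write(line.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streaming path: each field (a fraction of the row) is appended to the
    // stream as soon as it is available; no full-row String is assembled.
    static void writeStreaming(List<String> fields, OutputStream out) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    writer.write(',');
                }
                writer.write(fields.get(i));
            }
            writer.write('\n');
            // Push any bytes still held by the encoder to the stream.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", "alice", "some very large value");
        ByteArrayOutputStream buffered = new ByteArrayOutputStream();
        ByteArrayOutputStream streamed = new ByteArrayOutputStream();
        writeBuffered(row, buffered);
        writeStreaming(row, streamed);
        // Both paths emit identical bytes; only the memory profile differs.
        System.out.println(Arrays.equals(buffered.toByteArray(), streamed.toByteArray()));
    }
}
```

In this sketch the two paths produce byte-identical output, so the choice between them affects only how much of a row is held in memory at once.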

              People

              • Assignee:
                NathanHowell Nathan Howell
                Reporter:
                NathanHowell Nathan Howell
              • Votes:
                0
                Watchers:
                2