  1. Spark
  2. SPARK-13184

Support minPartitions parameter for JSON and CSV datasources as options

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      After looking through the pull requests and issue below for the Spark CSV datasource,

      https://github.com/databricks/spark-csv/pull/256
      https://github.com/databricks/spark-csv/issues/141
      https://github.com/databricks/spark-csv/pull/186

      It looks like Spark might need to support setting minPartitions.

      repartition() or coalesce() can be alternatives, but it looks like they need to shuffle the data in most cases.

      Although I am still not sure whether this is needed, I am opening this ticket for discussion.
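
      As background for the discussion, here is a rough sketch of why a minPartitions hint can raise the partition count at read time without a shuffle. It assumes the input is split the way sc.textFile(path, minPartitions) splits it, that is, through the split-size rule of the old Hadoop FileInputFormat. The function names, the 128 MiB block size, and the single-file simplification are illustrative assumptions, not Spark's actual code.

```python
# Sketch of the old-API Hadoop FileInputFormat split rule (assumption: one
# non-empty, splittable file; helper names are hypothetical).

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size
MIB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    """Split size is the goal size, capped by the block size, floored by min_size."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, min_partitions, block_size=128 * MIB, min_size=1):
    """Estimate how many input splits (partitions) one file produces."""
    # minPartitions only sets a goal size; it never makes splits larger than a block.
    goal_size = file_size // max(min_partitions, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # the tail (up to 1.1 * split_size) becomes the last split
    return splits


if __name__ == "__main__":
    one_gib = 1024 * MIB
    for hint in (1, 2, 32):
        print(hint, count_splits(one_gib, hint))
```

      Under these assumptions, a hint below the number of blocks changes nothing, while a larger hint shrinks the split size so that more partitions are created directly from the file, which is the behaviour repartition() can only reach by shuffling.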

            People

            • Assignee:
              Unassigned
            • Reporter:
              Hyukjin Kwon (hyukjin.kwon)
            • Votes:
              0
            • Watchers:
              12