Spark / SPARK-13184

Support minPartitions parameter for JSON and CSV datasources as options

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      After looking through the following pull requests and issue for the Spark CSV data source:

      https://github.com/databricks/spark-csv/pull/256
      https://github.com/databricks/spark-csv/issues/141
      https://github.com/databricks/spark-csv/pull/186

      It looks like Spark might need a way to set minPartitions for these data sources.

      repartition() or coalesce() can be alternatives, but it looks like they need to shuffle the data in most cases.

      Although I am still not sure whether this is needed, I am opening this ticket for discussion.
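
      To illustrate why a minPartitions option differs from repartition(), here is a minimal sketch of how such a hint can shape input splits at read time, with no shuffle involved. It assumes a read path like sc.textFile(path, minPartitions), where the hint is handed to Hadoop's file input format, and models only the split-size rule max(minSize, min(goalSize, blockSize)) for a single splittable file. The function names and the single-file simplification are illustrative assumptions, not Spark or Hadoop API.

```python
# Simplified single-file model of how a minPartitions hint affects input
# splits at read time. Illustrative only: the real logic also handles
# multiple files, compression, and block locations.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(total_size, min_partitions, block_size, min_size=1):
    """Apply max(min_size, min(goal_size, block_size))."""
    goal_size = total_size // max(min_partitions, 1)
    return max(min_size, min(goal_size, block_size))


def plan_splits(total_size, min_partitions, block_size):
    """Return the byte lengths of the splits planned for one file."""
    split_size = compute_split_size(total_size, min_partitions, block_size)
    splits = []
    remaining = total_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining > 0:
        splits.append(remaining)
    return splits


# A 256 MB file stored in 128 MB blocks:
print(len(plan_splits(256 * MB, 1, 128 * MB)))  # 2 (block size caps split size)
print(len(plan_splits(256 * MB, 2, 128 * MB)))  # 2
print(len(plan_splits(256 * MB, 8, 128 * MB)))  # 8 (hint shrinks the splits)
```

      In this model the hint can only increase the number of partitions, because the block size still caps each split. The partitions are decided before any data is read, which is why no shuffle is needed, whereas increasing the partition count after loading requires redistributing rows.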

          People

            Assignee: Unassigned
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 9