Spark / SPARK-23003

Deterministic name to s3 partition while writing to s3 by Spark

Details

    • Type: Brainstorming
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Input/Output, Spark Core
    • Labels: None
    • Environment: aws emr-5.7.0

    Description

      When writing to S3, how can one control the names of the partition files that are written?
      For example, suppose df.write().json() writes 5 partitions, each with a hashed name, under an S3 bucket/keyPath.
      Is it possible to make those names deterministic?
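
      One possible workaround, sketched below under stated assumptions, is to let Spark write its usual part files and then rename them to names derived only from the partition index. As far as I know, DataFrameWriter has no option for choosing part-file names directly. The file-name pattern in the regular expression (part-NNNNN-<uuid>, with an optional "r-" before the index and an optional "-cNNN" after the UUID) is an assumption about what Spark produces and may vary by version. The function name deterministic_names and its prefix parameter are hypothetical. The sketch only computes the old-to-new name mapping; applying it to S3 would need a separate copy-and-delete step, since S3 has no native rename.

```python
import re

# Assumed shape of Spark part-file names, e.g. part-00000-<uuid>.json.
# The optional "r-" and "-cNNN" pieces are assumptions covering version differences.
PART_FILE = re.compile(
    r"^part-(?:r-)?(?P<index>\d{5})"      # zero-padded partition index
    r"-(?P<job_id>[0-9a-f-]{36})"         # per-job UUID (the "hashed" part)
    r"(?:-c\d+)?"                         # optional file counter suffix
    r"(?P<ext>(?:\.\w+)+)$"               # extension(s), e.g. .json or .json.gz
)


def deterministic_names(file_names, prefix="part"):
    """Map each recognized part file to a stable name built from its index.

    Hypothetical helper: returns {old_name: new_name} and performs no I/O.
    """
    renames = {}
    for name in sorted(file_names):
        match = PART_FILE.match(name)
        if match is None:
            continue  # skip _SUCCESS markers and anything unrecognized
        renames[name] = f"{prefix}-{match['index']}{match['ext']}"
    return renames


if __name__ == "__main__":
    job = "3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b"  # made-up UUID for illustration
    listing = [f"part-00000-{job}.json", f"part-00001-{job}.json", "_SUCCESS"]
    for old, new in deterministic_names(listing).items():
        print(old, "->", new)
```

      The resulting names are stable across runs only if the DataFrame ends up with the same number and ordering of partitions each time, for example after an explicit repartition to a fixed count.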

          People

            Assignee: Unassigned
            Reporter: Dhruv sharma (docodon)
            Votes: 0
            Watchers: 1

              Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified