Spark / SPARK-19256 Hive bucketing write support / SPARK-38015

Mark legacy file naming functions as deprecated in FileCommitProtocol

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: Spark Core
    • Labels: None

    Description

      FileCommitProtocol is the class that commits Spark job output (staging file and directory renaming, etc.). During Spark 3.2 development, we added new functions to this class to allow more flexible output file naming (the PR detail is here). We did not delete the existing file naming functions (newTaskTempFile(ext) & newTaskTempFileAbsPath(ext)) because we were aware that many downstream projects and codebases had already written their own custom implementations of FileCommitProtocol. Deleting the existing functions would be a breaking change for them when upgrading their Spark version, and we would like to avoid that unpleasant surprise for anyone if possible. But we also need to clean up legacy code as we evolve our codebase.

      So as the next step, I would like to propose:

      • Spark 3.3 (now): Add the @deprecated annotation to the legacy functions in FileCommitProtocol - newTaskTempFile(ext) & newTaskTempFileAbsPath(ext).
      • Next Spark major release (or whenever people feel comfortable): Delete the legacy functions mentioned above from our codebase.
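
      To illustrate the first step, here is a minimal sketch of how a legacy naming function could be marked deprecated while a newer overload keeps existing downstream implementations working. This is a simplified stand-in, not Spark's actual FileCommitProtocol: the names CommitProtocolSketch, DownstreamProtocol, and taskId are hypothetical, the Hadoop task context is replaced by a plain string, and the shape of FileNameSpec (a prefix and a suffix) is an assumption made for this example.

```scala
// Simplified stand-in for illustration only; NOT Spark's actual FileCommitProtocol.
// Assumption: a plain task ID string replaces the Hadoop task attempt context.

// Assumed shape of the flexible naming spec: a prefix and a suffix for the file name.
final case class FileNameSpec(prefix: String, suffix: String)

abstract class CommitProtocolSketch {
  // Legacy naming function: only the file extension can be customized.
  // Marking it deprecated warns callers at compile time without breaking them.
  @deprecated("use newTaskTempFile(taskId, dir, spec: FileNameSpec)", "3.3.0")
  def newTaskTempFile(taskId: String, dir: Option[String], ext: String): String

  // Newer naming function. By default it delegates to the legacy function when
  // no prefix is requested, so implementations that only override the legacy
  // function keep working after an upgrade.
  def newTaskTempFile(taskId: String, dir: Option[String], spec: FileNameSpec): String =
    if (spec.prefix.isEmpty) newTaskTempFile(taskId, dir, spec.suffix)
    else throw new UnsupportedOperationException(
      s"${getClass.getSimpleName} does not support file name prefix: ${spec.prefix}")
}

// Hypothetical downstream implementation written against the legacy function only.
class DownstreamProtocol extends CommitProtocolSketch {
  override def newTaskTempFile(taskId: String, dir: Option[String], ext: String): String = {
    val name = s"part-${taskId}${ext}"
    dir.map(d => s"$d/$name").getOrElse(name)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val protocol = new DownstreamProtocol
    // Prints "/tmp/staging/part-0001.parquet"
    println(protocol.newTaskTempFile("0001", Some("/tmp/staging"), FileNameSpec("", ".parquet")))
  }
}
```

      Under these assumptions, a downstream class that overrides only the legacy function still compiles and runs, and deleting the legacy function in a later major release (the second step) becomes the only breaking change.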

          People

            Assignee: Cheng Su (chengsu)
            Reporter: Cheng Su (chengsu)