Spark / SPARK-44990

CSV conversion performance severely degraded for null fields


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.5.0, 3.5.1, 4.0.0
    • Fix Version/s: 3.3.4, 3.4.2, 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

       
      The change in https://github.com/apache/spark/pull/36110/files introduced a SQLConf access on a critical path: the conf is read for every null field of every record processed.

      This severely degrades performance: one workload that previously completed in a couple of seconds now takes around 8 minutes.

      This conf access needs to be moved out of the critical path; there is no need for it to happen at this location.

      The Spark version prior to this commit does not exhibit the slowdown. I also patched an affected version to remove the suspected line, and the problem went away.
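      To illustrate what moving the conf access out of the critical path means, here is a minimal Java sketch contrasting a setting read once per null field with one read once when the writer is constructed. It is not Spark's code: `Conf`, the key `example.legacy.nullAsQuotedEmpty`, `render`, and `Writer` are hypothetical stand-ins, and the sketch counts lookups rather than timing them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

public class NullFieldWriterSketch {

    // Hypothetical stand-in for a session config such as SQLConf.
    // Each getBoolean() call does a map lookup and a parse, and is counted.
    static final class Conf {
        private final Map<String, String> settings = new ConcurrentHashMap<>();
        final AtomicLong lookups = new AtomicLong();

        void set(String key, String value) {
            settings.put(key, value);
        }

        boolean getBoolean(String key, boolean defaultValue) {
            lookups.incrementAndGet();
            String v = settings.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // Hypothetical key, for illustration only.
    static final String LEGACY_NULL_AS_QUOTED_EMPTY = "example.legacy.nullAsQuotedEmpty";

    // Renders one row as CSV; quoteNulls is evaluated once per null field.
    static String render(String[] row, BooleanSupplier quoteNulls) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            if (row[i] == null) {
                sb.append(quoteNulls.getAsBoolean() ? "\"\"" : "");
            } else {
                sb.append(row[i]);
            }
        }
        return sb.toString();
    }

    // Slow pattern: the config is consulted inside the per-field path.
    static String writeRowPerFieldLookup(String[] row, Conf conf) {
        return render(row, () -> conf.getBoolean(LEGACY_NULL_AS_QUOTED_EMPTY, false));
    }

    // Fixed pattern: the config is read once, when the writer is built,
    // so the per-field path only checks a cached boolean.
    static final class Writer {
        private final boolean nullAsQuotedEmpty;

        Writer(Conf conf) {
            this.nullAsQuotedEmpty = conf.getBoolean(LEGACY_NULL_AS_QUOTED_EMPTY, false);
        }

        String writeRow(String[] row) {
            return render(row, () -> nullAsQuotedEmpty);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        String[] row = {"a", null, null, "d"};
        int rows = 10_000;

        for (int r = 0; r < rows; r++) {
            writeRowPerFieldLookup(row, conf);
        }
        System.out.println("per-field lookups: " + conf.lookups.get()); // 20000

        conf.lookups.set(0);
        Writer writer = new Writer(conf);
        for (int r = 0; r < rows; r++) {
            writer.writeRow(row);
        }
        System.out.println("hoisted lookups: " + conf.lookups.get()); // 1
    }
}
```

      Caching the flag in a final field means a writer does not see config changes made after it is constructed; the sketch assumes that is acceptable for the lifetime of a single write.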


          People

            Assignee: Unassigned
            Reporter: Atul Felix Payapilly (atulpayapilly_amazon)
            Votes: 0
            Watchers: 4
