Project: Spark
Parent issue: SPARK-5180 Data source API improvement (Spark 1.5)
Issue: SPARK-8907

Speed up path construction in DynamicPartitionWriterContainer.outputWriterForRow


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None
    • Sprint: Spark 1.5 release

    Description

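      Avoid zip and other Scala collection methods when building the partition path. They allocate intermediate objects for every row and add garbage-collection pressure. The current code: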

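          // Runs once per row: row.toSeq and zip allocate a sequence of tuples, map builds
          // another collection of strings, and mkString/stripPrefix copy the result again.
          val partitionPath = partitionColumns.zip(row.toSeq).map { case (col, rawValue) =>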
            val string = if (rawValue == null) null else String.valueOf(rawValue)
            val valueString = if (string == null || string.isEmpty) {
              defaultPartitionName
            } else {
              PartitioningUtils.escapePathName(string)
            }
            s"/$col=$valueString"
          }.mkString.stripPrefix(Path.SEPARATOR)
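A minimal sketch of the same logic without zip, map, or mkString follows. It assumes the partition values arrive as an Array[Any] aligned with the column names. A single StringBuilder and an index loop replace the collection pipeline. PartitionPathBuilder, build, and escapeSegment are hypothetical names. escapeSegment is a simplified stand-in for PartitioningUtils.escapePathName that escapes only a handful of characters, so it is not a drop-in replacement.

```scala
object PartitionPathBuilder {
  // Hypothetical, simplified stand-in for PartitioningUtils.escapePathName:
  // percent-encodes only these characters (assumption, not Spark's full set).
  private val charsToEscape = "/=%:#"

  def escapeSegment(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (charsToEscape.indexOf(c) >= 0) sb.append('%').append(f"${c.toInt}%02X")
      else sb.append(c)
      i += 1
    }
    sb.toString
  }

  // Builds "col1=v1/col2=v2" in one pass. The separator is written only
  // between columns, so no stripPrefix is needed afterwards.
  def build(columns: Array[String], values: Array[Any], defaultPartitionName: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < columns.length) {
      if (i > 0) sb.append('/') // Hadoop's Path.SEPARATOR is "/"
      val raw = values(i)
      val str = if (raw == null) null else raw.toString
      sb.append(columns(i)).append('=')
      // Same rule as the original: null or empty values use the default name.
      if (str == null || str.isEmpty) sb.append(defaultPartitionName)
      else sb.append(escapeSegment(str))
      i += 1
    }
    sb.toString
  }
}
```

For example, build(Array("year", "city"), Array[Any](2015, "a/b"), "D") yields year=2015/city=a%2Fb. A real writer could also keep one builder per task and call clear() between rows instead of allocating a new one each time.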
      

      We can probably use Catalyst expressions themselves to construct the path, which would let us leverage code generation for it.
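The following toy sketch in plain Scala illustrates the idea. It assumes the path is described once per writer as a small expression tree (constant prefixes plus one node per partition column) and then evaluated for each row. PathExpr, Lit, ColValue, Concat, and PathExprs are hypothetical names and are not Catalyst's classes or API. The sketch interprets the tree. Code generation would go a step further and emit straight-line code for it.

```scala
// Hypothetical mini expression tree standing in for Catalyst expressions.
sealed trait PathExpr {
  def eval(row: IndexedSeq[Any], sb: StringBuilder): Unit
}

// Constant text such as "year=" or "/month=", fixed when the tree is built.
final case class Lit(text: String) extends PathExpr {
  def eval(row: IndexedSeq[Any], sb: StringBuilder): Unit = sb.append(text)
}

// Reads one partition column; null or empty values map to the default name.
final case class ColValue(ordinal: Int, defaultName: String, escape: String => String)
    extends PathExpr {
  def eval(row: IndexedSeq[Any], sb: StringBuilder): Unit = {
    val raw = row(ordinal)
    val s = if (raw == null) "" else raw.toString
    sb.append(if (s.isEmpty) defaultName else escape(s))
  }
}

// Evaluates its children in order into the shared builder.
final class Concat(children: Array[PathExpr]) extends PathExpr {
  def eval(row: IndexedSeq[Any], sb: StringBuilder): Unit = {
    var i = 0
    while (i < children.length) {
      children(i).eval(row, sb)
      i += 1
    }
  }
}

object PathExprs {
  // Built once per writer, so the collection methods here are not per-row cost.
  def pathExpression(columns: Seq[String], defaultName: String,
                     escape: String => String): PathExpr = {
    val parts: Seq[PathExpr] = columns.zipWithIndex.flatMap { case (c, i) =>
      val prefix = if (i == 0) s"$c=" else s"/$c="
      Seq(Lit(prefix), ColValue(i, defaultName, escape))
    }
    new Concat(parts.toArray)
  }

  // Per-row work: one builder and one pass over the tree.
  def render(expr: PathExpr, row: IndexedSeq[Any]): String = {
    val sb = new StringBuilder
    expr.eval(row, sb)
    sb.toString
  }
}
```

Folding the "/" separator into each column's constant prefix at build time removes the stripPrefix step that the original code needs.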


          People

            Assignee: Cheng Lian (lian cheng)
            Reporter: Reynold Xin (rxin)
            Votes: 0
            Watchers: 2


              Agile

                Completed Sprint:
                Spark 1.5 release ended 14/Aug/15
