SPARK-8907 (Spark; sub-task of SPARK-5180 Data source API improvement (Spark 1.5))

Speed up path construction in DynamicPartitionWriterContainer.outputWriterForRow

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.5.0
    • SQL
    • None
    • Spark 1.5 release

    Description

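      Avoid zip and other Scala collection methods here, since they allocate intermediate tuples and collections for every row and increase garbage-collection pressure. The current code is: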

          val partitionPath = partitionColumns.zip(row.toSeq).map { case (col, rawValue) =>
            val string = if (rawValue == null) null else String.valueOf(rawValue)
            val valueString = if (string == null || string.isEmpty) {
              defaultPartitionName
            } else {
              PartitioningUtils.escapePathName(string)
            }
            s"/$col=$valueString"
          }.mkString.stripPrefix(Path.SEPARATOR)
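
      For illustration only, a minimal sketch of the same path construction written with a while loop and a single StringBuilder, so that no tuples or intermediate sequences are created per row. The object name PartitionPathSketch, the array-based signature, the default partition name value, and the simplified escapePathName are assumptions made to keep the example self-contained; they are not the code that was committed for this issue. Unlike zip, the loop assumes both arrays have the same length.

```scala
object PartitionPathSketch {
  // Assumed stand-in for the writer's default partition directory name.
  val defaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Simplified stand-in for PartitioningUtils.escapePathName: percent-encodes
  // a handful of characters. The real method handles a larger set.
  def escapePathName(s: String): String = {
    val sb = new StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c == '/' || c == '=' || c == '%' || c == ':') {
        sb.append('%').append(f"${c.toInt}%02X")
      } else {
        sb.append(c)
      }
      i += 1
    }
    sb.toString
  }

  // Builds "col1=v1/col2=v2" in one pass without zip, map, or mkString.
  // Assumes columns.length == values.length.
  def partitionPath(columns: Array[String], values: Array[Any]): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < columns.length) {
      if (i > 0) sb.append('/')
      sb.append(columns(i)).append('=')
      val raw = values(i)
      val string = if (raw == null) null else raw.toString
      if (string == null || string.isEmpty) {
        sb.append(defaultPartitionName)
      } else {
        sb.append(escapePathName(string))
      }
      i += 1
    }
    sb.toString
  }
}
```

      The separator is appended between entries instead of being prefixed and then stripped, which removes the final stripPrefix copy as well.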
      

      We can probably build the path with Catalyst expressions directly, which would let us leverage code generation for this step.
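
      To make the idea concrete, here is a toy sketch of describing the path as an expression tree that is built once per writer and then evaluated for each row. Every type here (Expr, Lit, PartitionValue, Concat) and the buildPathExpr helper are hypothetical and are not Catalyst's classes; real Catalyst expressions could additionally be compiled by code generation, which this interpreted sketch does not attempt.

```scala
object PathExpressionSketch {
  // Toy expression tree; hypothetical types, not Catalyst's API.
  sealed trait Expr { def eval(row: Array[Any]): String }

  // A constant string such as "/col=".
  final case class Lit(value: String) extends Expr {
    def eval(row: Array[Any]): String = value
  }

  // Reads the column at `ordinal` as a string; null and "" map to `default`,
  // everything else is passed through `escape`.
  final case class PartitionValue(ordinal: Int, default: String, escape: String => String)
      extends Expr {
    def eval(row: Array[Any]): String = {
      val raw = row(ordinal)
      val s = if (raw == null) null else raw.toString
      if (s == null || s.isEmpty) default else escape(s)
    }
  }

  // Concatenates the results of its children into one string.
  final case class Concat(children: Array[Expr]) extends Expr {
    def eval(row: Array[Any]): String = {
      val sb = new StringBuilder
      var i = 0
      while (i < children.length) {
        sb.append(children(i).eval(row))
        i += 1
      }
      sb.toString
    }
  }

  // Builds the path expression once; only eval runs per row.
  def buildPathExpr(columns: Array[String], default: String, escape: String => String): Expr = {
    val parts = new Array[Expr](columns.length * 2)
    var i = 0
    while (i < columns.length) {
      val prefix = if (i == 0) "" else "/"
      parts(2 * i) = Lit(prefix + columns(i) + "=")
      parts(2 * i + 1) = PartitionValue(i, default, escape)
      i += 1
    }
    Concat(parts)
  }
}
```

      The column names and separators become constants inside the tree, so the per-row work is limited to reading each value and appending it.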

          People

            lian cheng Cheng Lian
            rxin Reynold Xin