  1. Spark
  2. SPARK-17480

CompressibleColumnBuilder inefficiently calls gatherCompressibilityStats

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      When we profiled one of our Spark jobs, we saw that 6.24% of the CPU time was spent in List.length.

      Scala's List.length is O(N): https://github.com/scala/scala/blob/2.10.x/src/library/scala/collection/LinearSeqOptimized.scala#L36

      Since this method is called on every iteration of a loop, the loop as a whole becomes O(N^2).
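
      The cost pattern described above can be illustrated with a minimal sketch. It assumes the loop is an index-based while loop over a Scala List. StatsCollector, gatherStats, and ListLoopSketch are hypothetical stand-ins for illustration, not Spark's actual classes or the committed fix.

```scala
// Hypothetical stand-in for a compression encoder; not Spark's actual class.
final class StatsCollector {
  var rowsSeen: Int = 0
  def gatherStats(value: Int): Unit = rowsSeen += 1
}

object ListLoopSketch {
  // Index-based loop over an immutable List. `length` walks the whole list
  // on every condition check, and `collectors(i)` walks i nodes to reach
  // element i, so one pass over N collectors costs O(N^2).
  def gatherIndexed(collectors: List[StatsCollector], value: Int): Unit = {
    var i = 0
    while (i < collectors.length) {
      collectors(i).gatherStats(value)
      i += 1
    }
  }

  // Single traversal: foreach visits each node once, so one pass is O(N).
  def gatherForeach(collectors: List[StatsCollector], value: Int): Unit =
    collectors.foreach(_.gatherStats(value))
}
```

      Both methods produce the same result; only the traversal cost differs. Storing the collectors in an Array or another indexed sequence, where length and apply are constant time, would be another way to keep the index-based loop cheap.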

          People

            Assignee: Ergin Seyfe (eseyfe)
            Reporter: Ergin Seyfe (eseyfe)
            Votes: 0
            Watchers: 2
