  1. Spark
  2. SPARK-15821

Should we use mvn -T for multithreaded Spark builds?

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: Build
    • Labels: None

    Description

      With Maven we can build Spark with multiple threads and get shorter build times as a result.

      On an eight-core machine, I saw the build time drop from 20-25 minutes to five minutes when building with:

      mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package

      -T 1C tells Maven to use one thread per available core. I've never experienced a problem using this option, on machines ranging from a single-core box to one with 192 cores.
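
      For anyone unsure how the -T value maps to a thread count, here is a rough sketch. The helper name mvn_threads is hypothetical (it is not part of Maven), and it assumes Maven multiplies the number before a trailing "C" by the core count and truncates the result; a plain number is taken as a fixed thread count.

```shell
#!/bin/sh
# Hypothetical helper: estimate the thread count Maven would use for -T <spec>.
# Assumption: "<n>C" means n threads per core (result truncated to an integer);
# a bare number means a fixed number of threads.
mvn_threads() {
  spec=$1    # the value passed to -T, e.g. 1C, 1.5C or 4
  cores=$2   # number of cores on the build machine
  case "$spec" in
    *C) awk -v m="${spec%C}" -v c="$cores" 'BEGIN { print int(m * c) }' ;;
    *)  echo "$spec" ;;
  esac
}

mvn_threads 1C 8     # prints 8
mvn_threads 1.5C 8   # prints 12
mvn_threads 4 8      # prints 4
```

      So on the eight-core machine mentioned above, -T 1C would correspond to eight build threads under these assumptions.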

      Should we use this to build Spark more quickly, or is the Jenkins job deliberately set up so that each "executor" is needed for each pull request, meaning we wouldn't see an improvement anyway?

      We could find out by checking core utilization across the farm; if cores are sitting idle during builds, this could reduce our build times.

      Here's more information on the feature: https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3

      If this isn't suitable for the current farm, then I think we should document it for those building Spark from source.

          People

            Assignee: aroberts Adam Roberts
            Reporter: aroberts Adam Roberts
            Votes: 0
            Watchers: 3