Hadoop Common / HADOOP-7985

maven build should be super fast when there are no changes

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: build
    • Labels:

      Description

      I build with the command "mvn -Pdist -P-cbuild -Dmaven.javadoc.skip -DskipTests install". Without ANY changes in the code, running this command takes 1:32, which seems too long to me. Investigate whether this time can be reduced drastically.

        Activity

        Ravi Prakash added a comment -

        Just a lil' bit of improvement. Still a long ways to go

        Ravi Prakash added a comment -

        Thanks to Jason (who's too shy to comment on this JIRA =P hahaha):

        I did some investigation a while ago; I thought I had mentioned that the protobuf/records steps regenerate Java code each time.
        Plus, package-info.java files are being recompiled each time, in these packages:
        org.apache.hadoop.jmx, org.apache.hadoop.mapred.tools, org.apache.hadoop.tools, org.apache.hadoop.tools.rumen
        Those files generate no .class file when compiled (there's no annotation on the package declaration that would cause the compiler to generate any code), so the build reinvokes the compiler on them each time, like the Touchz problem.
        There are also a number of empty Java files that cause the same issue:
        hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AggregatedLogsBlock.java
        hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AggregatedLogsPage.java
        hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockApp.java
        hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockContainer.java
        It all adds up, since the jar has to be updated each time something is recompiled.
        Getting the protobuf, records, and test code not to regenerate each time is probably a big part of it.
        I think there are other problems beyond that, but I ran out of time to look into it further.
        Anyway, hope it helps!
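
One way to look for more files like the empty ones listed above is a small helper around find. This is only a sketch: the function name list_empty_java_sources is hypothetical, and it assumes "empty" means zero bytes, so files that contain only a license header or comments would not be reported.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Hadoop build): print every
# zero-byte .java file under the given root (default: current directory).
# Assumption: "empty" means zero bytes; comment-only files are not matched.
list_empty_java_sources() {
  find "${1:-.}" -type f -name '*.java' -size 0
}
```

For example, running list_empty_java_sources hadoop-mapreduce-project from the top of the source tree would print one path per line.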

        Ravi Prakash added a comment -

        All protobuf compilation sections in the pom.xml files can be modified like this:

        $ git diff pom.xml 
        diff --git a/hadoop-hdfs-project/hadoop-hdfs/pom.xml b/hadoop-hdfs-project/hadoop-hdfs/pom.xml
        index 0dcff87..7384385 100644
        --- a/hadoop-hdfs-project/hadoop-hdfs/pom.xml
        +++ b/hadoop-hdfs-project/hadoop-hdfs/pom.xml
        @@ -238,9 +238,9 @@
                             mkdir -p $JAVA_DIR 2> /dev/null
                             for PROTO_FILE in `ls $PROTO_DIR/*.proto 2> /dev/null`
                             do
        -                        if [ "$IS_WIN" = "true" ]; then
        +                        if [[ "$IS_WIN" = "true" && $PROTO_FILE -nt $WIN_JAVA_DIR ]]; then
                                   protoc -I$WIN_PROTO_DIR --java_out=$WIN_JAVA_DIR $PROTO_FILE
        -                        else
        +                        elif [ $PROTO_FILE -nt $JAVA_DIR ]; then
                                   protoc -I$PROTO_DIR --java_out=$JAVA_DIR $PROTO_FILE
                                 fi
                             done
        

        I am now trying to optimize mvn -P-cbuild -Dmaven.javadoc.skip -DskipTests -X compile

        Even here, JSPs are unnecessarily compiled into Java files every time.
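
A possible variation on the timestamp check in the diff above, offered as a sketch rather than as the patch itself: compare each .proto against a per-file stamp instead of against the output directory. A directory's modification time generally changes only when entries are added, removed, or renamed, so it may not reflect a regeneration that overwrites existing files. STAMP_DIR, needs_regen, mark_generated, and regen_all are hypothetical names; PROTO_DIR and JAVA_DIR are the variables from the diff. The Windows branch is left out for brevity.

```shell
#!/bin/sh
# Sketch only: STAMP_DIR, needs_regen, mark_generated and regen_all are
# hypothetical names, not part of the Hadoop build.
STAMP_DIR="${STAMP_DIR:-target/proto-stamps}"

# Succeeds when the .proto has no stamp yet or is newer than its stamp.
needs_regen() {
  stamp="$STAMP_DIR/$(basename "$1").stamp"
  [ ! -e "$stamp" ] || [ "$1" -nt "$stamp" ]
}

# Records that the given .proto was just compiled.
mark_generated() {
  mkdir -p "$STAMP_DIR" && touch "$STAMP_DIR/$(basename "$1").stamp"
}

# Runs protoc only for stale files; PROTO_DIR and JAVA_DIR as in the diff.
regen_all() {
  mkdir -p "$JAVA_DIR"
  for proto_file in "$PROTO_DIR"/*.proto; do
    [ -e "$proto_file" ] || continue   # no .proto files at all
    if needs_regen "$proto_file"; then
      protoc -I"$PROTO_DIR" --java_out="$JAVA_DIR" "$proto_file" &&
        mark_generated "$proto_file"
    fi
  done
}
```

The stamp is written only after protoc succeeds, so a failed run is retried on the next build. Stamps are keyed by file name alone, which assumes no two .proto files in a module share a name.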

        Ravi Prakash added a comment -

        In hadoop-common-project/hadoop-common:

        • Touchz.java should be renamed to Touch.java to avoid compiling it every time.
        • Record IO generates test files that are compiled every time. We could modify RccTask.java to call doCompile only when sourceFile is newer than the destination file.

        Together these account for just 2 of the 24 seconds needed to build hadoop-common, so obviously not substantial. Will keep looking.

        I'm beginning to think that running from JARs might not be the best approach for a quick dev cycle. Maybe I should try running from the target/classes directories and skip building the jar altogether.


          People

          • Assignee: Ravi Prakash
          • Reporter: Ravi Prakash
          • Votes: 0
          • Watchers: 7
