Hadoop Common
HADOOP-6200

glue together the different projects to make local builds easier

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.21.0
    • Fix Version/s: None
    • Component/s: build
    • Labels: None

      Description

      It's currently fairly tricky to get everything building/up to date in a single machine. We can and should improve this.

        Activity

        steve_l added a comment -
        1. create a wiki page documenting current best practices, symlink tricks, etc.
        2. add the option of using ivy to retrieve artifacts, by patching up the ivy.xml files
        3. deal with any cycles in the dependencies, so allowing ivy to order multiple builds (this is tricky because of testing, we can do some tricks here)
        4. provide an optional main build file (which repository?) to build everything in the right order (rough sketch below)
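
        For illustration, option 4 might amount to a small top-level build file like the hypothetical sketch below, assuming hadoop-common, hadoop-hdfs and hadoop-mapreduce are checked out as sibling directories; the file name and target names are made up, not anything that exists in SVN:

            <!-- hypothetical all.xml at the root of the three sibling checkouts -->
            <project name="hadoop-all" default="build-all" basedir=".">
              <target name="build-all">
                <!-- build each project in dependency order; common first -->
                <ant dir="hadoop-common"    target="jar" inheritAll="false"/>
                <ant dir="hadoop-hdfs"      target="jar" inheritAll="false"/>
                <ant dir="hadoop-mapreduce" target="jar" inheritAll="false"/>
              </target>
            </project>
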
        Todd Lipcon added a comment -

        > deal with any cycles in the dependencies, so allowing ivy to order multiple builds (this is tricky because of testing, we can do some tricks here)

        Where might we have cyclic dependencies? It seems to me, if we have such, they should be hunted down and destroyed mercilessly rather than worked around in the build process, right?

        steve_l added a comment -

        Hadoop-hdfs depends on hadoop-mapreduce for testing, hence, a cycle.

        Right now I am playing tricks with symlinks to hook up the lib directories, so that what I build in one dir is automatically picked up by the adjacent project. I am documenting what I am doing for the hadoop wiki, but it's a bit complex.

        Options:

        1. flatten: pull the hadoop-hdfs run-test-hdfs-with-mr bit out into a subproject that depends on both hadoop-hdfs and hadoop-mapreduce
        2. bootstrap via the central repository. Rather than have copies of artifacts in the different bits of SVN, stick some alpha releases of everything up onto the central repository. Then you can use ivy to pull things in, so when I build hdfs the latest version of common gets picked up, and the latest version of mapreduce. If I publish locally, I get the version I ask for, but the default would be to get the last release on the central repo (rough sketch below).

        I'm coming round in favour of #2, because it helps us debug the publishing process with, say, a fortnightly alpha release of the artifacts (PMC approval still needed, incidentally), so that when the time comes to do real beta releases, the POMs and such like are stable.
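
        Purely as an illustration of option 2, the hdfs ivy.xml could carry dependency entries along these lines; the org/module names, configuration mappings and the alpha version string are assumptions for the sketch, not what is actually published:

            <!-- hypothetical ivy.xml fragment for hadoop-hdfs -->
            <dependencies>
              <!-- pull the last published alpha of common from the central repository -->
              <dependency org="org.apache.hadoop" name="hadoop-core"
                          rev="0.21.0-alpha-15" conf="common->default"/>
              <!-- mapred artifacts are only needed for the with-MR tests -->
              <dependency org="org.apache.hadoop" name="hadoop-mapred"
                          rev="0.21.0-alpha-15" conf="test->default"/>
            </dependencies>
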

        Doug Cutting added a comment -

        It's not actually a cycle, since tests are layered after. Each can be built independently. They can't be fully tested independently, but that's different. Perhaps we should separate tests that require mapreduce from those that do not.

        A simple low-tech solution might be to reference ../mapreduce/build/ in the hdfs test classpath and require that developers who wish to run those tests check things out in sibling directories. We could even then have a single build target that builds and tests everything.
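
        A minimal sketch of that low-tech approach, assuming a sibling checkout layout; the property and path names are illustrative and may not match the real hdfs build.xml:

            <!-- illustrative: point the hdfs test classpath at a sibling mapreduce checkout -->
            <property name="mapreduce.build.dir" location="${basedir}/../mapreduce/build"/>
            <path id="test.classpath">
              <pathelement location="${build.classes}"/>
              <pathelement location="${mapreduce.build.dir}/classes"/>
              <fileset dir="${mapreduce.build.dir}" includes="*.jar"/>
            </path>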

        steve_l added a comment -

        When I tried to bump up the artifact version number by way of a shared build.properties file, hdfs was not happy, as in "refuses to build the base JAR" not happy. Therefore, a cycle exists in the jar build process, even if the dependencies only come together at test time.

        init:
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/classes
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/src
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/webapps/hdfs/WEB-INF
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/webapps/datanode/WEB-INF
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/webapps/secondary/WEB-INF
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/ant
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/test
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/test/classes
            [mkdir] Created dir: /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/test/extraconf
            [touch] Creating /var/folders/6j/6jD1CUYiGs43jrHUr7BepU+++TI/-Tmp-/null2099585006
           [delete] Deleting: /var/folders/6j/6jD1CUYiGs43jrHUr7BepU+++TI/-Tmp-/null2099585006
             [copy] Copying 2 files to /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build/webapps
        
        BUILD FAILED
        /Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/build.xml:259: src '/Users/slo/Java/Hadoop/lifecycle/hadoop-hdfs/lib/hadoop-mapred-0.21.0-alpha-15.jar' doesn't exist.
        

        I've documented what I've done to get the build working:
        http://wiki.apache.org/hadoop/BuildingHadoopFromSVN

        This has:

        • symbolic links to hook up the files. This is why you should be building on a Unix system.
        • a boot process where you don't flip the version marker on mapreduce until you've built hdfs.

        That wiki entry documents what we have today; it's the starting point for what we have to do to simplify things.

        1. We could have a shared base build.xml used by all projects, externally pulled in by -hdfs and -mapreduce.
        2. I would like to use ivy to glue together stuff locally and remotely. For this to work we need at least one alpha release of 0.21 up in the big ibiblio repository.
        3. The hdfs build could be tweaked to only bail out at test time if the mapred JAR is missing, because that is the only time it is needed (sketch below).
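
        For item 3, a rough sketch of how the check could be deferred to the MR-dependent test target; the property names and the depends value are assumptions about the hdfs build.xml, not its actual contents:

            <!-- hypothetical: only insist on the mapred JAR when the MR-dependent tests run -->
            <property name="hadoop-mapred.jar"
                      location="${lib.dir}/hadoop-mapred-${hadoop-mr.version}.jar"/>
            <available property="hadoop-mapred.jar.present" file="${hadoop-mapred.jar}"/>

            <target name="run-test-hdfs-with-mr" depends="compile-hdfs-test">
              <fail unless="hadoop-mapred.jar.present"
                    message="${hadoop-mapred.jar} is missing; build hadoop-mapreduce first."/>
              <!-- ...run the MR-dependent tests as before... -->
            </target>
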
        steve_l added a comment -

        Actually, it turns out to really be a cycle. The hadoop-hdfs.init target requires hadoop-mapred to exist, because it unzips the webapps:
        <unzip src="$

        {lib.dir}

        /hadoop-mapred-$

        {hadoop-mr.version}

        .jar"
        dest="$

        {build.dir}

        ">
        <patternset>
        <include name="webapps/**"/>
        </patternset>
        </unzip>

        You cannot build hdfs without mapred, and you cannot build mapred without hdfs. The graph of build dependencies is therefore not acyclic.

        Doug Cutting added a comment -

        > the hadoop-hdfs.init target requires hadoop-mapred to exist, because it unzips the webapps

        That seems like a bug we should fix, no?

        steve_l added a comment -

        Having played with this locally:

        1. the unzip is only needed to get at the webapps/static content. This content could be moved into common
        2. the JARs are only needed to compile and test hdfs-with-mr

        It's fairly easy to skip all of these if the JARs aren't found. The dependencies are still there, but you can at least build the JARs from scratch (sketch below).
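
        As a rough illustration of that skip, the init-time unzip from the earlier comment could be guarded with an availability check; the property and target names here are made up for the sketch:

            <!-- illustrative: only unpack the webapps if the mapred JAR is actually there -->
            <available property="mapred.jar.present"
                       file="${lib.dir}/hadoop-mapred-${hadoop-mr.version}.jar"/>

            <target name="unpack-mapred-webapps" if="mapred.jar.present">
              <unzip src="${lib.dir}/hadoop-mapred-${hadoop-mr.version}.jar"
                     dest="${build.dir}">
                <patternset>
                  <include name="webapps/**"/>
                </patternset>
              </unzip>
            </target>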

        Steve Loughran added a comment -

        Migration to Maven supersedes this.


          People

          • Assignee: Steve Loughran
          • Reporter: Steve Loughran
          • Votes: 0
          • Watchers: 7
