Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-2480

MR-279: mr app should not depend on hard-coded version of shuffle

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      mr-279, mrv2, pom

      Description

      The following commit introduced a dependency of shuffle with hard-coded version for mr app:

      commit 6f69742140516be7493c9a9177b81d0516cc9539
      Author: Vinod Kumar Vavilapalli <vinodkv@apache.org>
      Date:   Wed May 4 06:53:52 2011 +0000
      
          Adding user log handling for YARN. Making NM put the user-logs on DFS and providing log-dump tools. Contributed by Vinod Kumar Vavilapalli.
      

        Activity

        Hide
        Mahadev konar added a comment -

        I just committed this to MR-279 branch. thanks luke!

        Show
        Mahadev konar added a comment - I just committed this to MR-279 branch. thanks luke!
        Hide
        Luke Lu added a comment -

        To further clarify, the dependency, ideally, should not be on the implementation but rather on some kind of interface exposed by the shuffle service.

        That's exactly my point. It should not depend on a specific implementation class but rather an SPI that both the app and the shuffle implementation depend on. In this case, the SPI is the service id at the moment. If you rename the ShuffleHandler to ShuffleHandlerImpl the code would look wrong too.

        Now you have a runtime classpath dependency on a specific shuffle implementation for mr app. If you replace the jar with a different class and update the config, the NM will happily load the service but the mr app won't even run due to class not found exception.

        Show
        Luke Lu added a comment - To further clarify, the dependency, ideally, should not be on the implementation but rather on some kind of interface exposed by the shuffle service. That's exactly my point. It should not depend on a specific implementation class but rather an SPI that both the app and the shuffle implementation depend on. In this case, the SPI is the service id at the moment. If you rename the ShuffleHandler to ShuffleHandlerImpl the code would look wrong too. Now you have a runtime classpath dependency on a specific shuffle implementation for mr app. If you replace the jar with a different class and update the config, the NM will happily load the service but the mr app won't even run due to class not found exception.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        To further clarify, the dependency, ideally, should not be on the implementation but rather on some kind of interface exposed by the shuffle service. Today there isn't any interface separate from the implementation and hence the dependency on the implementation itself.

        Show
        Vinod Kumar Vavilapalli added a comment - To further clarify, the dependency, ideally, should not be on the implementation but rather on some kind of interface exposed by the shuffle service. Today there isn't any interface separate from the implementation and hence the dependency on the implementation itself.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        However a bigger issue is whether app (a user land module) should depend on a specific implementation shuffle (loaded into node manager). I think it should not. The MAPREDUCE_SERVICE_ID should probably resides in a MRServices class in mr common. I have the impression (based on previous conversations) that Chris was gonna refactor the NM service stuff later. It's probably OK as is now.

        No, that isn't quite correct. An app really SHOULD depend on a specific implementation of shuffle. An app is tied to a specific version of shuffle be it the request format or the authentication data. Shuffle Service isn't really 'known' to the NM, it is an abstract service started by NM.

        Show
        Vinod Kumar Vavilapalli added a comment - However a bigger issue is whether app (a user land module) should depend on a specific implementation shuffle (loaded into node manager). I think it should not. The MAPREDUCE_SERVICE_ID should probably resides in a MRServices class in mr common. I have the impression (based on previous conversations) that Chris was gonna refactor the NM service stuff later. It's probably OK as is now. No, that isn't quite correct. An app really SHOULD depend on a specific implementation of shuffle. An app is tied to a specific version of shuffle be it the request format or the authentication data. Shuffle Service isn't really 'known' to the NM, it is an abstract service started by NM.
        Hide
        Luke Lu added a comment -

        The v1 patch fix the hard-code version.

        However a bigger issue is whether app (a user land module) should depend on a specific implementation shuffle (loaded into node manager). I think it should not. The MAPREDUCE_SERVICE_ID should probably resides in a MRServices class in mr common. I have the impression (based on previous conversations) that Chris was gonna refactor the NM service stuff later. It's probably OK as is now.

        Show
        Luke Lu added a comment - The v1 patch fix the hard-code version. However a bigger issue is whether app (a user land module) should depend on a specific implementation shuffle (loaded into node manager). I think it should not. The MAPREDUCE_SERVICE_ID should probably resides in a MRServices class in mr common. I have the impression (based on previous conversations) that Chris was gonna refactor the NM service stuff later. It's probably OK as is now.
        Hide
        Luke Lu added a comment -

        Being able to specify hadoop-mapreduce.version and yarn.version is required by CI builds. To reproduce the problem:

        $ mvn -Dhadoop-mapreduce.version=1.0-test -Dyarn.version=1.0-test clean install -P-cbuild -DskipTests
        
        [ERROR] BUILD ERROR
        [INFO] ------------------------------------------------------------------------
        [INFO] Failed to resolve artifact.
        
        Missing:
        ----------
        1) org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:1.0-SNAPSHOT
        
          Try downloading the file manually from the project website.
        
          Then, install it using the command: 
              mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-shuffle -Dversion=1.0-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file
        
          Alternatively, if you host your own repository you can deploy the file there: 
              mvn deploy:deploy-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-shuffle -Dversion=1.0-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
        
          Path to dependency: 
                1) org.apache.hadoop:hadoop-mapreduce-client-app:jar:1.0-test
                2) org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:1.0-SNAPSHOT
        
        ----------
        1 required artifact is missing.
        
        for artifact: 
          org.apache.hadoop:hadoop-mapreduce-client-app:jar:1.0-test
        
        Show
        Luke Lu added a comment - Being able to specify hadoop-mapreduce.version and yarn.version is required by CI builds. To reproduce the problem: $ mvn -Dhadoop-mapreduce.version=1.0-test -Dyarn.version=1.0-test clean install -P-cbuild -DskipTests [ERROR] BUILD ERROR [INFO] ------------------------------------------------------------------------ [INFO] Failed to resolve artifact. Missing: ---------- 1) org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:1.0-SNAPSHOT Try downloading the file manually from the project website. Then, install it using the command: mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-shuffle -Dversion=1.0-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file Alternatively, if you host your own repository you can deploy the file there: mvn deploy:deploy-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-shuffle -Dversion=1.0-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id] Path to dependency: 1) org.apache.hadoop:hadoop-mapreduce-client-app:jar:1.0-test 2) org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:1.0-SNAPSHOT ---------- 1 required artifact is missing. for artifact: org.apache.hadoop:hadoop-mapreduce-client-app:jar:1.0-test

          People

          • Assignee:
            Luke Lu
            Reporter:
            Luke Lu
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development