Hadoop Common
  1. Hadoop Common
  2. HADOOP-8968

Add a flag to completely disable the worker version check

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The current logic in the TaskTracker and the DataNode to allow a relax version check with the JobTracker and NameNode works only if the versions of Hadoop are exactly the same.

      We should add a switch to disable version checking completely, to enable rolling upgrades between compatible versions (typically patch versions).

      1. HADOOP-8968.patch
        17 kB
        Alejandro Abdelnur
      2. HADOOP-8968.patch
        13 kB
        Alejandro Abdelnur
      3. HADOOP-8968.patch
        15 kB
        Alejandro Abdelnur
      4. HADOOP-8968.patch
        15 kB
        Alejandro Abdelnur
      5. HADOOP-8968.patch
        14 kB
        Alejandro Abdelnur

        Issue Links

          Activity

          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550549/HADOOP-8968.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1668//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550549/HADOOP-8968.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1668//console This message is automatically generated.
          Hide
          Alejandro Abdelnur added a comment -

          patch does not apply on trunk because it is for branch-1

          Show
          Alejandro Abdelnur added a comment - patch does not apply on trunk because it is for branch-1
          Hide
          Eli Collins added a comment - - edited

          The relaxed check is done after the strict check, thus no chance for the relaxed check to ever happen.

          The assertion check before the relaxed version check is not the strict check, the build version string contains the version string so we should never have the build version match w/o the version matching, that's why this is an assert.

          The relaxed version check works, it's purpose is to permit a TT to connect to a JT with a different build version but identical version. What you want is a TT that can connect to a JT with a different version. I'd do one of the following:

          1. Add a flag that disables the version check entirely
          2. Backport VersionInfo from HDFS-2983 rather than reimplement something similar here

          Either way I'd use the same mechanism across MR and HDFS (like HADOOP-8209).

          Show
          Eli Collins added a comment - - edited The relaxed check is done after the strict check, thus no chance for the relaxed check to ever happen. The assertion check before the relaxed version check is not the strict check, the build version string contains the version string so we should never have the build version match w/o the version matching, that's why this is an assert. The relaxed version check works, it's purpose is to permit a TT to connect to a JT with a different build version but identical version. What you want is a TT that can connect to a JT with a different version. I'd do one of the following: Add a flag that disables the version check entirely Backport VersionInfo from HDFS-2983 rather than reimplement something similar here Either way I'd use the same mechanism across MR and HDFS (like HADOOP-8209 ).
          Hide
          Alejandro Abdelnur added a comment -

          Eli, thanks for the review.

          Regarding the exact version mach on relaxed check, if a patch release (ie 1.2.0.2) is done then the current logic will not allow the TT to start. With the proposed patch, it would work. And giving flexibility, it would allow to change the precision.

          Regarding backporting VersionUtil (I assume you meant that) from HDFS-2983, I've looked at it first, but the compareVersions() method compares the whole version string, there is not way to define the precision.

          Regarding 'Either way..', the current patch does it for both MR and HDFS.

          Said this, I'd be OK with a full version disabling as well. I was just trying to constraint versions a bit.

          Show
          Alejandro Abdelnur added a comment - Eli, thanks for the review. Regarding the exact version mach on relaxed check, if a patch release (ie 1.2.0.2) is done then the current logic will not allow the TT to start. With the proposed patch, it would work. And giving flexibility, it would allow to change the precision. Regarding backporting VersionUtil (I assume you meant that) from HDFS-2983 , I've looked at it first, but the compareVersions() method compares the whole version string, there is not way to define the precision. Regarding 'Either way..', the current patch does it for both MR and HDFS. Said this, I'd be OK with a full version disabling as well. I was just trying to constraint versions a bit.
          Hide
          Alejandro Abdelnur added a comment -

          new version of the patch that adds a flag to completely disable version checking (updated summary & description accordingly).

          Show
          Alejandro Abdelnur added a comment - new version of the patch that adds a flag to completely disable version checking (updated summary & description accordingly).
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550706/HADOOP-8968.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1671//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550706/HADOOP-8968.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1671//console This message is automatically generated.
          Hide
          Harsh J added a comment -

          I just went through these and the code plus tests look fine to me. +1.

          I understand that such a change isn't exactly required on the trunk-end, as we no longer utilize Hadoop-RPC there.

          Show
          Harsh J added a comment - I just went through these and the code plus tests look fine to me. +1. I understand that such a change isn't exactly required on the trunk-end, as we no longer utilize Hadoop-RPC there.
          Hide
          Suresh Srinivas added a comment -

          Alejandro, I can understand skipping version checks in point releases (with the convention - major.minor.point). Skipping the version check in this patch skips it for major and minor releases as well. What is the use case you are trying to address? If it is to allow point releases, should the checks be stricter?

          Also browsing the code, your log only prints revision when it skipos version check. Should version also be printed? (The names version and revision took me a while to understand )

          Show
          Suresh Srinivas added a comment - Alejandro, I can understand skipping version checks in point releases (with the convention - major.minor.point). Skipping the version check in this patch skips it for major and minor releases as well. What is the use case you are trying to address? If it is to allow point releases, should the checks be stricter? Also browsing the code, your log only prints revision when it skipos version check. Should version also be printed? (The names version and revision took me a while to understand )
          Hide
          Eli Collins added a comment -

          @Suresh,
          The use case for this change is allowing a rolling upgrade when there is not an exact version match, ie across minor releases in branch-1. Currently we disallow that, from HADOOP-8209:

          In order to support this, let's add an option that is off by default, but, when enabled, makes the DN and TT version check just check for an exact version match (eg "1.0.2") but ignore the build revision (DN) and the source checksum (TT). Two builds still need to match the major, minor, and point numbers, but nothing else.

          @Tucu,

          • Seems like we should still assert on invalid builds even if skip check flag is used
          • Should remove the "not" in "different versions are not permitted"
          • Nit, feel free to ignore, I'd rename this hadoop.skip.worker.version.check
            Otherwise looks good.
          Show
          Eli Collins added a comment - @Suresh, The use case for this change is allowing a rolling upgrade when there is not an exact version match, ie across minor releases in branch-1. Currently we disallow that, from HADOOP-8209 : In order to support this, let's add an option that is off by default, but, when enabled, makes the DN and TT version check just check for an exact version match (eg "1.0.2") but ignore the build revision (DN) and the source checksum (TT). Two builds still need to match the major, minor, and point numbers, but nothing else. @Tucu, Seems like we should still assert on invalid builds even if skip check flag is used Should remove the "not" in "different versions are not permitted" Nit, feel free to ignore, I'd rename this hadoop.skip.worker.version.check Otherwise looks good.
          Hide
          Alejandro Abdelnur added a comment -

          Sure, Eli, thanks for the feedback.

          Updated patch with following changes:

          • renamed 'no.check' to 'skip.check'
          • When 'skip.check' is TRUE we log both version and revision
          • removed 'not' from comment

          Eli, on your first bullet, about asserting an invalid build. I'd to leave it like it is, without the build check; this allows experimental patches to be put to try something. Also, depending on the SCM you are using (ie SVN or GIT) defining an invalid build may be difficult.

          Show
          Alejandro Abdelnur added a comment - Sure, Eli, thanks for the feedback. Updated patch with following changes: renamed 'no.check' to 'skip.check' When 'skip.check' is TRUE we log both version and revision removed 'not' from comment Eli, on your first bullet, about asserting an invalid build. I'd to leave it like it is, without the build check; this allows experimental patches to be put to try something. Also, depending on the SCM you are using (ie SVN or GIT) defining an invalid build may be difficult.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550837/HADOOP-8968.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1674//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550837/HADOOP-8968.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1674//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          My comment earlier

          Skipping the version check in this patch skips it for major and minor releases as well.


          Eli, after a cursory look, this patch turns off version match even for major release. Is that not true?

          Show
          Suresh Srinivas added a comment - My comment earlier Skipping the version check in this patch skips it for major and minor releases as well. Eli, after a cursory look, this patch turns off version match even for major release. Is that not true?
          Hide
          Eli Collins added a comment - - edited

          @Tucu, on the TT side, under what scenario could an experimental patch result in a build version that matches but a version that does not? The build version string should contain the version string, ie you'd have to manually edit the version info object for that not to be the case. On the DN side you could construct a build with matching revisions but not versions if you build from the same exact source but use different version strings, but I don't see why anyone would do this. For example, if you're doing an experimental patch then the builds would have different revisions, and to trigger this assert you need to match revisions but not versions. If we think these checks should in fact fire in cases we should make them runtime errors instead of assertion errors (which are intended for cases that should not happen), but I don't see any valid scenario where we'd hit them.

          @Suresh, correct, this patch adds an option to turn off the version match completely, ie the version string (with the embedded major, minor, point version) is ignored. V1 workers obviously can't connect to v2, and since we're not doing another major release off branch-1 the only way you could get a DN/TT to try connect to a different major version is to enable this option and use DNs/TTs from a 1.2.x build and have a NN or JT from a 0.20 build.

          Show
          Eli Collins added a comment - - edited @Tucu, on the TT side, under what scenario could an experimental patch result in a build version that matches but a version that does not? The build version string should contain the version string, ie you'd have to manually edit the version info object for that not to be the case. On the DN side you could construct a build with matching revisions but not versions if you build from the same exact source but use different version strings, but I don't see why anyone would do this. For example, if you're doing an experimental patch then the builds would have different revisions, and to trigger this assert you need to match revisions but not versions. If we think these checks should in fact fire in cases we should make them runtime errors instead of assertion errors (which are intended for cases that should not happen), but I don't see any valid scenario where we'd hit them. @Suresh, correct, this patch adds an option to turn off the version match completely, ie the version string (with the embedded major, minor, point version) is ignored. V1 workers obviously can't connect to v2, and since we're not doing another major release off branch-1 the only way you could get a DN/TT to try connect to a different major version is to enable this option and use DNs/TTs from a 1.2.x build and have a NN or JT from a 0.20 build.
          Hide
          Alejandro Abdelnur added a comment -

          updated patch. Renaming boolean variable from noVersionCheck to skipVersionCheck. And fixing TT messages to state 'build' instead 'revision'.

          Show
          Alejandro Abdelnur added a comment - updated patch. Renaming boolean variable from noVersionCheck to skipVersionCheck. And fixing TT messages to state 'build' instead 'revision'.
          Hide
          Alejandro Abdelnur added a comment -

          @Eli, you are right on buildversion matching and version not and so on. This was introduced by HADOOP-8209. I think that should be fixed. But it is different from the issue this JIRA is trying to solve, to allow different versions of the JT/TT DN/NN to work in order to enable a rolling upgrade between 2 different versions, yet compatible.

          Show
          Alejandro Abdelnur added a comment - @Eli, you are right on buildversion matching and version not and so on. This was introduced by HADOOP-8209 . I think that should be fixed. But it is different from the issue this JIRA is trying to solve, to allow different versions of the JT/TT DN/NN to work in order to enable a rolling upgrade between 2 different versions, yet compatible.
          Hide
          Alejandro Abdelnur added a comment -

          @Eli, got it on the assertion thingy. Uploaded a patch correcting that.

          Show
          Alejandro Abdelnur added a comment - @Eli, got it on the assertion thingy. Uploaded a patch correcting that.
          Hide
          Eli Collins added a comment -

          +1 lgtm. Thanks for the new rev Tucu.

          @Suresh, let me know if you have further comments, otherwise I'll commit this tomorrow.

          Show
          Eli Collins added a comment - +1 lgtm. Thanks for the new rev Tucu. @Suresh, let me know if you have further comments, otherwise I'll commit this tomorrow.
          Hide
          Suresh Srinivas added a comment -

          this patch adds an option to turn off the version match completely, ie the version string (with the embedded major, minor, point version) is ignored. V1 workers obviously can't connect to v2, and since we're not doing another major release off branch-1 the only way you could get a DN/TT to try connect to a different major version is to enable this option and use DNs/TTs from a 1.2.x build and have a NN or JT from a 0.20 build.

          Sounds reasonable.

          Show
          Suresh Srinivas added a comment - this patch adds an option to turn off the version match completely, ie the version string (with the embedded major, minor, point version) is ignored. V1 workers obviously can't connect to v2, and since we're not doing another major release off branch-1 the only way you could get a DN/TT to try connect to a different major version is to enable this option and use DNs/TTs from a 1.2.x build and have a NN or JT from a 0.20 build. Sounds reasonable.
          Hide
          Eli Collins added a comment -

          I've committed this. Thanks Tucu.

          Show
          Eli Collins added a comment - I've committed this. Thanks Tucu.
          Hide
          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.

          Show
          Matt Foley added a comment - Closed upon release of Hadoop 1.2.0.

            People

            • Assignee:
              Alejandro Abdelnur
              Reporter:
              Alejandro Abdelnur
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development