Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9788

Incompatible tag renumbering in HeartbeatResponseProto

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: rolling upgrades
    • Labels:
      None
    • Target Version/s:

      Description

      The HDFS-9426 patches for branch-2.7 and branch-2 used different tag numbers for rollingUpgradeStatusV2 in HeartbeatResponseProto:

      trunk/branch-2:

      message HeartbeatResponseProto {
        repeated DatanodeCommandProto cmds = 1; // Returned commands can be null
        required NNHAStatusHeartbeatProto haStatus = 2;
        optional RollingUpgradeStatusProto rollingUpgradeStatus = 3;
        optional uint64 fullBlockReportLeaseId = 4 [ default = 0 ];
        optional RollingUpgradeStatusProto rollingUpgradeStatusV2 = 5;
      }
      

      branch-2.7:

      message HeartbeatResponseProto {
        repeated DatanodeCommandProto cmds = 1; // Returned commands can be null
        required NNHAStatusHeartbeatProto haStatus = 2;
        optional RollingUpgradeStatusProto rollingUpgradeStatus = 3;
        optional RollingUpgradeStatusProto rollingUpgradeStatusV2 = 4;
      }
      

      This breaks rolling upgrade between 2.7 and a future 2.8. We need to renumber the tags to preserve wire compatibility.

        Activity

        Hide
        vinayrpet Vinayakumar B added a comment -

        Thanks Andrew Wang for the find and fix.

        Show
        vinayrpet Vinayakumar B added a comment - Thanks Andrew Wang for the find and fix.
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9279 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9279/)
        HDFS-9788. Incompatible tag renumbering in HeartbeatResponseProto. (wang: rev aeb13ef2ab657a1c2f9d9ed5f80783f788c4953a)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9279 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9279/ ) HDFS-9788 . Incompatible tag renumbering in HeartbeatResponseProto. (wang: rev aeb13ef2ab657a1c2f9d9ed5f80783f788c4953a) hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        Hide
        andrew.wang Andrew Wang added a comment -

        Pushed to trunk, branch-2, branch-2.8. Thanks again all.

        Show
        andrew.wang Andrew Wang added a comment - Pushed to trunk, branch-2, branch-2.8. Thanks again all.
        Hide
        andrew.wang Andrew Wang added a comment -

        Thanks for the quick turnarounds on reviews Uma and Arpit, I'll be committing this shortly.

        Show
        andrew.wang Andrew Wang added a comment - Thanks for the quick turnarounds on reviews Uma and Arpit, I'll be committing this shortly.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 11s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 7m 12s trunk passed
        +1 compile 0m 39s trunk passed with JDK v1.8.0_72
        +1 compile 0m 40s trunk passed with JDK v1.7.0_95
        +1 mvnsite 0m 58s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 mvninstall 0m 52s the patch passed
        +1 compile 0m 40s the patch passed with JDK v1.8.0_72
        +1 cc 0m 40s the patch passed
        +1 javac 0m 40s the patch passed
        +1 compile 0m 42s the patch passed with JDK v1.7.0_95
        +1 cc 0m 42s the patch passed
        +1 javac 0m 42s the patch passed
        +1 mvnsite 0m 53s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        -1 unit 58m 24s hadoop-hdfs in the patch failed with JDK v1.8.0_72.
        -1 unit 53m 25s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
        +1 asflicense 0m 27s Patch does not generate ASF License warnings.
        125m 50s



        Reason Tests
        JDK v1.8.0_72 Failed junit tests hadoop.hdfs.TestFileAppend
          hadoop.hdfs.server.datanode.TestBlockScanner
          hadoop.fs.TestHdfsNativeCodeLoader
        JDK v1.7.0_95 Failed junit tests hadoop.hdfs.web.TestWebHDFS
          hadoop.fs.TestHdfsNativeCodeLoader
          hadoop.hdfs.TestReconstructStripedFile



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12787340/HDFS-9788.001.patch
        JIRA Issue HDFS-9788
        Optional Tests asflicense compile cc mvnsite javac unit
        uname Linux 1ce1b67f2313 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / c3641ed
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14444/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 77MB
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14444/console
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 11s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 7m 12s trunk passed +1 compile 0m 39s trunk passed with JDK v1.8.0_72 +1 compile 0m 40s trunk passed with JDK v1.7.0_95 +1 mvnsite 0m 58s trunk passed +1 mvneclipse 0m 14s trunk passed +1 mvninstall 0m 52s the patch passed +1 compile 0m 40s the patch passed with JDK v1.8.0_72 +1 cc 0m 40s the patch passed +1 javac 0m 40s the patch passed +1 compile 0m 42s the patch passed with JDK v1.7.0_95 +1 cc 0m 42s the patch passed +1 javac 0m 42s the patch passed +1 mvnsite 0m 53s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. -1 unit 58m 24s hadoop-hdfs in the patch failed with JDK v1.8.0_72. -1 unit 53m 25s hadoop-hdfs in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 27s Patch does not generate ASF License warnings. 125m 50s Reason Tests JDK v1.8.0_72 Failed junit tests hadoop.hdfs.TestFileAppend   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.fs.TestHdfsNativeCodeLoader JDK v1.7.0_95 Failed junit tests hadoop.hdfs.web.TestWebHDFS   hadoop.fs.TestHdfsNativeCodeLoader   hadoop.hdfs.TestReconstructStripedFile Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12787340/HDFS-9788.001.patch JIRA Issue HDFS-9788 Optional Tests asflicense compile cc mvnsite javac unit uname Linux 1ce1b67f2313 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / c3641ed Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14444/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14444/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14444/console Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        umamaheswararao Uma Maheswara Rao G added a comment -

        I don't think the YARN-3330 script would have caught it. We'd need to diff proto files across release lines to detect such issues.

        Agreed. To catch this kind of issues, it has to run across versions. Not sure if we have better ways to achieve that automatically.

        Show
        umamaheswararao Uma Maheswara Rao G added a comment - I don't think the YARN-3330 script would have caught it. We'd need to diff proto files across release lines to detect such issues. Agreed. To catch this kind of issues, it has to run across versions. Not sure if we have better ways to achieve that automatically.
        Hide
        arpitagarwal Arpit Agarwal added a comment - - edited

        Good catch. I don't think the YARN-3330 script would have caught it. We'd need to diff proto files across release lines to detect such issues.

        Show
        arpitagarwal Arpit Agarwal added a comment - - edited Good catch. I don't think the YARN-3330 script would have caught it. We'd need to diff proto files across release lines to detect such issues.
        Hide
        umamaheswararao Uma Maheswara Rao G added a comment -

        Yeah. Got it. Actually minor version should not have incompatible changes, but HDFS-9426 change merged with differed versioning. That change itself be an incompatible change I think. . By noticing that we could have avoided it.

        This is a good reminder to check the PBs very carefully for each release. I wonder if there is a way to automate this check? I see YARN-3330 which has a python script, but looks like it's not reviewed yet.

        Agree if we can find some better way to find this numbering differences automatically. Big +1 for the ideas like in YARN-3330.

        Show
        umamaheswararao Uma Maheswara Rao G added a comment - Yeah. Got it. Actually minor version should not have incompatible changes, but HDFS-9426 change merged with differed versioning. That change itself be an incompatible change I think. . By noticing that we could have avoided it. This is a good reminder to check the PBs very carefully for each release. I wonder if there is a way to automate this check? I see YARN-3330 which has a python script, but looks like it's not reviewed yet. Agree if we can find some better way to find this numbering differences automatically. Big +1 for the ideas like in YARN-3330 .
        Hide
        andrew.wang Andrew Wang added a comment -

        Thanks Uma. I think the rule is: never change the PB tag number when backporting. Tag numbers don't need to be monotonic, so we could have done rollingUpgradeStatusV2 = 5; in the branch-2.7 backport even though fullBlockReportLeaseId = 4; isn't present.

        This is a good reminder to check the PBs very carefully for each release. I wonder if there is a way to automate this check? I see YARN-3330 which has a python script, but looks like it's not reviewed yet.

        Show
        andrew.wang Andrew Wang added a comment - Thanks Uma. I think the rule is: never change the PB tag number when backporting. Tag numbers don't need to be monotonic, so we could have done rollingUpgradeStatusV2 = 5; in the branch-2.7 backport even though fullBlockReportLeaseId = 4; isn't present. This is a good reminder to check the PBs very carefully for each release. I wonder if there is a way to automate this check? I see YARN-3330 which has a python script, but looks like it's not reviewed yet.
        Hide
        umamaheswararao Uma Maheswara Rao G added a comment -

        Good catch Andrew. This patch should work.
        +1

        As HDFS-7923(the Jira which introduced fullBlockReportLeaseId=4 earlier) is targeted for 2.8, luckily we found this before 2.8 out. Safely we can change at this time.
        And we need not change in 2.7 versions ant thing as HDFS-9426(the JIRA which introduced differed numbering for rollingUpgradeStatusV2) was back ported with lower numbering already.So no compatibility issues there even though 2.7.2 out already.
        I have a question here. What is the best solution for this kind of issues in the future to avoid? Example if fullBlockReportLeaseId change targeted in 2.8 and if 2.8 is out already. Then changing the numbering would not be easy as this. How could we control such cases?

        Show
        umamaheswararao Uma Maheswara Rao G added a comment - Good catch Andrew. This patch should work. +1 As HDFS-7923 (the Jira which introduced fullBlockReportLeaseId=4 earlier) is targeted for 2.8, luckily we found this before 2.8 out. Safely we can change at this time. And we need not change in 2.7 versions ant thing as HDFS-9426 (the JIRA which introduced differed numbering for rollingUpgradeStatusV2) was back ported with lower numbering already.So no compatibility issues there even though 2.7.2 out already. I have a question here. What is the best solution for this kind of issues in the future to avoid? Example if fullBlockReportLeaseId change targeted in 2.8 and if 2.8 is out already. Then changing the numbering would not be easy as this. How could we control such cases?
        Hide
        andrew.wang Andrew Wang added a comment -

        Simple patch attached. Kihwal Lee / Vinayakumar B could you review?

        Show
        andrew.wang Andrew Wang added a comment - Simple patch attached. Kihwal Lee / Vinayakumar B could you review?

          People

          • Assignee:
            andrew.wang Andrew Wang
            Reporter:
            andrew.wang Andrew Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development