Hadoop HDFS / HDFS-7939

Two fsimage_rollback_* files are created which are not deleted after rollback.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      During a checkpoint, if uploading the image to the remote Namenode fails, restarting the Namenode with the "-rollingUpgrade started" option creates two fsimage_rollback_* files at the Active Namenode.

      On rolling upgrade rollback, the fsimage_rollback_* file that was created first is not deleted.

        Activity

        andreina J.Andreina added a comment -

        Step 1: NN1 is active, NN2 is standby.
        Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
        Step 3: The active NN1 goes down.

        NN1:
        
        -rw-r--r-- 1 Rex users      67 Mar 17 17:35 edits_0000000000000000001-0000000000000000003
        -rw-r--r-- 1 Rex users 1048576 Mar 17 17:35 edits_inprogress_0000000000000000004
        -rw-r--r-- 1 Rex users     350 Mar 17 17:33 fsimage_0000000000000000000
        -rw-r--r-- 1 Rex users      62 Mar 17 17:33 fsimage_0000000000000000000.md5
        -rw-r--r-- 1 Rex users       2 Mar 17 17:35 seen_txid
        -rw-r--r-- 1 Rex users     206 Mar 17 17:33 VERSION
        
        NN2:
        
        -rw-r--r-- 1 Rex users 1048576 Mar 17 17:38 edits_inprogress_0000000000000000005
        -rw-r--r-- 1 Rex users     349 Mar 17 17:37 fsimage_0000000000000000000
        -rw-r--r-- 1 Rex users      62 Mar 17 17:37 fsimage_0000000000000000000.md5
        -rw-r--r-- 1 Rex users       2 Mar 17 17:37 seen_txid
        -rw-r--r-- 1 Rex users     205 Mar 17 17:37 VERSION
        

        Step 4: Restart NN2 with the "-rollingUpgrade started" option. (NN2 created fsimage_rollback_0000000000000000004, closed txn 5, and became Active, but could not upload the image to NN1.)
        Step 5: Restart NN1 with the "-rollingUpgrade started" option. (NN1 became standby.)

        Issue:
        =======
        NN1 checkpointed one extra txn (id: 5) and uploaded one more image, fsimage_rollback_0000000000000000005, to NN2.
        On rollback, NN2 deletes only fsimage_rollback_0000000000000000005 and leaves fsimage_rollback_0000000000000000004 behind.

        NN2 :
        
        -rw-r--r-- 1 Rex users 1048576 Mar 17 17:38 edits_0000000000000000005-0000000000000000005
        -rw-r--r-- 1 Rex users 1048576 Mar 17 17:39 edits_inprogress_0000000000000000006
        -rw-r--r-- 1 Rex users     349 Mar 17 17:37 fsimage_0000000000000000000
        -rw-r--r-- 1 Rex users      62 Mar 17 17:37 fsimage_0000000000000000000.md5
        -rw-r--r-- 1 Rex users     356 Mar 17 17:39 fsimage_rollback_0000000000000000004
        -rw-r--r-- 1 Rex users      71 Mar 17 17:39 fsimage_rollback_0000000000000000004.md5
        -rw-r--r-- 1 Rex users     356 Mar 17 17:39 fsimage_rollback_0000000000000000005
        -rw-r--r-- 1 Rex users      71 Mar 17 17:39 fsimage_rollback_0000000000000000005.md5
        -rw-r--r-- 1 Rex users       2 Mar 17 17:37 seen_txid
        -rw-r--r-- 1 Rex users     205 Mar 17 17:39 VERSION
        
        
        NN1 :
        
        -rw-r--r-- 1 Rex users      67 Mar 17 17:38 edits_0000000000000000001-0000000000000000003
        -rw-r--r-- 1 Rex users 1048576 Mar 17 17:38 edits_inprogress_0000000000000000004
        -rw-r--r-- 1 Rex users     349 Mar 17 17:36 fsimage_0000000000000000000
        -rw-r--r-- 1 Rex users      62 Mar 17 17:36 fsimage_0000000000000000000.md5
        -rw-r--r-- 1 Rex users     356 Mar 17 17:39 fsimage_rollback_0000000000000000005
        -rw-r--r-- 1 Rex users      71 Mar 17 17:39 fsimage_rollback_0000000000000000005.md5
        -rw-r--r-- 1 Rex users       2 Mar 17 17:38 seen_txid
        -rw-r--r-- 1 Rex users     205 Mar 17 17:39 VERSION
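
The listings above can be checked mechanically. The following is a minimal Python sketch, not Hadoop code, that takes the file names from a storage directory and reports rollback images older than the newest one. The helper names `rollback_txids` and `stale_rollback_txids` are hypothetical, and the sketch assumes the file naming seen in the listings (fsimage_rollback_<txid> with an optional .md5 sidecar).

```python
import re

# Matches rollback checkpoint images such as fsimage_rollback_0000000000000000004,
# with an optional .md5 suffix for the checksum sidecar file.
ROLLBACK_RE = re.compile(r"^fsimage_rollback_(\d+)(\.md5)?$")


def rollback_txids(names):
    """Return the sorted transaction IDs of all rollback images in `names`."""
    txids = set()
    for name in names:
        match = ROLLBACK_RE.match(name)
        if match:
            txids.add(int(match.group(1)))
    return sorted(txids)


def stale_rollback_txids(names):
    """Return every rollback txid except the newest one (hypothetical helper)."""
    return rollback_txids(names)[:-1]


# File names copied from the NN2 and NN1 listings in this comment.
nn2_files = [
    "edits_0000000000000000005-0000000000000000005",
    "edits_inprogress_0000000000000000006",
    "fsimage_0000000000000000000",
    "fsimage_0000000000000000000.md5",
    "fsimage_rollback_0000000000000000004",
    "fsimage_rollback_0000000000000000004.md5",
    "fsimage_rollback_0000000000000000005",
    "fsimage_rollback_0000000000000000005.md5",
    "seen_txid",
    "VERSION",
]
nn1_files = [
    "edits_0000000000000000001-0000000000000000003",
    "edits_inprogress_0000000000000000004",
    "fsimage_0000000000000000000",
    "fsimage_0000000000000000000.md5",
    "fsimage_rollback_0000000000000000005",
    "fsimage_rollback_0000000000000000005.md5",
    "seen_txid",
    "VERSION",
]

print("NN2 stale rollback txids:", stale_rollback_txids(nn2_files))
print("NN1 stale rollback txids:", stale_rollback_txids(nn1_files))
```

For the listings in this comment, the sketch flags txid 4 on NN2 and nothing on NN1, which matches the extra fsimage_rollback_0000000000000000004 reported here.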
        
        andreina J.Andreina added a comment -

        Attached an initial patch to delete old fsimage_rollback_* files on rollback.

        Please review.
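
The intent described in this comment, removing every leftover fsimage_rollback_* file rather than only the newest, can be illustrated outside Hadoop. The Python sketch below is an assumption-laden illustration and does not reproduce the attached patch or any FSImage.java logic. `purge_rollback_images` and `demo` are hypothetical names, and the directory is a temporary one populated with the file names from the NN2 listing.

```python
import tempfile
from pathlib import Path


def purge_rollback_images(storage_dir):
    """Delete every fsimage_rollback_* file (images and .md5 sidecars).

    Hypothetical helper for illustration only. Returns the sorted names
    of the files that were removed.
    """
    removed = []
    # Materialize the matches first so deletion does not race the directory scan.
    for path in list(Path(storage_dir).glob("fsimage_rollback_*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


def demo():
    """Build a throwaway directory that mimics NN2 and purge it."""
    names = [
        "fsimage_0000000000000000000",
        "fsimage_0000000000000000000.md5",
        "fsimage_rollback_0000000000000000004",
        "fsimage_rollback_0000000000000000004.md5",
        "fsimage_rollback_0000000000000000005",
        "fsimage_rollback_0000000000000000005.md5",
        "seen_txid",
        "VERSION",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        current = Path(tmp)
        for name in names:
            (current / name).touch()
        removed = purge_rollback_images(current)
        remaining = sorted(p.name for p in current.iterdir())
    return removed, remaining


removed, remaining = demo()
print("removed:", removed)
print("remaining:", remaining)
```

Matching on the fsimage_rollback_ prefix instead of a single txid is the design point: both the txid 4 and txid 5 images are removed, while the regular fsimage, seen_txid, and VERSION files are left untouched.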

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12708145/HDFS-7939.1.patch
        against trunk revision 85dc3c1.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10127//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10127//console

        This message is automatically generated.

        andreina J.Andreina added a comment -

        The test case failures are not related to this patch.
        Please review the patch.

        vinayrpet Vinayakumar B added a comment -

        +1 for the patch.
        Will commit soon.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #7557 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7557/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        vinayrpet Vinayakumar B added a comment -

        Committed to trunk and branch-2.

        andreina J.Andreina added a comment -

        Thanks @Vinayakumar B for the commit.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #893 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/893/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #159 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/159/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2091 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2091/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #150 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/150/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #160 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/160/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2109 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2109/)
        HDFS-7939. Two fsimage_rollback_* files are created which are not deleted after rollback. (Contributed by J.Andreina) (vinayakumarb: rev 987c9e12e184b35a5abab49f4188e22509ad63a5)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java

          People

          • Assignee:
            andreina J.Andreina
            Reporter:
            andreina J.Andreina
          • Votes:
            0
            Watchers:
            6