Hadoop Common / HADOOP-8371

Hadoop 1.0.1 release - DFS rollback issues

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not a Problem
    • Affects Version/s: 1.0.1
    • Fix Version/s: None
    • Component/s: fs
    • Labels:
    • Environment:

      All tests were done on a single-node cluster that runs the namenode, secondarynamenode, and datanode on one machine running Ubuntu 12.04.

      Description

      See the next comment for details.

        Issue Links

          Activity

          Suresh Srinivas added a comment -

          Test Setup

          All tests were done on a single-node cluster that runs the namenode, secondarynamenode, and datanode on one machine running Ubuntu 12.04.
          /usr/local/hadoop/ is a soft link to /usr/local/hadoop-0.20.203.0/
          /usr/local/hadoop-1.0.1 contains the upgrade version.

          Version - 0.20.203.0

          • Formatted name node.
          • Contents of {dfs.name.dir}/current/VERSION

            Tue May 08 08:08:57 EDT 2012
            namespaceID=350250898
            cTime=0
            storageType=NAME_NODE
            layoutVersion=-31

          • Contents of {dfs.name.dir}/previous.checkpoint/VERSION

            Tue May 08 08:03:35 EDT 2012
            namespaceID=350250898
            cTime=0
            storageType=NAME_NODE
            layoutVersion=-31

          • Copied a few test files into HDFS.
          • Output from "fs -lsr /" command

            hduser@ruff790:/usr/local/hadoop/bin$ ./hadoop dfs -lsr /
            drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /test
            -rw-r--r-- 1 hduser supergroup 27574849 2012-05-08 08:04 /test/rr_archive_1655003175_1660003165.gz
            -rw-r--r-- 1 hduser supergroup 18065179 2012-05-08 08:04 /test/twonkyportal.log.2011-12-03.rr.gz
            drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user
            drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user/hduser

          • Executed "hadoop dfsadmin -finalizeUpgrade" (I do not think this is required, but I do not think it should matter either).
          • Stopped DFS by executing "stop-dfs.sh"
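
          The VERSION files quoted in this report are plain key=value text, so their fields can be compared mechanically between steps. The following is a minimal sketch, not part of Hadoop; the function name parse_version_file is hypothetical, and the sample text is the {dfs.name.dir}/current/VERSION content shown above.

```python
def parse_version_file(text):
    """Parse the key=value lines of a storage VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, '#' comments, and the timestamp header,
        # which contains no '=' whether or not it starts with '#'.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


SAMPLE = """\
Tue May 08 08:08:57 EDT 2012
namespaceID=350250898
cTime=0
storageType=NAME_NODE
layoutVersion=-31
"""

props = parse_version_file(SAMPLE)
print(props["layoutVersion"], props["cTime"])  # prints: -31 0
```

          Values are kept as strings; convert with int() before comparing layout versions numerically.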

          Version - 1.0.1

          Upgrade

          • Tried starting DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh
          • As expected the name node start failed due to a version mismatch.

            2012-05-08 08:22:38,166 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
            initialization failed.
            java.io.IOException:
            File system image contains an old layout version -31.
            An upgrade to version -32 is required.
            Please restart NameNode with -upgrade option.

          • Ran /usr/local/hadoop-1.0.1/bin/stop-dfs.sh to stop datanode and secondarynamenode.
          • Started DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh -upgrade
          • Checked upgrade status by calling /usr/local/hadoop-1.0.1/bin/hadoop dfsadmin -upgradeProgress status

            Upgrade for version -32 has been completed.
            Upgrade is not finalized.

          • Contents of {dfs.name.dir}/current/VERSION

            #Tue May 08 08:25:51 EDT 2012
            namespaceID=350250898
            cTime=1336479951669
            storageType=NAME_NODE
            layoutVersion=-32

          • Contents of {dfs.name.dir}/previous.checkpoint/VERSION

            Tue May 08 08:03:35 EDT 2012
            namespaceID=350250898
            cTime=0
            storageType=NAME_NODE
            layoutVersion=-31

          • Contents of {dfs.name.dir}/previous/VERSION

            #Tue May 08 08:08:57 EDT 2012
            namespaceID=350250898
            cTime=0
            storageType=NAME_NODE
            layoutVersion=-31


          • Checked to make sure I can list the contents of DFS.
          • Stopped DFS.
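
          After the upgrade, cTime in current/VERSION changes from 0 to 1336479951669. Read as milliseconds since the Unix epoch, that value matches the timestamp at the top of the same file, which suggests it records when the upgrade ran. A small sketch to check this, assuming a fixed UTC-4 offset for EDT (the helper name ctime_to_datetime is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT modeled as a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), "EDT")


def ctime_to_datetime(ctime_ms, tz=EDT):
    """Interpret a VERSION cTime value as milliseconds since the Unix epoch."""
    seconds, millis = divmod(ctime_ms, 1000)
    # Integer arithmetic avoids floating-point rounding of the fraction.
    return datetime.fromtimestamp(seconds, tz=tz) + timedelta(milliseconds=millis)


upgraded = ctime_to_datetime(1336479951669)
print(upgraded.strftime("%a %b %d %H:%M:%S %Z %Y"))  # prints: Tue May 08 08:25:51 EDT 2012
```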

          Rollback

          • Started DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh -rollback
          • According to "hadoop-hduser-namenode-ruff790.log", the rollback appears to have succeeded.

            2012-05-08 08:37:41,799 INFO org.apache.hadoop.hdfs.server.common.Storage: Rolling back storage
            directory /usr/local/app/hadoop/tmp/dfs/name.
            new LV = -31; new CTime = 0
            2012-05-08 08:37:41,801 INFO org.apache.hadoop.hdfs.server.common.Storage: Rollback of
            /usr/local/app/hadoop/tmp/dfs/name is complete.

          • Contents of {dfs.name.dir}/current/VERSION

            Tue May 08 08:37:42 EDT 2012
            namespaceID=350250898
            cTime=0
            storageType=NAME_NODE
            layoutVersion=-31

          • Contents of {dfs.name.dir}/previous.checkpoint/VERSION

            #Tue May 08 08:08:57 EDT 2012
            namespaceID=350250898
            cTime=0
            storageType=NAME_NODE
            layoutVersion=-31

          • Checked to make sure I can list the contents of DFS.

            hduser@ruff790:/usr/local/hadoop-1.0.1/bin$ ./hadoop dfs -lsr /
            drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /test
            -rw-r--r-- 1 hduser supergroup 27574849 2012-05-08 08:04 /test/rr_archive_1655003175_1660003165.gz
            -rw-r--r-- 1 hduser supergroup 18065179 2012-05-08 08:04 /test/twonkyportal.log.2011-12-03.rr.gz
            drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user
            drwxr-xr-x - hduser supergroup 0 2012-05-08 08:04 /user/hduser

          • However, at this point I could not browse the file system from the WebUI. Then I realized that the data node was not actually running. From the data
            node log file, it seems to have shut down during the rollback process.

            2012-05-08 08:37:57,953 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting
            down: org.apache.hadoop.ipc.RemoteException:
            org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Unregistered data node:
            127.0.0.1:50010
            at org.apache.hadoop.hdfs.server.namenode.NameNode.verifyRequest(NameNode.java:1077)

          • So I ran "stop-dfs.sh" to shut down the namenode and secondarynamenode.
          • The next "start-dfs.sh" failed to start the name node, as expected, with a version mismatch error.

            2012-05-08 08:50:51,084 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
            initialization failed.
            java.io.IOException:
            File system image contains an old layout version -31.
            An upgrade to version -32 is required.
            Please restart NameNode with -upgrade option.

          • Shut everything down and went back to the old version.

          Version - 0.20.203.0 (Again)

          • Now that I had rolled back the "1.0.1" upgrade, I thought I could go back to version 0.20.203.0.
          • So I went back and ran /usr/local/hadoop/bin/start-dfs.sh, and the namenode did not start up. It failed with the error message:

            2012-05-08 08:57:09,261 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
            initialization failed.
            java.io.IOException: Unexpected version of the file system log file: -32. Current version = -31.

          Suresh Srinivas added a comment -

          Started DFS by running /usr/local/hadoop-1.0.1/bin/start-dfs.sh -rollback

          When you upgrade from v1 to v2, you do it by running start-dfs.sh -upgrade on v2. After the upgrade, to roll back, you have to run start-dfs.sh -rollback on the v1 version of the software, not v2 as you have done here. That is why you are seeing the problem.

          We should still log a bug on why rollback was allowed from 1.0.1, which rolled back to namenode state from 0.20.203.
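
          The rule above can be written down as a short command plan. This is only an illustrative sketch using the install paths from this report; the function names upgrade_plan and rollback_plan are hypothetical, and the code only builds the command strings rather than running them.

```python
def upgrade_plan(old_home, new_home):
    """Commands for upgrading: stop the old release, start the new one with -upgrade."""
    return [
        f"{old_home}/bin/stop-dfs.sh",
        f"{new_home}/bin/start-dfs.sh -upgrade",
    ]


def rollback_plan(old_home, new_home):
    """Commands for rolling back: stop the new release, start the OLD one with -rollback."""
    return [
        f"{new_home}/bin/stop-dfs.sh",
        f"{old_home}/bin/start-dfs.sh -rollback",
    ]


OLD = "/usr/local/hadoop-0.20.203.0"
NEW = "/usr/local/hadoop-1.0.1"

for cmd in rollback_plan(OLD, NEW):
    print(cmd)
```

          The key point is that -rollback is issued from the old release's bin directory, while -upgrade is issued from the new one.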

          Giri added a comment -
          • I thought that since 1.0.1 performed the upgrade changes to DFS, it should also be the one to perform the rollback.
          • I confirmed that if I go back to 0.20.203.0 and run the rollback, it works fine. Thanks for the clarification.
          Suresh Srinivas added a comment -

          Rollback is not a problem.

          However, I created a related bug HDFS-3393 to track the issue where rollback was allowed on the newer release.


            People

            • Assignee:
              Suresh Srinivas
              Reporter:
              Giri
            • Votes:
              0
              Watchers:
              9
