Apache HAWQ (Retired) / HAWQ-1504

Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/


Details

    Description

      After setting up an environment using the instructions provided under incubator-hawq/contrib/hawq-docker/, restarting the docker containers causes the namenode to hang: it attempts a namenode -format during every start.

      Steps to reproduce this issue -

      • Navigate to incubator-hawq/contrib/hawq-docker
      • make stop
      • make start
      • docker exec -it centos7-namenode bash
      • ps -ef | grep java

      You can see namenode -format running.

      [gpadmin@centos7-namenode data]$ ps -ef | grep java
      hdfs        11    10  1 00:56 ?        00:00:06 /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.hdfs.server.namenode.NameNode -format
      

      Since namenode -format runs in interactive mode and is waiting for a (Yes/No) response at this point, the namenode remains stuck indefinitely. This makes HDFS unavailable.
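
      As a side note, the prompt itself can be suppressed with the standard -nonInteractive or -force options of hdfs namenode -format, though neither is a real fix here, since the format should not be attempted at all on restart. A minimal illustration (not a proposed change):

      # Illustration only, not the proposed fix: these standard flags avoid the (Yes/No) prompt.
      su -l hdfs -c "hdfs namenode -format -nonInteractive"   # aborts instead of prompting if metadata already exists
      su -l hdfs -c "hdfs namenode -format -force"            # reformats without asking; destructive to existing metadata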

      Root cause of the problem -

      In the dockerfiles present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the Docker ENTRYPOINT directive executes entrypoint.sh during startup.

      The entrypoint.sh in turn executes start-hdfs.sh, which checks for the following -

      if [ ! -d /tmp/hdfs/name/current ]; then
          su -l hdfs -c "hdfs namenode -format"
      fi
      

      My assumption is that this check looks for the fsimage and edit logs. If they are not present, the script assumes that this is a first-time initialization and a namenode format should be done. However, the path /tmp/hdfs/name/current does not exist on the namenode.

      From the namenode logs it is clear that the fsimage and edit logs are written under /tmp/hadoop-hdfs/dfs/name/current.

      2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: No edit log streams selected.
      2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Planning to load image: FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_0000000000000000000, cpktTxId=0000000000000000000)
      2017-07-18 00:55:20,995 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes.
      2017-07-18 00:55:21,064 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage in 0 seconds.
      2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /tmp/hadoop-hdfs/dfs/name/current/fsimage_0000000000000000000
      2017-07-18 00:55:21,084 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
      2017-07-18 00:55:21,084 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1
      
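      The directory the namenode actually uses can also be confirmed from inside the container with hdfs getconf (a quick check, assuming the default dfs.namenode.name.dir is in effect; the expected output below is inferred from the log above):

      docker exec -it centos7-namenode bash
      # Ask HDFS which directory holds the namenode metadata.
      su -l hdfs -c "hdfs getconf -confKey dfs.namenode.name.dir"
      # Expected (based on the log above): a path under /tmp/hadoop-hdfs/dfs/name,
      # not the /tmp/hdfs/name path checked by start-hdfs.sh.
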

      Thus, the wrong path in incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh causes the namenode to hang during each restart of the containers, making HDFS unavailable.
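
      A possible fix (a sketch only, assuming the simplest change is to make the check in start-hdfs.sh match the directory the namenode actually writes to, rather than reconfiguring dfs.namenode.name.dir):

      # Sketch: format only on first-time initialization, using the directory
      # observed in the namenode log (/tmp/hadoop-hdfs/dfs/name).
      NAMENODE_DIR=/tmp/hadoop-hdfs/dfs/name
      if [ ! -d "${NAMENODE_DIR}/current" ]; then
          su -l hdfs -c "hdfs namenode -format"
      fi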

People

  Assignee: rlei (Radar Da Lei)
  Reporter: outofmemory (Shubham Sharma)
