Falcon / FALCON-2095

Hive Replication jobs are failing with UnknownHostException in NN HA


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: trunk
    • Component/s: replication
    • Labels: None

    Description

      In NN HA, when I schedule a Hive replication job, it fails with "java.net.UnknownHostException: mycluster1", where mycluster1 is the source (primary) cluster's HDFS nameservice. The complete stack trace is below.

      LogType:stderr
      Log Upload Time:Thu Jul 21 14:35:52 +0000 2016
      LogLength:7406
      Log Contents:
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/267/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/40/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      Error: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster1
              at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
              at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:429)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:207)
              at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2736)
              at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
              at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2770)
              at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2752)
              at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
              at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
              at org.apache.falcon.hive.util.EventUtils.initializeFS(EventUtils.java:145)
              at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:50)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
              at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
              at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
      Caused by: java.net.UnknownHostException: mycluster1
              ... 19 more
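
      For context: the failure happens in CopyMapper.setup -> EventUtils.initializeFS -> FileSystem.get. The WebHdfsFileSystem in the trace points at the readonly endpoint (webhdfs://mycluster1:20070), but the task-side job configuration does not carry the HA nameservice mappings, so "mycluster1" is treated as a plain hostname and SecurityUtil.buildTokenService cannot resolve it. For illustration only (the hostnames below are placeholders, not values from this environment), these are the standard HDFS client settings that have to be visible to the task for a logical nameservice to resolve; an analogous set is needed for mycluster2:

      <!-- Illustrative hdfs-site.xml fragment; hostnames are placeholders -->
      <property><name>dfs.nameservices</name><value>mycluster1,mycluster2</value></property>
      <property><name>dfs.ha.namenodes.mycluster1</name><value>nn1,nn2</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster1.nn1</name><value>nn1-host.example.com:8020</value></property>
      <property><name>dfs.namenode.rpc-address.mycluster1.nn2</name><value>nn2-host.example.com:8020</value></property>
      <property><name>dfs.namenode.http-address.mycluster1.nn1</name><value>nn1-host.example.com:50070</value></property>
      <property><name>dfs.namenode.http-address.mycluster1.nn2</name><value>nn2-host.example.com:50070</value></property>
      <property><name>dfs.client.failover.proxy.provider.mycluster1</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>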
      
      

      PrimaryCluster:

      <cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="primaryCluster">
      <interfaces>
      <interface type="readonly" endpoint="webhdfs://mycluster1:20070" version="0.20.2"/>
      <interface type="write" endpoint="hdfs://mycluster1:8020" version="0.20.2"/>
      <interface type="execute" endpoint="mramasami-falcon-multi-ha-re-13.openstacklocal:8050" version="0.20.2"/>
      <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-re-12.openstacklocal:11000/oozie" version="3.1"/>
      <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-re-11.openstacklocal:61616?daemon=true" version="5.1.6"/>
      <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-re-12.openstacklocal:9083" version="0.11.0"/>
      </interfaces>
      <locations>
      <location name="staging" path="/tmp/falcon-regression/staging"/>
      <location name="temp" path="/tmp"/>
      <location name="working" path="/tmp/falcon-regression/working"/>
      </locations>
      <ACL owner="hrt_qa" group="users" permission="0755"/>
      <properties>
      <property name="dfs.namenode.kerberos.principal" value="nn/_HOST@EXAMPLE.COM"/>
      <property name="hive.metastore.kerberos.principal" value="hive/_HOST@EXAMPLE.COM"/>
      <property name="hive.metastore.sasl.enabled" value="true"/>
      <property name="hadoop.rpc.protection" value="authentication"/>
      <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-re-12.openstacklocal:9083"/>
      <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-re-12.openstacklocal:10000"/>
      </properties>
      </cluster>
      

      BackupCluster:

      <cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="backupCluster">
      <interfaces>
      <interface type="readonly" endpoint="webhdfs://mycluster2:20070" version="0.20.2"/>
      <interface type="write" endpoint="hdfs://mycluster2:8020" version="0.20.2"/>
      <interface type="execute" endpoint="mramasami-falcon-multi-ha-re-5.openstacklocal:8050" version="0.20.2"/>
      <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-re-7.openstacklocal:11000/oozie" version="3.1"/>
      <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-re-2.openstacklocal:61616" version="5.1.6"/>
      <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-re-7.openstacklocal:9083" version="0.11.0"/>
      </interfaces>
      <locations>
      <location name="staging" path="/tmp/falcon-regression/staging"/>
      <location name="temp" path="/tmp"/>
      <location name="working" path="/tmp/falcon-regression/working"/>
      </locations>
      <ACL owner="hrt_qa" group="users" permission="0755"/>
      <properties>
      <property name="dfs.namenode.kerberos.principal" value="nn/_HOST@EXAMPLE.COM"/>
      <property name="hive.metastore.kerberos.principal" value="hive/_HOST@EXAMPLE.COM"/>
      <property name="hive.metastore.sasl.enabled" value="true"/>
      <property name="hadoop.rpc.protection" value="authentication"/>
      <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-re-7.openstacklocal:9083"/>
      <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-re-7.openstacklocal:10000"/>
      </properties>
      </cluster>
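
      Note: if cluster-entity properties are propagated into the replication job's configuration (an assumption on my part, not verified), one possible workaround sketch would be to carry the same HA client settings in each cluster entity's <properties> block, e.g. for the primary cluster (placeholder hostnames, with an analogous set for mycluster2):

      <property name="dfs.nameservices" value="mycluster1,mycluster2"/>
      <property name="dfs.ha.namenodes.mycluster1" value="nn1,nn2"/>
      <property name="dfs.namenode.rpc-address.mycluster1.nn1" value="nn1-host.example.com:8020"/>
      <property name="dfs.namenode.rpc-address.mycluster1.nn2" value="nn2-host.example.com:8020"/>
      <property name="dfs.namenode.http-address.mycluster1.nn1" value="nn1-host.example.com:50070"/>
      <property name="dfs.namenode.http-address.mycluster1.nn2" value="nn2-host.example.com:50070"/>
      <property name="dfs.client.failover.proxy.provider.mycluster1" value="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"/>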
      
      

      Hive Property File:

      jobClusterName = primaryCluster
      jobValidityStart=2016-05-09T06:25Z
      jobValidityEnd=2016-05-09T08:00Z
      jobFrequency = days(1)
      sourceCluster = primaryCluster
      sourceDatabases = default
      sourceHiveServer2Uri = hive2://mramasami-falcon-multi-ha-re-12.openstacklocal:10000
      targetCluster = backupCluster
      targetHiveServer2Uri = hive2://mramasami-falcon-multi-ha-re-7.openstacklocal:10000
      jobAclOwner = hrt_qa
      jobAclGroup = users
      jobAclPermission = *
      distcpMapBandwidth = 100
      extensionName = hive-mirroring
      sourceStagingPath = /tmp/falcon-regression/staging
      targetStagingPath = /tmp/falcon-regression/staging
      sourceTables = stock_data_hivedr
      sourceHive2KerberosPrincipal = hive/_HOST@EXAMPLE.COM
      targetHive2KerberosPrincipal = hive/_HOST@EXAMPLE.COM
      


          People

            Assignee: Venkat Ranganathan (venkatnrangan)
            Reporter: Murali Ramasami (murali.msse)
            Votes: 0
            Watchers: 2
