Hive / HIVE-20077

hcat command should follow same pattern as hive cli for getting HBase jars


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 2.3.2
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: HCatalog
    • Labels: None

    Description

      Currently the hcat command adds HBase jars to the classpath by using find to walk the directories under $HBASE_HOME/lib.

      # Look for HBase in a BigTop-compatible way. Avoid thrift version
      # conflict with modern versions of HBase.
      HBASE_HOME=${HBASE_HOME:-"/usr/lib/hbase"}
      HBASE_CONF_DIR=${HBASE_CONF_DIR:-"${HBASE_HOME}/conf"}
      if [ -d ${HBASE_HOME} ] ; then
         for jar in $(find $HBASE_HOME -name '*.jar' -not -name '*thrift*'); do
            HBASE_CLASSPATH=$HBASE_CLASSPATH:${jar}
         done
         export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HBASE_CLASSPATH}"
      fi
      if [ -d $HBASE_CONF_DIR ] ; then
          HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HBASE_CONF_DIR}"
      fi
      

      This is incorrect because that path contains jars for a mixture of purposes: HBase client jars, HBase server jars, and HBase shell-specific jars. The inclusion of unneeded jars has been mostly innocuous, but that changes with the upcoming HBase 2.1.0 release. That release will include HBASE-20615 and HBASE-19735, which means most client-facing installations will have a number of shaded client artifacts present.

      With those changes in place, the current implementation will pull into the hcat runtime a mix of shaded and non-shaded HBase artifacts, including some Hadoop classes rewritten to use a shaded version of protobuf. When these mix with other Hadoop classes on the classpath that have not been rewritten, hcat will fail with errors that look like this:

      Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:225)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
              at com.sun.proxy.$Proxy28.getFileInfo(Unknown Source)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:875)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
              at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
              at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
              at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
              at com.sun.proxy.$Proxy29.getFileInfo(Unknown Source)
              at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1643)
              at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
              at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1492)
              at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1507)
              at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1668)
              at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:686)
              at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:625)
              at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:557)
              at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:524)
              at org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.java:149)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
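
      One way to check whether a given classpath is exposed to this failure is to count shaded and non-shaded HBase jars in it. The following is a minimal sketch, not part of the patch. The function name classify_hbase_jars is hypothetical, and the sketch assumes shaded artifacts are named hbase-shaded-*.jar, which may not match every installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a colon-separated classpath mixes
# shaded and non-shaded HBase jars.
# Assumption: shaded artifacts are named hbase-shaded-*.jar.
classify_hbase_jars() {
  local classpath="$1" entry name shaded=0 plain=0
  local -a entries
  # Split on ':' without triggering glob expansion of entries like /lib/*
  IFS=':' read -r -a entries <<< "$classpath"
  for entry in "${entries[@]}"; do
    name="${entry##*/}"          # strip the directory part
    case "$name" in
      hbase-shaded-*.jar) shaded=$((shaded + 1)) ;;
      hbase-*.jar)        plain=$((plain + 1)) ;;
    esac
  done
  echo "shaded=${shaded} non-shaded=${plain}"
  # Return non-zero when both kinds are present on the same classpath
  if (( shaded > 0 && plain > 0 )); then
    return 1
  fi
  return 0
}
```

      Calling classify_hbase_jars "$HADOOP_CLASSPATH" after the existing find-based loop has run would show whether both kinds of jar were picked up. Matching on file names is only a heuristic; it does not inspect the classes inside each jar.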
      

      To fix this issue in a way that transparently handles different HBase versions, the hcat command should take the same approach as the hive cli and ask HBase for the appropriate set of jars:

        # HBase detection. Need bin/hbase and a conf dir for building classpath entries.
        # Start with BigTop defaults for HBASE_HOME and HBASE_CONF_DIR.
        HBASE_HOME=${HBASE_HOME:-"/usr/lib/hbase"}
        HBASE_CONF_DIR=${HBASE_CONF_DIR:-"/etc/hbase/conf"}
        if [[ ! -d $HBASE_CONF_DIR ]] ; then
          # not explicitly set, nor in BigTop location. Try looking in HBASE_HOME.
          HBASE_CONF_DIR="$HBASE_HOME/conf"
        fi
      
        # perhaps we've located the HBase config. if so, include it on classpath.
        if [[ -d $HBASE_CONF_DIR ]] ; then
          export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HBASE_CONF_DIR}"
        fi
      
        # look for the hbase script. First check HBASE_HOME and then ask PATH.
        if [[ -e $HBASE_HOME/bin/hbase ]] ; then
          HBASE_BIN="$HBASE_HOME/bin/hbase"
        fi
        HBASE_BIN=${HBASE_BIN:-"$(which hbase)"}
      
        # perhaps we've located HBase. If so, include its details on the classpath
        if [[ -n $HBASE_BIN ]] ; then
          # exclude ZK, PB, and Guava (See HIVE-2055)
          # depends on HBASE-8438 (hbase-0.94.14+, hbase-0.96.1+) for `hbase mapredcp` command
          for x in $($HBASE_BIN mapredcp 2>&2 | tr ':' '\n') ; do
            if [[ $x == *zookeeper* || $x == *protobuf-java* || $x == *guava* ]] ; then
              continue
            fi
            # TODO: should these be added to AUX_PARAM as well?
            export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${x}"
          done
        fi
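
      The exclusion step in the loop above can be exercised without an HBase installation by pulling it into a function that takes the classpath as a string. This is a sketch under that assumption. The name filter_mapredcp is hypothetical and does not appear in the patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter a colon-separated classpath, dropping the
# ZooKeeper, protobuf-java, and Guava entries (the same patterns the
# snippet above skips).
filter_mapredcp() {
  local raw="$1" x result=""
  local -a entries
  # Split on ':' into an array; avoids glob expansion of the entries
  IFS=':' read -r -a entries <<< "$raw"
  for x in "${entries[@]}"; do
    [[ -z $x ]] && continue      # ignore empty entries from stray colons
    if [[ $x == *zookeeper* || $x == *protobuf-java* || $x == *guava* ]]; then
      continue
    fi
    # Join with ':' but avoid a leading separator
    result="${result:+${result}:}${x}"
  done
  printf '%s\n' "$result"
}
```

      In a script it could be fed the output of the mapredcp command, for example filter_mapredcp "$("$HBASE_BIN" mapredcp)", with the result appended to HADOOP_CLASSPATH. Unlike the loop above, this variant builds the filtered list first, so it can be checked in isolation before anything is exported.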
      

      Attachments

        1. HIVE-20077.1.patch
          3 kB
          Sean Busbey
        2. HIVE-20077.0.patch
          3 kB
          Sean Busbey


          People

            Assignee: Sean Busbey
            Reporter: Sean Busbey
