Hive / HIVE-4515

"select count(*) from table" query on hive-0.10.0, hbase-0.94.7 integration throws exceptions

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 0.10.0, 0.11.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None
    • Environment:

      hive-0.10.0, hive-0.11.0
      hbase-0.94.7, hbase-0.94.6.1
      zookeeper-3.4.3
      hadoop-1.0.4

      centos-5.7

      Description

      After integrating hive-0.10.0 with hbase-0.94.7, these commands could be executed successfully:

      create table
      insert overwrite table
      select * from table
      
      However, executing "select count(*) from table" throws an exception:
      hive> select count(*) from test;         
      Total MapReduce jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks determined at compile time: 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapred.reduce.tasks=<number>
      Starting Job = job_201305061042_0028, Tracking URL = http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
      Kill Command = /opt/modules/hadoop/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305061042_0028
      Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
      2013-05-07 18:41:42,649 Stage-1 map = 0%,  reduce = 0%
      2013-05-07 18:42:14,789 Stage-1 map = 100%,  reduce = 100%
      Ended Job = job_201305061042_0028 with errors
      Error during job, obtaining debugging information...
      Job Tracking URL: http://master0:50030/jobdetails.jsp?jobid=job_201305061042_0028
      Examining task ID: task_201305061042_0028_m_000002 (and more) from job job_201305061042_0028
      
      Task with the most failures(4): 
      -----
      Task ID:
        task_201305061042_0028_m_000000
      
      URL:
        http://master0:50030/taskdetails.jsp?jobid=job_201305061042_0028&tipid=task_201305061042_0028_m_000000
      -----
      Diagnostic Messages for this Task:
      java.lang.NegativeArraySizeException: -1
      	at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
      	at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
      	at org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
      	at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
      	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
      
      
      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
      MapReduce Jobs Launched: 
      Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
      Total MapReduce CPU Time Spent: 0 msec
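The NegativeArraySizeException above originates in Bytes.readByteArray, which reads a serialized length prefix from the split and then allocates a byte array of that size; if the writer and reader disagree on the split's wire format (mismatched HBase jars between the Hive client and the task classpath are one commonly cited cause), the reader can see a length of -1 and the allocation throws. A hedged, simplified sketch of the failure mode (this is NOT the real HBase code; the real readByteArray uses a vint-encoded length, while this stand-in uses a plain 4-byte int):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SplitLengthDemo {
    // Simplified stand-in for org.apache.hadoop.hbase.util.Bytes.readByteArray:
    // read a length prefix, then allocate and fill a byte array of that size.
    static byte[] readByteArray(DataInput in) throws IOException {
        int len = in.readInt();        // a writer/reader format mismatch can leave -1 here
        byte[] result = new byte[len]; // throws NegativeArraySizeException if len < 0
        in.readFully(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Serialize a length of -1, as an incompatible writer might.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeInt(-1);
        DataInput in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        try {
            readByteArray(in);
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```

This also explains why only "select count(*)" fails while "select *" works: the count query launches a MapReduce job, so the HBase table splits must round-trip through Writable serialization on the task side, which is exactly where the mismatch surfaces.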
      
      ==================================================================
      The log of tasktracker:
      
      stderr logs
      
      13/05/07 18:43:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
      13/05/07 18:43:20 INFO mapred.TaskRunner: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/distcache/1073284782955556390_-1298160740_2123690974/master0/tmp/hive-hadoop/hive_2013-05-07_18-41-30_290_832140779606816147/-mr-10003/fd22448b-e923-498c-bc00-2164ca68447d <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/HIVE_PLANfd22448b-e923-498c-bc00-2164ca68447d
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javolution <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/javolution
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/org <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/org
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/hive-exec-log4j.properties <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/hive-exec-log4j.properties
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/META-INF <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/META-INF
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/.job.jar.crc <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/.job.jar.crc
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/job.jar <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/job.jar
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javax <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/javax
      13/05/07 18:43:20 INFO filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/jars/javaewah <- /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201305061042_0028/attempt_201305061042_0028_m_000000_0/work/javaewah
      13/05/07 18:43:20 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
      13/05/07 18:43:21 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
      13/05/07 18:43:21 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
      13/05/07 18:43:21 INFO impl.MetricsSystemImpl: MapTask metrics system started
      13/05/07 18:43:21 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
      13/05/07 18:43:21 WARN impl.MetricsSystemImpl: Source name ugi already exists!
      13/05/07 18:43:21 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
      13/05/07 18:43:21 INFO util.ProcessTree: setsid exited with exit code 0
      13/05/07 18:43:21 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2d7aece8
      13/05/07 18:43:21 INFO mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
      13/05/07 18:43:21 INFO nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
      13/05/07 18:43:21 INFO nativeio.NativeIO: Got UserName hadoop for UID 1001 from the native implementation
      13/05/07 18:43:21 WARN mapred.Child: Error running child
      java.lang.NegativeArraySizeException: -1
      	at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:148)
      	at org.apache.hadoop.hbase.mapreduce.TableSplit.readFields(TableSplit.java:133)
      	at org.apache.hadoop.hive.hbase.HBaseSplit.readFields(HBaseSplit.java:53)
      	at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:150)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
      	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:396)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
      13/05/07 18:43:21 INFO mapred.Task: Runnning cleanup for the task
      13/05/07 18:43:21 INFO impl.MetricsSystemImpl: Stopping MapTask metrics system...
      13/05/07 18:43:21 INFO impl.MetricsSystemImpl: Stopping metrics source ugi(org.apache.hadoop.security.UgiInstrumentation)
      13/05/07 18:43:21 INFO impl.MetricsSystemImpl: Stopping metrics source jvm(org.apache.hadoop.metrics2.source.JvmMetricsSource)
      13/05/07 18:43:21 INFO impl.MetricsSystemImpl: MapTask metrics system stopped.
      


          Activity

          Yash Sharma added a comment -

          Is there any timeline decided for this issue, or any workaround for it?
          I have been stuck on it for a while.

          sridhar added a comment -

          Hi,

          We are using HBase Avro tables and would like to access the data using Hive, so I modified the code in HBaseStorageHandler, LazyHBaseRow, and LazyHBaseCellMap to support Avro schema parsing. Everything works, and I am able to see the data with a basic query like select * .... But when I query the Hive table with any filter, or select only some columns, I see the same error reported above.
          The exact same error occurs when we access the data in Hive using the original HBaseStorageHandler, so I do not think this was introduced by my changes.
          I wanted to check whether there is any workaround or fix available. We are using CDH 4.4.
          Any suggestions?

          Swarnim Kulkarni added a comment -

          I think this is specific to the CDH Hive release, probably due to some incompatible dependencies. I wasn't able to reproduce this with the Apache stack.

          As a side note, for your Avro support in HBase you might also look into the patch on HIVE-6147. It attempts to solve the same problem.

          sridhar added a comment -

          Thank you Swarnim. I will follow up with Cloudera and see whether we can get a resolution.

          Ashutosh Chauhan added a comment -

          Resolving this as invalid, per Swarnim Kulkarni's previous comment. Feel free to reopen if you are able to reproduce this on trunk.


            People

            • Assignee:
              Swarnim Kulkarni
            • Reporter:
              Yanhui Ma
            • Votes:
              0
            • Watchers:
              6
