Hive / HIVE-14857

select count(*) fails with tez over cassandra


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Hello,

      We have a cluster whose nodes run both Cassandra and Hadoop (Hortonworks 2.3.2), with Tez as the default execution engine.

      I have a table in Cassandra, and I use the hive-cassandra driver to run selects over it. This is the table:

      CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
      

      The table has only 3 partitions:

      campaign_id | sid  | name | ts
      45sqdqs     | sqsd | dea  | NULL
      QSHJKA      | sqsd | dea  | NULL
      45s-qs      | sqsd | dea  | NULL

      When I run a "select count(*)" over the table from Hive like this (Tez is our default engine):

       hive -e "select count(*) from table1;" 

      I got this error:

      Status: Failed
      Vertex failed, vertexName=Map 1, 
      vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
      taskId=task_1474275943985_0179_1_00_000001, diagnostics=[TaskAttempt 0 
      failed, info=[Error: Failure while running 
      task:java.lang.RuntimeException: 
      org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
      actual length: 9223372036854775711
         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
         at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
         at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
         at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
         at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
         at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
         at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
         at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
         at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
         ... 14 more
      

      As far as I understand, in readFields we are getting far more data than expected (expected length 12416, actual length 9223372036854775711). But considering the size of the table (only 3 records), I don't think the data itself is the problem.
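One observation that may help triage: the "actual length" in the exception sits very close to the largest value a Java long can hold. The small check below uses only the two numbers from the error message plus Long.MAX_VALUE; the variable names are just for illustration. A gap this small would be consistent with a split reporting a placeholder or overflowed length rather than a real byte count, but that reading is an assumption and has not been confirmed against the Tez or hive-cassandra source.

```shell
# Values copied from the exception message in this report.
expected=12416
actual=9223372036854775711

# Java's Long.MAX_VALUE (2^63 - 1); shell arithmetic is 64-bit signed here.
long_max=9223372036854775807

# Distance between the reported "actual length" and Long.MAX_VALUE.
gap=$(( long_max - actual ))

echo "expected=$expected actual=$actual gap_to_long_max=$gap"
```

If the gap is tiny compared with the table size, the mismatch probably comes from how the split length is computed or serialized, not from the three rows themselves.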

      Another thing to add: a "select *" works perfectly fine with Tez. With the mr (MapReduce) engine, both "select count(*)" and "select *" work fine as well.
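For anyone trying to reproduce the comparison, the engine can be switched for a single invocation through the hive.execution.engine property. This is a sketch of the kind of command involved, assuming the same table1 and a hive CLI on the path; it is not necessarily the exact command used in this report.

```shell
# Run the failing query on MapReduce instead of Tez, for this invocation only.
hive -e "set hive.execution.engine=mr; select count(*) from table1;"

# Same query with Tez set explicitly, to confirm the failure is engine-specific.
hive -e "set hive.execution.engine=tez; select count(*) from table1;"
```

Setting the property inside the -e string keeps the cluster default untouched, so the two runs differ only in the engine.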

      We are using Hortonworks version 2.3.2.

            People

            • Assignee: Unassigned
            • Reporter: carlo_4002 jean carlo rivera ura
            • Votes: 0
            • Watchers: 3
