Hive / HIVE-10598

Vectorization borks when column is added to table.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Vectorization
    • Labels: None

    Description

      Consider the following table definition:

      create table foobar ( foo string, bar string ) partitioned by (dt string) stored as orc;
      alter table foobar add partition( dt='20150101' ) ;
      

      Say the partition has the following data:

      1	one	20150101
      2	two	20150101
      3	three	20150101
      

      If a new column is added to the table schema (while the existing partition keeps the old schema), vectorized reads from the old partition fail as follows:

      alter table foobar add columns( goo string );
      select count(1) from foobar;
      
      Stack trace:
      java.lang.Exception: java.lang.RuntimeException: Error creating a batch
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
      Caused by: java.lang.RuntimeException: Error creating a batch
      	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:114)
      	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:52)
      	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:84)
      	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.createValue(CombineHiveRecordReader.java:42)
      	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.createValue(HadoopShimsSecure.java:156)
      	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.createValue(MapTask.java:180)
      	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: No type entry found for column 3 in map {4=Long}
      	at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:632)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:343)
      	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:112)
      	... 14 more
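
      Because the exception is raised inside VectorizedRowBatchCtx, one possible stopgap is to turn vectorization off for the session so that the query takes the row-mode read path. The following is a minimal sketch, assuming vectorization was switched on through the standard hive.vectorized.execution.enabled setting; it has not been verified against the build that produced the trace above.

      ```sql
      -- Assumption: vectorization is controlled by hive.vectorized.execution.enabled
      -- in this session.

      -- Inspect the columns recorded for the old partition; per the report it
      -- should still list only foo and bar.
      describe formatted foobar partition (dt='20150101');

      -- Fall back to row-mode execution for this session, then retry the query.
      set hive.vectorized.execution.enabled=false;
      select count(1) from foobar;
      ```

      This only sidesteps the failing code path; it does not reconcile the partition schema with the table schema.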
      

      Attachments

        1. HIVE-10598.01.patch
          103 kB
          Matt McCline
        2. HIVE-10598.02.patch
          314 kB
          Matt McCline
        3. HIVE-10598.03.patch
          53 kB
          Matt McCline
        4. HIVE-10598.04.patch
          55 kB
          Matt McCline
        5. HIVE-10598.05.patch
          56 kB
          Matt McCline
        6. HIVE-10598.06.patch
          292 kB
          Matt McCline

          People

            Assignee: Matt McCline (mmccline)
            Reporter: Mithun Radhakrishnan (mithun)
            Votes: 1
            Watchers: 5
