Hive / HIVE-25611

OOM when running MERGE query on wide transactional table with many buckets


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      Running a MERGE statement over a wide transactional/ACID table with many buckets leads to an OutOfMemoryError during query execution.

      A step-by-step reproducer is attached to the case (merge_wide_acid_bucketed_table.q, wide_table_100_char_cols.csv), but the main idea is outlined below.

      CREATE TABLE wide_table_orc (
      w_id_col    int,
      w_char_col0 char(20),
      ...
      w_char_col99 char(20)) STORED AS ORC TBLPROPERTIES ('transactional'='true');
      -- Load data into the table in a way that it gets bucketed
      
      CREATE TABLE simple_table_txt (id int, name char(20)) STORED AS TEXTFILE;
      -- Load data into simple_table_txt overlapping with the data in wide_table_orc
      
      MERGE INTO wide_table_orc target USING simple_table_txt source ON (target.w_id_col = source.id)
      WHEN MATCHED THEN UPDATE SET w_char_col0 = source.name
      WHEN NOT MATCHED THEN INSERT (w_id_col, w_char_col1) VALUES (source.id, 'Actual value does not matter');
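
      The loading steps referenced by the comments above are spelled out in the attached merge_wide_acid_bucketed_table.q; a rough sketch of how they could look is given below. The staging table wide_table_txt, the local CSV path, the DISTRIBUTE BY clause and the sample values for simple_table_txt are illustrative assumptions, and the column lists are abbreviated (the real tables carry all 100 char columns).

      -- Staging table matching the attached CSV (remaining char columns omitted for brevity):
      CREATE TABLE wide_table_txt (
        w_id_col     int,
        w_char_col0  char(20),
        w_char_col99 char(20))
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;

      -- Assumed local path for the attached CSV:
      LOAD DATA LOCAL INPATH '/tmp/wide_table_100_char_cols.csv' INTO TABLE wide_table_txt;

      -- Spreading the rows over many reducers is one way the ACID table can end up with many bucket files:
      INSERT INTO wide_table_orc SELECT * FROM wide_table_txt DISTRIBUTE BY w_id_col;

      -- Overlapping and non-overlapping ids so that both the MATCHED and NOT MATCHED branches fire:
      INSERT INTO simple_table_txt VALUES (1, 'updated name'), (1000000, 'brand new row');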
      

      A sample stack trace showing the memory pressure is given below:

      java.lang.OutOfMemoryError: GC overhead limit exceeded
              at org.apache.orc.OrcProto$RowIndexEntry$Builder.create(OrcProto.java:8962) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.OrcProto$RowIndexEntry$Builder.access$12100(OrcProto.java:8931) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.OrcProto$RowIndexEntry.newBuilder(OrcProto.java:8915) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriterBase.<init>(TreeWriterBase.java:98) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.StringBaseTreeWriter.<init>(StringBaseTreeWriter.java:66) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.CharTreeWriter.<init>(CharTreeWriter.java:40) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:163) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.StructTreeWriter.<init>(StructTreeWriter.java:41) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:181) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.StructTreeWriter.<init>(StructTreeWriter.java:41) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriter$Factory.createSubtree(TreeWriter.java:181) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.writer.TreeWriter$Factory.create(TreeWriter.java:133) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:216) ~[orc-core-1.6.9.jar:1.6.9]
              at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:95) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:396) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.initWriter(OrcRecordUpdater.java:615) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSimpleEvent(OrcRecordUpdater.java:442) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addSplitUpdateEvent(OrcRecordUpdater.java:495) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.update(OrcRecordUpdater.java:519) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1200) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:497) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:399) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:256) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:311) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:277) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) ~[tez-runtime-internals-0.10.1.jar:0.10.1]
      2021-10-12T10:31:57,723  INFO [TezTR-775147_1_3_4_0_0] task.TezTaskRunner2: Received notification of a  failure  which will cause the task to die
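
      Reading the trace above (an interpretation, not a confirmed root-cause analysis): the OOM is hit while OrcRecordUpdater.initWriter constructs a new ORC WriterImpl, and every WriterImpl builds one TreeWriter per column, so with roughly 100 char(20) columns and a separate delta writer kept open per bucket the live writer state grows with the number of buckets times the number of columns. One hypothetical experiment to confirm the dependence on the bucket count (not a verified workaround) is to recreate the target as an explicitly bucketed table with a small, fixed bucket count and rerun the MERGE; the statements below are an abbreviated sketch along those lines.

      -- Hypothetical experiment: cap the number of bucket files with an explicit, small bucket count.
      CREATE TABLE wide_table_orc_few_buckets (
        w_id_col     int,
        w_char_col0  char(20),
        w_char_col1  char(20),
        w_char_col99 char(20))  -- remaining char columns omitted for brevity
      CLUSTERED BY (w_id_col) INTO 4 BUCKETS
      STORED AS ORC
      TBLPROPERTIES ('transactional'='true');

      INSERT INTO wide_table_orc_few_buckets SELECT * FROM wide_table_orc;

      MERGE INTO wide_table_orc_few_buckets target USING simple_table_txt source ON (target.w_id_col = source.id)
      WHEN MATCHED THEN UPDATE SET w_char_col0 = source.name
      WHEN NOT MATCHED THEN INSERT (w_id_col, w_char_col1) VALUES (source.id, 'Actual value does not matter');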
      

      Attachments

        1. wide_table_100_char_cols.csv (2.01 MB, Stamatis Zampetakis)
        2. merge_wide_acid_bucketed_table.q (4 kB, Stamatis Zampetakis)
        3. merge_query_plan.txt (114 kB, Stamatis Zampetakis)

            People

              Assignee: Stamatis Zampetakis (zabetak)
              Reporter: Stamatis Zampetakis (zabetak)
              Votes: 0
              Watchers: 4
