Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-26828

Fix OOM for hybridgrace_hashjoin_2.q

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 4.0.0-alpha-2
    • 4.0.0
    • Test, Tez
    • None

    Description

      hybridgrace_hashjoin_2.q test was disabled because it was failing with OOM transiently (from flaky_test output, in case it disappears):

      < Status: Failed
      < Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
      < #### A masked pattern was here ####
      < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
      < Serialization trace:
      < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
      < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
      < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
      < #### A masked pattern was here ####
      < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
      < #### A masked pattern was here ####
      < ]
      < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
      < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
      < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
      < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
      < [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
      < DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
      < FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
      < #### A masked pattern was here ####
      < Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
      < Serialization trace:
      < childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
      < childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
      < aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
      < #### A masked pattern was here ####
      < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
      < #### A masked pattern was here ####
      < ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
      < PREHOOK: query: SELECT COUNT( * )
      < FROM src1 x
      < JOIN srcpart z1 ON (x.key = z1.key)
      < JOIN src y1 ON (x.key = y1.key)
      < JOIN srcpart z2 ON (x.value = z2.value)
      < JOIN src y2 ON (x.value = y2.value)
      < WHERE z1.key < 'zzzzzzzz' AND z2.key < 'zzzzzzzzzz'
      < AND y1.value < 'zzzzzzzz' AND y2.value < 'zzzzzzzzzz'
      < PREHOOK: type: QUERY
      < PREHOOK: Input: default@src
      < PREHOOK: Input: default@src1
      < PREHOOK: Input: default@srcpart
      < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11
      < PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12
      < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=11
      < PREHOOK: Input: default@srcpart@ds=2008-04-09/hr=12
      < PREHOOK: Output: hdfs://### HDFS PATH ###

      The aim of this ticket is to investigate the issue, fix it and re-enable the test.

      The problem seems to lie in the deserialization of the computed tez dag plan.

      Attachments

        Issue Links

          Activity

            People

              zabetak Stamatis Zampetakis
              asolimando Alessandro Solimando
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: