Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
0.9.0
-
None
Description
baos = new ByteArrayOutputStream(); dos = new DataOutputStream(baos); keySerializer.open(dos); valSerializer.open(dos);
This is a known performance issue as documented in HADOOP-10694
Both ByteArrayOutputStream::write() and DataOutputStream::write() have lock prefix calls in them, because they are object synchronized methods.
Recommended solution is to replicate the Hive NonSync impls similar to HADOOP-10694
TezTaskRunner [RUNNABLE] *** java.io.DataOutputStream.write(byte[], int, int) DataOutputStream.java:107 org.apache.tez.runtime.library.common.serializer.TezBytesWritableSerialization$TezBytesWritableSerializer.serialize(Writable) TezBytesWritableSerialization.java:123 org.apache.tez.runtime.library.common.serializer.TezBytesWritableSerialization$TezBytesWritableSerializer.serialize(Object) TezBytesWritableSerialization.java:110 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(Object, Object, int) UnorderedPartitionedKVWriter.java:295 org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(Object, Object) UnorderedPartitionedKVWriter.java:257 org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(Object, Object) TezProcessor.java:232 org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkCommonOperator.collect(BytesWritable, Writable) VectorReduceSinkCommonOperator.java:432 org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkCommonOperator.process(Object, int) VectorReduceSinkCommonOperator.java:397 org.apache.hadoop.hive.ql.exec.Operator.forward(Object, ObjectInspector) Operator.java:837 org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(Object, int) VectorSelectOperator.java:144 org.apache.hadoop.hive.ql.exec.Operator.forward(Object, ObjectInspector) Operator.java:837 org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(Object, int) VectorFilterOperator.java:121 org.apache.hadoop.hive.ql.exec.Operator.forward(Object, ObjectInspector) Operator.java:837 org.apache.hadoop.hive.ql.exec.TableScanOperator.process(Object, int) TableScanOperator.java:130 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) VectorMapOperator.java:796 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) MapRecordSource.java:86 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord() MapRecordSource.java:70 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run() MapRecordProcessor.java:361 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(Map, Map) TezProcessor.java:172 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(Map, Map) TezProcessor.java:160 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() LogicalIOProcessorRuntimeTask.java:370 org.apache.tez.runtime.task.TaskRunner2Callable$1.run() TaskRunner2Callable.java:73 org.apache.tez.runtime.task.TaskRunner2Callable$1.run() TaskRunner2Callable.java:61 java.security.AccessController.doPrivileged(PrivilegedExceptionAction, AccessControlContext) AccessController.java (native) javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) Subject.java:422 org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) UserGroupInformation.java:1657 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() TaskRunner2Callable.java:61 org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() TaskRunner2Callable.java:37 org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36 java.util.concurrent.FutureTask.run() FutureTask.java:266 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617 java.lang.Thread.run() Thread.java:745