Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.1
Fix Version/s: None
Component/s: None
Labels: None
Description
We use DBInputFormat in some of our MapReduce jobs, and we hit issues in production where other applications writing to the database caused the Hadoop job to fail with a deadlock.
On inspecting the code, we found that the transaction isolation level is set to SERIALIZABLE. This seems too aggressive for most Hadoop use cases. At the very minimum, a more relaxed isolation level should be selectable through configuration.
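As a rough illustration of the configuration option requested above, the following sketch maps a configured string to one of the standard JDBC isolation constants and falls back to SERIALIZABLE when nothing is set, so existing jobs would keep their current behavior. The class name IsolationLevels, the property name, and both methods are hypothetical and are not part of Hadoop; only the java.sql.Connection constants and setTransactionIsolation are real JDBC API.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Locale;

/** Hypothetical helper (not part of Hadoop): resolves a configured isolation level name. */
public final class IsolationLevels {

    /** Hypothetical property name, used here for illustration only. */
    public static final String ISOLATION_PROPERTY = "example.db.input.transaction.isolation";

    private IsolationLevels() {
    }

    /**
     * Maps a case-insensitive level name to the matching java.sql.Connection constant.
     * A null or blank name yields SERIALIZABLE, the level the issue reports as hard-coded.
     */
    public static int parse(String name) {
        if (name == null || name.trim().isEmpty()) {
            return Connection.TRANSACTION_SERIALIZABLE;
        }
        switch (name.trim().toUpperCase(Locale.ROOT)) {
            case "READ_UNCOMMITTED":
                return Connection.TRANSACTION_READ_UNCOMMITTED;
            case "READ_COMMITTED":
                return Connection.TRANSACTION_READ_COMMITTED;
            case "REPEATABLE_READ":
                return Connection.TRANSACTION_REPEATABLE_READ;
            case "SERIALIZABLE":
                return Connection.TRANSACTION_SERIALIZABLE;
            default:
                throw new IllegalArgumentException("Unknown isolation level: " + name);
        }
    }

    /** Applies the configured level to an open JDBC connection. */
    public static void apply(Connection connection, String name) throws SQLException {
        connection.setTransactionIsolation(parse(name));
    }
}
```

A patched input format could call IsolationLevels.apply(connection, conf.get(IsolationLevels.ISOLATION_PROPERTY)) at the point where it currently sets SERIALIZABLE. Whether a given level is actually supported depends on the database and its JDBC driver.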
Reference stack trace from when the deadlock occurs (only part of it; I no longer have the full map task logs at hand):
java.io.IOException: Deadlock found when trying to get lock; try restarting transaction
    at org.apache.hadoop.mapreduce.lib.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
    at opower.kiji.mapreduce.DelegateKijiMapper.doRun(DelegateKijiMapper.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at