It is currently impossible to get partial results in MapReduce mapper jobs.
Even when setAllowPartialResults(true) is set on the Scan, scan jobs still fail with OOME on large rows.
The reason is that the Scan field allowPartialResults is lost during job creation:
1. The user creates a Job and sets a Scan object via TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...), which puts the result of TableMapReduceUtil.convertScanToString(scanObj) into the job config.
2. When the job starts, the method TableInputFormat.setConf retrieves the scan string from the config and converts it back to a Scan object by calling TableMapReduceUtil.convertStringToScan. The resulting Scan object always has allowPartialResults set to false.
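The effect of the two steps above can be illustrated with a small plain-Java model. This is only a sketch, not HBase code: ToyScan, toConfigString and fromConfigString are hypothetical stand-ins for Scan, convertScanToString and convertStringToScan, and the "caching=..." string format is invented for the illustration. It assumes the flag is dropped simply because the serialized form does not carry it.

```java
public class ScanRoundTripDemo {

    /** Hypothetical stand-in for a Scan: one field is serialized, the other is not. */
    static final class ToyScan {
        int caching = 100;
        boolean allowPartialResults = false;
    }

    /** Plays the role of convertScanToString: only 'caching' is written out. */
    static String toConfigString(ToyScan scan) {
        return "caching=" + scan.caching;
    }

    /** Plays the role of convertStringToScan: fields missing from the string keep their defaults. */
    static ToyScan fromConfigString(String s) {
        ToyScan scan = new ToyScan();
        String[] kv = s.split("=", 2);
        if (kv.length == 2 && kv[0].equals("caching")) {
            scan.caching = Integer.parseInt(kv[1]);
        }
        return scan;
    }

    public static void main(String[] args) {
        ToyScan original = new ToyScan();
        original.caching = 500;
        original.allowPartialResults = true;

        // Round trip through the "job config" string, as in steps 1 and 2.
        ToyScan restored = fromConfigString(toConfigString(original));

        System.out.println("caching survived: " + (restored.caching == 500));        // true
        System.out.println("allowPartialResults: " + restored.allowPartialResults);  // false
    }
}
```

In this model the only fix is to write the flag into the serialized form and read it back, which is presumably what the real conversion would also need.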
As an experiment, I modified the TableInputFormat method setConf() to force all scans to allow partial results. After this change all jobs succeeded with no more OOME, and I also noticed that mappers began to receive partial results (Result.isPartial()).
My use case is very simple: I just have large rows and expect a mapper to receive them partially, that is, to get the same rowid several times with different key/value records.
I would then not need to implement my own result-partitioning solution if the large number of key/values in a single large row could be returned transparently.
On the other hand, if a Scan object can return several records for the same rowid (partial results), perhaps the mapper should do the same.
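To make the expected mapper behaviour concrete, here is a minimal plain-Java sketch of processing a row that arrives in several pieces. It is not HBase code: Chunk is a hypothetical stand-in for one partial Result, and countCells stands in for whatever per-row work a mapper does. The point is only that each piece can be handled on its own, so the whole row never has to fit in memory at once.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialRowDemo {

    /** Hypothetical stand-in for one partial Result: a rowid plus a slice of that row's cells. */
    static final class Chunk {
        final String rowId;
        final List<String> cells;

        Chunk(String rowId, List<String> cells) {
            this.rowId = rowId;
            this.cells = cells;
        }
    }

    /** Counts cells per rowid, looking at one chunk at a time. */
    static Map<String, Integer> countCells(Iterable<Chunk> chunks) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Chunk c : chunks) {
            // The same rowid may appear several times; add each slice to its running total.
            counts.merge(c.rowId, c.cells.size(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Chunk> input = Arrays.asList(
                new Chunk("row1", Arrays.asList("a", "b")),
                new Chunk("row1", Arrays.asList("c")),
                new Chunk("row2", Arrays.asList("d")));
        System.out.println(countCells(input)); // prints {row1=3, row2=1}
    }
}
```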