Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.2.5
None
Description
Bulk Import with -Dimport.bulk.output=/HFILES -Dimport.bulk.hasLargeResult=true always results in:
Error: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.util.MapReduceExtendedCell, received org.apache.hadoop.hbase.IndividualBytesFieldCell
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.hbase.mapreduce.Import$CellSortImporter.map(Import.java:423)
    at org.apache.hadoop.hbase.mapreduce.Import$CellSortImporter.map(Import.java:394)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
The problem is that in org.apache.hadoop.hbase.mapreduce.Import.CellSortImporter#map, the Cell coming from value.rawCells() (which is of type org.apache.hadoop.hbase.IndividualBytesFieldCell) is written directly to the context without being wrapped in a MapReduceExtendedCell, as is done in CellImporter#map.
IMHO, line 423 of Import.java should look like this:
context.write(new CellWritableComparable(ret), new MapReduceExtendedCell(ret));
Furthermore, it seems to me that this is also wrong in all subsequent versions.
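To illustrate why the unwrapped Cell is rejected: as far as I can tell, the map output collector compares the runtime class of the value against the configured map output value class using exact class equality, so a Cell of any other concrete type fails even though it implements the same interface. Below is a minimal, self-contained sketch of that check. It does not use Hadoop or HBase; TypeMismatchSketch, collect and accepts are hypothetical names, and the nested Cell types are simplified stand-ins for the real classes.

```java
import java.io.IOException;

public class TypeMismatchSketch {
    // Simplified stand-ins for the HBase types (assumption: not the real classes).
    interface Cell {}

    static class IndividualBytesFieldCell implements Cell {}

    static class MapReduceExtendedCell implements Cell {
        final Cell delegate;

        MapReduceExtendedCell(Cell delegate) {
            this.delegate = delegate;
        }
    }

    // Mimics an exact-class check: implementing the same interface is not enough,
    // the runtime class must be identical to the configured value class.
    static void collect(Class<?> valClass, Object value) throws IOException {
        if (value.getClass() != valClass) {
            throw new IOException("Type mismatch in value from map: expected "
                + valClass.getName() + ", received " + value.getClass().getName());
        }
    }

    // Convenience wrapper that reports whether collect() would accept the value.
    static boolean accepts(Class<?> valClass, Object value) {
        try {
            collect(valClass, value);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Cell raw = new IndividualBytesFieldCell();
        // Unwrapped cell, as written by CellSortImporter#map today: rejected.
        System.out.println(accepts(MapReduceExtendedCell.class, raw));
        // Wrapped cell, as in the proposed fix: accepted.
        System.out.println(accepts(MapReduceExtendedCell.class, new MapReduceExtendedCell(raw)));
    }
}
```

Running main prints false for the raw cell and true for the wrapped one, which matches the behaviour described above under these assumptions.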