The stall happens down in SequenceFile, in the first call to getPos inside readRecordLength. I tried the johano patch from
HADOOP-2172 that restores the positional cache, but that didn't seem to be the issue here.
Here is data to support my assertion.
I wrote a little program to make a MapFile of 1M records. I then did 1M random reads from the same file. Below are timings from runs against 0.15.0 and against TRUNK as of this afternoon.
For the below test using TRUNK r604352, I amended the test so it output a log message every 100k reads:
After 20 minutes it still hadn't printed the first 'read 100k' log message (I had to leave – will fill in final figures later).
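The test program itself is not attached here. As a rough sketch of the kind of benchmark described above (write N records, then do N random reads with a progress message every 100k reads), the following uses a plain `java.util.TreeMap` as a hypothetical stand-in for the MapFile, so it runs without Hadoop and says nothing about real MapFile timings. The class and method names are made up for illustration and are not from the original test.

```java
import java.util.Random;
import java.util.TreeMap;

// Hypothetical stand-in benchmark: a TreeMap plays the role of the MapFile.
class RandomReadBench {

    // Build a sorted store of `records` entries keyed 0..records-1.
    static TreeMap<Integer, String> buildStore(int records) {
        TreeMap<Integer, String> store = new TreeMap<>();
        for (int i = 0; i < records; i++) {
            store.put(i, "value-" + i);
        }
        return store;
    }

    // Do `reads` random lookups, printing elapsed time every `logEvery` reads.
    // Returns the number of lookups that found a value.
    static int randomReads(TreeMap<Integer, String> store, int reads,
                           int logEvery, long seed) {
        Random random = new Random(seed);
        int records = store.size();
        int hits = 0;
        long start = System.currentTimeMillis();
        for (int i = 1; i <= reads; i++) {
            if (store.get(random.nextInt(records)) != null) {
                hits++;
            }
            if (i % logEvery == 0) {
                long elapsed = System.currentTimeMillis() - start;
                System.out.println("read " + i + " records in " + elapsed + "ms");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> store = buildStore(1_000_000);
        int hits = randomReads(store, 1_000_000, 100_000, 42L);
        System.out.println("successful reads: " + hits);
    }
}
```

To reproduce the real comparison, the TreeMap would be swapped for a MapFile writer and reader from the Hadoop version under test, keeping the same loop and progress logging.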
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Priority||Major [ 3 ]||Blocker [ 1 ]|
|Assignee||stack [ stack ]|
|Fix Version/s||0.16.0 [ 12312740 ]|
|Status||Open [ 1 ]||Patch Available [ 10002 ]|