As mentioned by Ashutosh, this is a reopening of https://issues.apache.org/jira/browse/PIG-766, because there is still a problem that causes Pig to scale only with the available memory.
For convenience here comes the last entry of the
Yes, the same and some similar traces:
1. Are you getting the exact same stack trace as mentioned in the jira?
2. Which operations are you doing in your query: join, group-by, or any other?
3. What load/store func are you using to read and write data? PigStorage or your own?
4. What is your data size and memory available to your tasks?
5. Do you have very large records in your dataset, like hundreds of MB for one record?
It would be great if you could paste here the script that produces this exception.
When we started testing the transformation (see below), the OutOfMemoryError first occurred with input datasets of about 150 MB.
Increasing the memory for the child VMs by setting mapred.child.java.opts to 600m fixed it for a while.
With larger input datasets the problem reappears.
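For reference, a minimal sketch of how such a setting might look in the Hadoop job configuration. This assumes that "600m" refers to the maximum JVM heap size (`-Xmx600m`) and that the property is set in a site configuration file such as mapred-site.xml; the exact file depends on the Hadoop version and is not stated in this ticket.

```xml
<!-- Assumption: "600m" above means a 600 MB maximum heap for each child task JVM. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx600m</value>
</property>
```

A larger heap only raises the input size at which the error appears, which matches the behaviour described above.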
A CSV file: ~14 GB per dataset, ~100,000,000 records per dataset, ~145 bytes per record.
|Field||Original Value||New Value|
|Assignee|| ||Thejas M Nair [ thejas ]|
|Fix Version/s|| ||0.8.0 [ 12314562 ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Resolution|| ||Duplicate [ 3 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|