[PIG-3669] Wrong Usage of Scalar can bring down Namenode - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

People often get confused and use . instead of :: (~~PIG-2134~~) to reference fields after a join. Pig treats the expression as a scalar and tries to read the whole join output, assuming it contains only one record, and then fails with "scalar has more than one row in the output". ReadScalars uses InterStorage to read the scalar data. InterStorage uses InterInputFormat, which extends FileInputFormat; getSplits() does a listStatus on all the files in the directory to construct the splits, and InterRecordReader then reads the data from those splits. When the data is really huge, with a lot of files, this can cause the Namenode to run out of memory and crash. The risk is higher with the recent optimization in Hadoop 0.23/2.x, where FileInputFormat uses listLocatedStatus instead of listStatus + getBlockLocations to reduce the number of calls to the Namenode.
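To make the trade-off behind that optimization concrete, here is a simplified sketch of how many Namenode calls a single task makes while computing splits under each strategy. It is a toy model, not Hadoop code: the function name is hypothetical, it assumes a single input directory, and it ignores any paging of large listings.

```python
def namenode_calls_per_task(num_files: int, use_located_status: bool) -> int:
    """Simplified count of Namenode calls one task makes in getSplits().

    Toy model (assumption): one input directory, no paging of listings.
    """
    if use_located_status:
        # listLocatedStatus: block locations come back with the listing itself,
        # so there are fewer calls but each response is larger.
        return 1
    # listStatus once, then getBlockLocations once per listed file.
    return 1 + num_files


if __name__ == "__main__":
    for located in (False, True):
        calls = namenode_calls_per_task(6500, located)
        print(f"use_located_status={located}: {calls} call(s) per task")
```

In this model the newer strategy makes far fewer calls, but the single listing response now also carries the block locations for every file.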

In the particular case we encountered, the join output had 6.5K files. The job that ran ReadScalars had 8K+ tasks (with speculative execution), and all of them doing listStatus on the 6.5K files caused the NN queue to fill up with huge responses (block locations make the responses even bigger). It also saturated the network, so responses built up in the NN without being sent, finally leading it to crash with OOM.
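A back-of-the-envelope calculation shows the scale of this. The task and file counts below come from the report (8K is a lower bound); the per-entry response size is a purely hypothetical figure chosen for illustration, since the report gives no byte counts.

```python
TASKS = 8_000  # "8K+ tasks" from the report, used as a lower bound
FILES = 6_500  # "6.5K files" in the join output

# Hypothetical size of one file entry in a listing response, block
# locations included. This is an assumption, not a measured value.
ASSUMED_BYTES_PER_ENTRY = 300


def listed_entries(tasks: int, files: int) -> int:
    """Total file entries the NN must return if every task lists every file."""
    return tasks * files


def response_bytes(entries: int, bytes_per_entry: int) -> int:
    """Total response payload under the assumed per-entry size."""
    return entries * bytes_per_entry


if __name__ == "__main__":
    entries = listed_entries(TASKS, FILES)
    total = response_bytes(entries, ASSUMED_BYTES_PER_ENTRY)
    print(f"file entries returned: {entries:,}")
    print(f"payload at {ASSUMED_BYTES_PER_ENTRY} B/entry: {total / 1e9:.1f} GB")
```

Even at the lower bound, the NN has to serialize tens of millions of file entries, which is consistent with responses queuing up faster than the network can drain them.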

Issue Links

is related to

PIG-2629 Wrong Usage of Scalar which is null causes high namenode operation (Closed)

People

Assignee: Unassigned

Reporter: Rohini Palaniswamy

Votes: 0

Watchers: 1

Dates

Created: 15/Jan/14 19:46

Updated: 15/Jan/14 19:54