PIG-1421

[Zebra] Pig script with Zebra data storage brings down name node due to excessive name node calls.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Because Pig calls setLocation() on the LoadFunc API on both the frontend and the backend, and Zebra's implementation of it accesses the name node, the name node becomes unresponsive under the sheer number of name node calls.
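
To make the amplification concrete, here is a minimal Java sketch. It does not use Pig or Zebra code: `FakeNameNode`, `EagerLoader`, and the call counts (3 frontend calls, 1000 tasks, 2 calls per task) are all assumptions chosen for illustration. It only shows how one remote call per setLocation() invocation multiplies across backend tasks.

```java
import java.util.concurrent.atomic.AtomicLong;

public class NameNodeLoadSketch {
    /** Hypothetical stand-in for the name node; it only counts requests. */
    static class FakeNameNode {
        private final AtomicLong calls = new AtomicLong();
        void listStatus(String path) { calls.incrementAndGet(); }
        long calls() { return calls.get(); }
    }

    /** Hypothetical loader that touches the name node on every setLocation() call. */
    static class EagerLoader {
        private final FakeNameNode nameNode;
        private String location;
        EagerLoader(FakeNameNode nameNode) { this.nameNode = nameNode; }
        void setLocation(String location) {
            this.location = location;
            nameNode.listStatus(location); // one remote call per invocation
        }
    }

    /** Total name node calls for one job under the assumed call counts. */
    static long simulate(int frontendCalls, int tasks, int callsPerTask) {
        FakeNameNode nn = new FakeNameNode();
        EagerLoader frontend = new EagerLoader(nn);
        for (int i = 0; i < frontendCalls; i++) {
            frontend.setLocation("/tables/t1");
        }
        for (int t = 0; t < tasks; t++) {
            EagerLoader backend = new EagerLoader(nn); // one loader per backend task
            for (int i = 0; i < callsPerTask; i++) {
                backend.setLocation("/tables/t1");
            }
        }
        return nn.calls();
    }

    public static void main(String[] args) {
        // 3 frontend calls + 1000 tasks * 2 calls each = 2003
        System.out.println(simulate(3, 1000, 2));
    }
}
```

Under these assumed numbers a single job issues 2003 name node requests, almost all of them from the backend, which is why the load grows with the number of tasks rather than with the number of scripts.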

      1. PIG-1421.patch
        10 kB
        Xuefu Zhang

        Activity

        Xuefu Zhang created issue -
        Xuefu Zhang made changes -
        Field Original Value New Value
        Attachment jira1421.patch [ 12444708 ]
        Xuefu Zhang added a comment -

        Fix the issue by making sure that when setLocation() is called, no name node access is conducted.

        Xuefu Zhang made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop Flags [Reviewed]
        Xuefu Zhang made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Xuefu Zhang made changes -
        Attachment jira1421.patch [ 12444708 ]
        Xuefu Zhang added a comment -

        Fix includes:

        1. Make setLocation() lightweight and ensure that it makes no name node access. Note that setLocation() is a new API on LoadFunc introduced in 0.7. UDFContext is used in some cases.
        2. Remove the code that sets properties (INPUT_FE and INPUT_DELETED_CGS) in TableInputFormat, because it is ineffective.
        3. Move the logic in #2 to TableInputFormat.setInputPaths() and make sure that it is done only once (because setInputPaths() is called multiple times in the Pig code path).
        4. Remove unnecessary list status calls in the Zebra IO layer.
        5. Remove the code that makes name node calls for sorted tables in the Pig code path.
        6. Make sure that the clob check is done only on the front end.
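
Items 1 and 3 in the list above can be illustrated with a minimal Java sketch. This is not the code in the attached patch: `FakeNameNode`, `Loader`, `InputSetup`, and the property keys are hypothetical names, and a plain `Map` stands in for the job properties. It only shows the two ideas, a setLocation() that records the path without remote access and a setInputPaths() guarded so that its name node work runs once per job.

```java
import java.util.HashMap;
import java.util.Map;

public class LightweightSetLocationSketch {
    /** Hypothetical stand-in for the name node; it only counts requests. */
    static class FakeNameNode {
        int calls;
        String[] listStatus(String path) {
            calls++;
            return new String[] { path + "/part-0" };
        }
    }

    /** Hypothetical loader: setLocation() only records the path, no remote access. */
    static class Loader {
        private String location;
        void setLocation(String location) { this.location = location; }
        String getLocation() { return location; }
    }

    /** Hypothetical input setup with a once-only guard kept in the job properties. */
    static class InputSetup {
        static final String DONE_KEY = "sketch.input.paths.done";  // assumed key
        static final String FILES_KEY = "sketch.input.files";      // assumed key

        static void setInputPaths(Map<String, String> jobProps, FakeNameNode nn, String location) {
            if (jobProps.containsKey(DONE_KEY)) {
                return; // already resolved for this job; skip the remote call
            }
            jobProps.put(FILES_KEY, String.join(",", nn.listStatus(location)));
            jobProps.put(DONE_KEY, "true");
        }
    }

    /** Returns {calls after setLocation() only, calls after repeated setInputPaths()}. */
    static int[] demo() {
        FakeNameNode nn = new FakeNameNode();
        Loader loader = new Loader();
        for (int i = 0; i < 5; i++) {
            loader.setLocation("/tables/t1");
        }
        int afterSetLocation = nn.calls;

        Map<String, String> jobProps = new HashMap<>();
        for (int i = 0; i < 3; i++) {
            InputSetup.setInputPaths(jobProps, nn, loader.getLocation());
        }
        return new int[] { afterSetLocation, nn.calls };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "0 1"
    }
}
```

Keeping the guard in the job properties rather than in an instance field is a design choice of this sketch: the flag then survives even if a new loader or input format instance is created for each call.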

        Xuefu Zhang made changes -
        Attachment PIG-1421.patch [ 12444727 ]
        Xuefu Zhang made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Yan Zhou added a comment -

        Local Hudson results are as follows:

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
        [exec] Please justify why no tests are needed for this patch.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        No test case is added because the problem involves excessive name node calls on a real cluster. We manually verified the fix and confirmed that the name node works without any hiccups.

        Yan Zhou added a comment -

        +1

        Xuefu Zhang added a comment -

        The original problem happens only under stress, so it is difficult to provide a unit test case that covers it. Given this, the Hudson result can be ignored.

        Yan Zhou added a comment -

        Committed to the trunk and the 0.7 branch.

        Yan Zhou made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Xuefu Zhang
            Reporter:
            Xuefu Zhang
          • Votes:
            0
            Watchers:
            1
