Pig / PIG-1207

[zebra] Data sanity check should be performed at the end of writing instead of later at query time

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, the check that all column groups contain the same number of rows is performed only at query time. The resulting error is sketchy: the query either emits only "Column groups are not evenly distributed" or, worse, throws an IndexOutOfBounds exception from CGScanner.getCGValue. The exception arises because BasicTable.atEnd and BasicTable.getKey, which are called just before BasicTable.getValue, check only the first column group in the projection. If the number of rows per file differs across the column groups in the projection, BasicTable.atEnd can return false and BasicTable.getKey can return a key normally while another column group has already exhausted its current file, so the call to that group's CGScanner.getCGValue throws the exception.

      This check should also be performed at the end of writing, and the error message should be more informative.
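
      The kind of write-time check described above could be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class RowCountSanityCheck, its verify method, and the column group names are hypothetical, and the sketch compares one total row count per column group rather than per-file counts.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Zebra: compares the row counts reported by
// each column group once writing has finished.
final class RowCountSanityCheck {

    // Throws if any column group holds a different number of rows than the
    // first column group in the map. The message names both groups and counts.
    static void verify(Map<String, Long> rowsPerColumnGroup) throws IOException {
        String firstName = null;
        long firstCount = -1L;
        for (Map.Entry<String, Long> e : rowsPerColumnGroup.entrySet()) {
            if (firstName == null) {
                firstName = e.getKey();
                firstCount = e.getValue();
            } else if (e.getValue() != firstCount) {
                throw new IOException("Row count mismatch at end of writing: column group "
                        + firstName + " has " + firstCount + " rows but column group "
                        + e.getKey() + " has " + e.getValue() + " rows");
            }
        }
    }

    public static void main(String[] args) {
        // Assumed example data: two column groups whose row counts disagree.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("CG0", 1000L);
        counts.put("CG1", 998L);
        try {
            verify(counts);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

      Failing at this point reports which column groups disagree and by how much, instead of leaving a later query to hit an out-of-bounds read.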

      1. PIG-1207.patch
        3 kB
        Yan Zhou
      2. PIG-1207.patch
        3 kB
        Yan Zhou

        Activity

        Yan Zhou added a comment -

        The same patch based on current trunk

        Gaurav Jain added a comment -

        Looks good

        +1

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12438300/PIG-1207.patch
        against trunk revision 921185.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/238/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/238/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/238/console

        This message is automatically generated.

        Yan Zhou added a comment -

        This is a sanity check at the end of writing. The existing write tests already provide good coverage, so no new tests are needed.

        Yan Zhou added a comment -

        Patch committed to the trunk.


          People

          • Assignee:
            Yan Zhou
            Reporter:
            Yan Zhou
