Pig / PIG-1207

[zebra] Data sanity check should be performed at the end of writing instead of later at query time


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      Currently, the check that every column group holds the same number of rows is performed at query time. The error reporting is sketchy: the query either emits only "Column groups are not evenly distributed" or, worse, throws an IndexOutOfBoundsException from CGScanner.getCGValue.

      The exception arises because BasicTable.atEnd and BasicTable.getKey, which are called just before BasicTable.getValue, only check the first column group in the projection. If the number of rows per file differs across the column groups in the projection, BasicTable.atEnd can return false and BasicTable.getKey can return a key normally even though another column group has already exhausted its current file. The subsequent call to that column group's CGScanner.getCGValue then throws the exception.
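The failure mode described above can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not Zebra code: `ColumnGroupScanner`, `ProjectedScanner`, and `UnevenColumnGroupsDemo` are hypothetical stand-ins for CGScanner and BasicTable, and each column group's current file is modeled as an in-memory list of rows. `atEndFirstGroupOnly` mimics the reported behavior of checking only the first projected column group, while `atEndAllGroups` shows one possible stricter check that reports which column group disagrees.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the read path; NOT Zebra's real classes.
// Each column group's current file is modeled as an in-memory list of rows.
class ColumnGroupScanner {
    private final List<String> rows;
    private int pos = 0;

    ColumnGroupScanner(List<String> rows) {
        this.rows = rows;
    }

    boolean atEnd() {
        return pos >= rows.size();
    }

    // No bounds check of its own: reading past the last row throws an
    // IndexOutOfBoundsException, like the failure reported for getCGValue.
    String getValue() {
        return rows.get(pos);
    }

    void advance() {
        pos++;
    }
}

class ProjectedScanner {
    private final List<ColumnGroupScanner> groups;

    ProjectedScanner(List<ColumnGroupScanner> groups) {
        this.groups = groups;
    }

    // Mimics the reported behavior: only the first projected group is checked.
    boolean atEndFirstGroupOnly() {
        return groups.get(0).atEnd();
    }

    // Stricter variant: all groups must agree on end-of-data; otherwise fail
    // with a message naming the offending column group.
    boolean atEndAllGroups() throws IOException {
        boolean firstAtEnd = groups.get(0).atEnd();
        for (int i = 1; i < groups.size(); i++) {
            if (groups.get(i).atEnd() != firstAtEnd) {
                throw new IOException("Column group " + i
                        + (firstAtEnd ? " has more rows" : " has fewer rows")
                        + " than column group 0");
            }
        }
        return firstAtEnd;
    }

    // Reads the current row from every projected column group.
    String[] getValues() {
        String[] values = new String[groups.size()];
        for (int i = 0; i < groups.size(); i++) {
            values[i] = groups.get(i).getValue();
        }
        return values;
    }

    void advance() {
        for (ColumnGroupScanner g : groups) {
            g.advance();
        }
    }
}

class UnevenColumnGroupsDemo {
    public static void main(String[] args) {
        // Column group 0 has three rows, column group 1 only two.
        ProjectedScanner scanner = new ProjectedScanner(Arrays.asList(
                new ColumnGroupScanner(Arrays.asList("a1", "a2", "a3")),
                new ColumnGroupScanner(Arrays.asList("b1", "b2"))));
        scanner.advance();
        scanner.advance();
        // Prints false: the first-group-only check claims a row remains.
        System.out.println(scanner.atEndFirstGroupOnly());
        try {
            scanner.getValues();
        } catch (IndexOutOfBoundsException e) {
            System.out.println("getValues threw an IndexOutOfBoundsException");
        }
        try {
            scanner.atEndAllGroups();
        } catch (IOException e) {
            // Prints: Column group 1 has fewer rows than column group 0
            System.out.println(e.getMessage());
        }
    }
}
```

Even the stricter read-side check only detects the problem when the table is queried, which is why the description below argues for also checking at write time.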

      This check should also be performed at the end of writing, and the error message should be more informative.
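A write-time check could compare per-file row counts across column groups once all column-group writers have been closed. The following is a minimal sketch, assuming the writer can report how many rows each column group wrote to each output file; `RowCountCheck`, `verify`, `RowCountCheckDemo`, and the `Map<String, long[]>` input shape are hypothetical and are not taken from the attached patch.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical write-side sanity check (illustrative names, not Zebra's API):
// after all column-group writers are closed, compare the row count of each
// output file across column groups and fail with a descriptive message.
final class RowCountCheck {
    private RowCountCheck() {}

    /**
     * @param rowCounts column group name -> row count of each of its output
     *                  files, indexed by file; iteration order decides which
     *                  group is the reference, so pass an ordered map
     * @throws IOException naming the first mismatching file and column groups
     */
    static void verify(Map<String, long[]> rowCounts) throws IOException {
        String refName = null;
        long[] ref = null;
        for (Map.Entry<String, long[]> e : rowCounts.entrySet()) {
            if (ref == null) { // the first column group is the reference
                refName = e.getKey();
                ref = e.getValue();
                continue;
            }
            long[] counts = e.getValue();
            if (counts.length != ref.length) {
                throw new IOException("Column group " + e.getKey() + " wrote "
                        + counts.length + " file(s) but column group " + refName
                        + " wrote " + ref.length);
            }
            for (int f = 0; f < ref.length; f++) {
                if (counts[f] != ref[f]) {
                    throw new IOException("Row count mismatch in file " + f
                            + ": column group " + refName + " has " + ref[f]
                            + " rows, column group " + e.getKey() + " has "
                            + counts[f]);
                }
            }
        }
    }
}

class RowCountCheckDemo {
    public static void main(String[] args) {
        Map<String, long[]> counts = new LinkedHashMap<>(); // keeps insertion order
        counts.put("CG0", new long[] {100, 80});
        counts.put("CG1", new long[] {100, 79});
        try {
            RowCountCheck.verify(counts);
        } catch (IOException e) {
            // Row count mismatch in file 1: column group CG0 has 80 rows, column group CG1 has 79
            System.out.println(e.getMessage());
        }
    }
}
```

Using the first column group as the reference keeps the check to a single pass, and the message names the file index and both column groups, which is the kind of detail the query-time "Column groups are not evenly distributed" message lacks.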

      Attachments

        1. PIG-1207.patch (3 kB, Yan Zhou)
        2. PIG-1207.patch (3 kB, Yan Zhou)


          People

            Assignee: Yan Zhou (yanz)
            Reporter: Yan Zhou (yanz)
            Votes: 0
            Watchers: 0

