Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4471

ASSERT Bag is not empty and/or is within a specified size range.



    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • internal-udfs, parser
    • None


      In PIG-3367 the ASSERT keyword was created.
      The current implementation allows for checking in each record in the bag if the value of a column is valid (and fail the job if it is not).

      We did several experiments and found that an empty bag (0 tuples) always succeeds. We need to ensure that a bag has been loaded correctly.

      Proposed enhancements:

      1. Allow the ASSERT statement to check if a bag is empty.
        A = LOAD 'data' AS (a0:int,a1:int,a2:int);
        ASSERT A NOT EMPTY, 'The A bag may not be empty';
      2. Allow the ASSERT statement to check if a bag has more than (or less than) a specific number of tuples.
        A = LOAD 'data' AS (a0:int,a1:int,a2:int);
        ASSERT SIZE A > 100, 'The A bag is not big enough';
        ASSERT SIZE A < 1000, 'The A bag is too big';
        • For me this may be an approximating implementation. i.e. if I say it must have at least 5M tuples then it may still return 'is valid' if it has 4.9M tuples.

      NOTE: The syntax I show is just to give an idea on what I want to do.




            Unassigned Unassigned
            nielsbasjes Niels Basjes
            0 Vote for this issue
            1 Start watching this issue