Pig
  1. Pig
  2. PIG-420

Limit on nothing functionality

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Pig 2.0 implements the limit feature but as a standalone statement.

      Limit is very useful in debug mode where we could run queries on smaller amount of data (faster and on fewer nodes) to iron out issues but in the production mode we would like to run through all the data. It would be good to have a easy "switch" between debug and prod mode using the limit statement without having to change the underlying code templates. Given that LIMIT is a separate standalone statement it gets hard to parametrize the code.

      For instance a query template might look like,
      A = LOAD '...';
      B = LIMIT A $N;
      C = FOREACH B ....

      In debug mode, we would like to set the variable $N to 100 but in prod mode we would like to set it to a 'special value' that would not apply LIMIT and letting us run it on all the data.

        Activity

        Anand Murugappan created issue -
        Anand Murugappan made changes -
        Field Original Value New Value
        Description Pig 2.0 implements the limit feature but as a standalone statement.

        Limit is very useful in debug mode where we could run queries on smaller amount of data (faster and on fewer nodes) to iron out issues but in the production mode we would like to run through all the data. It would be good to have a easy "switch" between debug and prod mode using the limit statement without having to change the underlying code templates. Given that LIMIT is a separate standalone statement it gets hardware to parametrize the code.

        For instance a query template might look like,
        A = LOAD '...';
        B = LIMIT A $N;
        C = FOREACH B ....

        In debug mode, we would like to set the variable $N to 100 but in prod mode we would like to set it to a 'special value' that would not apply LIMIT and letting us run it on all the data.
        Pig 2.0 implements the limit feature but as a standalone statement.

        Limit is very useful in debug mode where we could run queries on smaller amount of data (faster and on fewer nodes) to iron out issues but in the production mode we would like to run through all the data. It would be good to have a easy "switch" between debug and prod mode using the limit statement without having to change the underlying code templates. Given that LIMIT is a separate standalone statement it gets hard to parametrize the code.

        For instance a query template might look like,
        A = LOAD '...';
        B = LIMIT A $N;
        C = FOREACH B ....

        In debug mode, we would like to set the variable $N to 100 but in prod mode we would like to set it to a 'special value' that would not apply LIMIT and letting us run it on all the data.

          People

          • Assignee:
            Unassigned
            Reporter:
            Anand Murugappan
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development