PIG-1237

Piggybank MultiStorage - specify field to write in output

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I've modified the piggybank MultiStorage class so that you can optionally specify the index of the field in each tuple to write to the output.
      This makes it possible to keep records with metadata such as a sequence number or time of upload, and then combine the data from these records into one file without the metadata.
      For example, given the records:
      1: date type seq1 data
      2: date type seq2 data

      the output is grouped by type, ordered by sequence, and contains only the data field:
      data
      data
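
      The behaviour described above can be sketched in plain Java, independent of Pig. This is not the code from the patch: the class and method names (`OutputFieldSketch`, `selectField`) and the index parameters are hypothetical, records are modelled as string arrays, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the idea in this ticket; not the patched MultiStorage.
class OutputFieldSketch {

    /**
     * Groups records by the field at splitIndex, orders each group by the
     * numeric field at seqIndex, and keeps only the field at outputIndex.
     */
    static Map<String, List<String>> selectField(List<String[]> records,
                                                 int splitIndex,
                                                 int seqIndex,
                                                 int outputIndex) {
        // One bucket per distinct value of the split field.
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(record[splitIndex], k -> new ArrayList<>()).add(record);
        }

        Map<String, List<String>> output = new TreeMap<>();
        for (Map.Entry<String, List<String[]>> entry : groups.entrySet()) {
            List<String[]> group = entry.getValue();
            // Order by sequence number before dropping the metadata fields.
            group.sort(Comparator.comparingLong((String[] r) -> Long.parseLong(r[seqIndex])));
            List<String> lines = new ArrayList<>();
            for (String[] record : group) {
                lines.add(record[outputIndex]);
            }
            output.put(entry.getKey(), lines);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"2010-02-12", "typeA", "2", "second line"});
        records.add(new String[] {"2010-02-12", "typeA", "1", "first line"});
        // prints {typeA=[first line, second line]}
        System.out.println(selectField(records, 1, 2, 3));
    }
}
```

      Each map key corresponds to one output file, and each list holds the lines that would be written to it.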

      1. PIG-1237.patch
        3 kB
        Gerrit Jansen van Vuuren

        Activity

        Gerrit Jansen van Vuuren added a comment -

        I forgot to mention that I've also added some logic to replace any '/' characters with '_', since these may appear in the field used to select the output file names.
        We needed this because that field held the file name of the original source our logs came from. Without this change the files were written
        to directories like /usr/share/tomcat5/logs/mylog.log instead of [outputdir]/mylog.log or [outputdir]/usr_share_tomcat5_logs_mylog.log
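
        A minimal sketch of the replacement described in this comment, assuming it is a plain character substitution. The names `FileNameSketch` and `toOutputName` are hypothetical. The comment does not say whether the patch also strips a leading '/', so this sketch leaves it as a leading '_'.

```java
// Hypothetical helper; the patch's actual implementation may differ.
class FileNameSketch {

    /** Replaces every '/' in the split-field value so it cannot create subdirectories. */
    static String toOutputName(String fieldValue) {
        return fieldValue.replace('/', '_');
    }

    public static void main(String[] args) {
        // prints _usr_share_tomcat5_logs_mylog.log
        System.out.println(toOutputName("/usr/share/tomcat5/logs/mylog.log"));
    }
}
```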

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12435697/PIG-1237.patch
        against trunk revision 909210.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/203/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/203/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/203/console

        This message is automatically generated.

        Dmitriy V. Ryaboy added a comment -

        Gerrit,
        Sorry this fell through the cracks! Just noticed this ticket.

        The ability to specify just one column seems very limited. Perhaps instead one could optionally specify whether to materialize the splitField? I think this would accomplish the same thing in a more general manner.

        Also perhaps this warrants a second constructor, as introducing new arguments to the existing one will break backwards compatibility.
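
        The two suggestions above could look roughly like the following. This is only a sketch under assumptions: `SplitStorageSketch` is a hypothetical stand-in rather than the real MultiStorage class, the `keepSplitField` argument is an invented name for "whether to materialize the splitField", and string-typed constructor arguments are assumed because that is how Pig passes arguments to store functions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; does not reproduce the real MultiStorage signatures.
class SplitStorageSketch {
    private final String outputDir;       // kept only to mirror an existing-style argument
    private final int splitFieldIndex;
    private final boolean keepSplitField;

    /** Existing-style constructor: behaviour unchanged, the split field is still written. */
    SplitStorageSketch(String outputDir, String splitFieldIndex) {
        this(outputDir, splitFieldIndex, "true");
    }

    /** Second constructor: callers opt in to dropping the split field from the output. */
    SplitStorageSketch(String outputDir, String splitFieldIndex, String keepSplitField) {
        this.outputDir = outputDir;
        this.splitFieldIndex = Integer.parseInt(splitFieldIndex);
        this.keepSplitField = Boolean.parseBoolean(keepSplitField);
    }

    /** Returns the fields that would be written for one tuple. */
    List<String> fieldsToWrite(List<String> tuple) {
        List<String> fields = new ArrayList<>(tuple);
        if (!keepSplitField) {
            fields.remove(splitFieldIndex); // int argument, so this removes by position
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> tuple = Arrays.asList("date", "typeA", "data");
        // prints [date, typeA, data]
        System.out.println(new SplitStorageSketch("out", "1").fieldsToWrite(tuple));
        // prints [date, data]
        System.out.println(new SplitStorageSketch("out", "1", "false").fieldsToWrite(tuple));
    }
}
```

        Because the two-argument constructor delegates with a default, existing scripts that call it would keep their current output.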

        Alan Gates added a comment -

        Returning patch to open pending response to Dmitriy's comments.


          People

          • Assignee: Gerrit Jansen van Vuuren
          • Reporter: Gerrit Jansen van Vuuren
          • Votes: 0
          • Watchers: 1
