Pig
  1. Pig
  2. PIG-2857

Add a -tagPath option to PigStorage

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      We recently added a "-tagSource" option to PigStorage, which allows us to add filenames from which records come to the returned tuples.

      Often, users want the whole path, not just the source file. I propose we add a "-tagPath" option to do this.

      1. PIG-2857_3.patch
        17 kB
        Cheolsoo Park
      2. PIG-2857_2.patch
        15 kB
        Prashant Kommireddi
      3. PIG-2857_1.patch
        13 kB
        Prashant Kommireddi
      4. PIG-2857.patch
        11 kB
        Prashant Kommireddi

        Activity

        Hide
        Cheolsoo Park added a comment -

        +1.

        Committed to trunk. Thanks Prashant!

        Show
        Cheolsoo Park added a comment - +1. Committed to trunk. Thanks Prashant!
        Hide
        Prashant Kommireddi added a comment -

        Go ahead. Thanks Cheolsoo

        Show
        Prashant Kommireddi added a comment - Go ahead. Thanks Cheolsoo
        Hide
        Cheolsoo Park added a comment -

        Hi Prashant,

        Thank you very much. I verified:

        • The doc builds fine.
        • ant test-commit passes.
        • ant test -Dtestcase=TestPigStorage passes.

        I am attaching a new patch where I removed tabs. I also made a minor change to the PigStorage option parsing code as follows:

        from

        +            if (configuredOptions.hasOption("tagsource")) {
        +                mLog.warn("'-tagsource' is deprecated. Use '-tagFile' instead.");
        +            }
                     isSchemaOn = configuredOptions.hasOption("schema");
                     dontLoadSchema = configuredOptions.hasOption("noschema");
        -            tagSource = configuredOptions.hasOption(TAG_SOURCE_PATH);
        +            // Remove -tagsource in 0.13. For backward compatibility we need
        +            // tagsource to be supported until at least 0.12
        +            tagFile = configuredOptions.hasOption(TAG_SOURCE_FILE) || configuredOptions.hasOption("tagsource");
        +            tagPath = configuredOptions.hasOption(TAG_SOURCE_PATH);
        

        to

        -            tagSource = configuredOptions.hasOption(TAG_SOURCE_PATH);
        +            tagFile = configuredOptions.hasOption(TAG_SOURCE_FILE);
        +            tagPath = configuredOptions.hasOption(TAG_SOURCE_PATH);
        +            // TODO: Remove -tagsource in 0.13. For backward compatibility, we
        +            // need tagsource to be supported until at least 0.12
        +            if (configuredOptions.hasOption("tagsource")) {
        +                mLog.warn("'-tagsource' is deprecated. Use '-tagFile' instead.");
        +                tagFile = true;
        +            }
        

        If you're fine with the change, I will go ahead commit it.

        Show
        Cheolsoo Park added a comment - Hi Prashant, Thank you very much. I verified: The doc builds fine. ant test-commit passes. ant test -Dtestcase=TestPigStorage passes. I am attaching a new patch where I removed tabs. I also made a minor change to the PigStorage option parsing code as follows: from + if (configuredOptions.hasOption( "tagsource" )) { + mLog.warn( "'-tagsource' is deprecated. Use '-tagFile' instead." ); + } isSchemaOn = configuredOptions.hasOption( "schema" ); dontLoadSchema = configuredOptions.hasOption( "noschema" ); - tagSource = configuredOptions.hasOption(TAG_SOURCE_PATH); + // Remove -tagsource in 0.13. For backward compatibility we need + // tagsource to be supported until at least 0.12 + tagFile = configuredOptions.hasOption(TAG_SOURCE_FILE) || configuredOptions.hasOption( "tagsource" ); + tagPath = configuredOptions.hasOption(TAG_SOURCE_PATH); to - tagSource = configuredOptions.hasOption(TAG_SOURCE_PATH); + tagFile = configuredOptions.hasOption(TAG_SOURCE_FILE); + tagPath = configuredOptions.hasOption(TAG_SOURCE_PATH); + // TODO: Remove -tagsource in 0.13. For backward compatibility, we + // need tagsource to be supported until at least 0.12 + if (configuredOptions.hasOption( "tagsource" )) { + mLog.warn( "'-tagsource' is deprecated. Use '-tagFile' instead." ); + tagFile = true ; + } If you're fine with the change, I will go ahead commit it.
        Hide
        Prashant Kommireddi added a comment -

        Thanks Cheolsoo. I have updated the patch with your feedback incorporated, except that I wasn't sure about where the tabs were present.

        Show
        Prashant Kommireddi added a comment - Thanks Cheolsoo. I have updated the patch with your feedback incorporated, except that I wasn't sure about where the tabs were present.
        Hide
        Cheolsoo Park added a comment -

        Overall looks good to me. Two comments:

        • Can you update the doc (func.xml) too? We probably should keep tagsource yet indicate it as depreciated in the doc.
        • Can you update the comment of testPigStorageSourceTagSchema()? It still mentions tagsource.
              /** 
               * This is for testing source tagging option on PigStorage. When a user
               * specifies '-tagsource' as an option, PigStorage must prepend the input
               * source path to the tuple and "INPUT_FILE_NAME" to schema.
               * 
               * @throws Exception
               */
          
        • Can you update the following line in testPigStorageSourceTagValue()? It still mentions tagsource.
          assertEquals("tagsource value must be part-m-00000", inputFileName, storeFileName);
          
        • Can you add some comment to code that tests -tagPath in testPigStorageSourceTagSchema() just to be clear?
        • Can you remove tabs in the patch?

        Thanks!

        Show
        Cheolsoo Park added a comment - Overall looks good to me. Two comments: Can you update the doc ( func.xml ) too? We probably should keep tagsource yet indicate it as depreciated in the doc. Can you update the comment of testPigStorageSourceTagSchema() ? It still mentions tagsource . /** * This is for testing source tagging option on PigStorage. When a user * specifies '-tagsource' as an option, PigStorage must prepend the input * source path to the tuple and "INPUT_FILE_NAME" to schema. * * @ throws Exception */ Can you update the following line in testPigStorageSourceTagValue() ? It still mentions tagsource . assertEquals( "tagsource value must be part-m-00000" , inputFileName, storeFileName); Can you add some comment to code that tests -tagPath in testPigStorageSourceTagSchema() just to be clear? Can you remove tabs in the patch? Thanks!
        Hide
        Prashant Kommireddi added a comment -

        Ping!

        Show
        Prashant Kommireddi added a comment - Ping!
        Hide
        Prashant Kommireddi added a comment -

        Had made changes to Utils.java that I did not add to the previous patch. Attaching an updated patch.

        Show
        Prashant Kommireddi added a comment - Had made changes to Utils.java that I did not add to the previous patch. Attaching an updated patch.
        Hide
        Prashant Kommireddi added a comment -

        Coming back to this. I think we should keep 'tagsource' in the next release for backward compatibility and as you suggested, log the deprecation message. May be we can remove 'tagsource' for 0.13. Adding a patch that now uses '-tagFile' for source filename and '-tagPath' for source path. I have modified tests accordingly to handle the same.

        Show
        Prashant Kommireddi added a comment - Coming back to this. I think we should keep 'tagsource' in the next release for backward compatibility and as you suggested, log the deprecation message. May be we can remove 'tagsource' for 0.13. Adding a patch that now uses '-tagFile' for source filename and '-tagPath' for source path. I have modified tests accordingly to handle the same.
        Hide
        Dmitriy V. Ryaboy added a comment -

        I think so. And perhaps we can deprecate tagSource (log a message to that effect), and rename it to -tagFile (as opposed to -tagPath) ?

        Show
        Dmitriy V. Ryaboy added a comment - I think so. And perhaps we can deprecate tagSource (log a message to that effect), and rename it to -tagFile (as opposed to -tagPath) ?
        Hide
        Prashant Kommireddi added a comment -

        Should only 1 of "tagSource" or "tagpath" options be allowed?

        Show
        Prashant Kommireddi added a comment - Should only 1 of "tagSource" or "tagpath" options be allowed?

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development