Pig / PIG-2541

Automatic record provenance (source tagging) for PigStorage

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.10.0, 0.11
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      We added a new option, -tagsource, to PigStorage. With this flag set, the input file name (INPUT_FILE_NAME) is returned as the first column of the loaded data. For example:

      a = load '1.txt' using PigStorage('\t', '-tagsource');
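As an illustration of how the tagged column might be used, here is a minimal sketch that counts records per source file. The directory name 'logs' and the aliases source_file, first_field, by_file, and counts are hypothetical, and the sketch assumes tab-delimited input with at least one data column. Fields are referenced by position because the file name is prepended at $0, which shifts the original data to start at $1.

```pig
-- Load all files under a hypothetical directory; -tagsource prepends
-- the input file name as the first field ($0) of every record.
a = load 'logs' using PigStorage('\t', '-tagsource');

-- Name the tag column and keep the first original data field ($1).
b = foreach a generate $0 as source_file, $1 as first_field;

-- Count how many records came from each input file.
by_file = group b by source_file;
counts = foreach by_file generate group as source_file, COUNT(b) as n;

dump counts;
```

Without the '-tagsource' argument the same load statement returns the data unchanged, so $0 would be the first data field rather than the file name.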

    Description

      There is a lot of interest in knowing which file each record comes from when loading from a directory (or a set of directories). This can be done manually (see https://cwiki.apache.org/confluence/display/PIG/FAQ), but it would be more convenient for users if PigStorage supported it directly, with a command-line option (e.g., pig.source.tagging=true/false) to turn it on or off. It would be off by default.

      Attachments

        1. PIG-2541_2.patch
          7 kB
          Prashant Kommireddi
        2. PIG-2541_3.patch
          11 kB
          Prashant Kommireddi
        3. PIG-2541.doc.patch
          1 kB
          Daniel Dai
        4. PIG-2541.patch
          3 kB
          Prashant Kommireddi

          People

            Assignee: Prashant Kommireddi (prkommireddi)
            Reporter: Richard Ding (rding)
            Votes: 0
            Watchers: 3
