Pig / PIG-55

Allow user control over split creation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.0.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      I have a dataset in HDFS, stored as one file per column, that I'd like to access from Pig. I can't use LoadFunc to read it, because LoadFunc only gives the loader access to a single input stream at a time. To handle this case, I've broken the existing split creation code out into a few classes and interfaces, and allowed user-specified load functions to be used in place of the existing code.
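      A minimal Java sketch of the idea, assuming a pluggable split-creation hook. `SplitCreator`, `RowRangeSplitCreator`, `ColumnSplit` and `ColumnSplitReader` are hypothetical names, not the classes in the attached patches. The per-column HDFS files are modeled as in-memory lists. Each split covers a range of rows and carries every column file, so its reader can pull from all column streams at once. A single-stream LoadFunc cannot do that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; NOT Pig's actual split classes.

/** One unit of work: rows [startRow, endRow) read from every column file. */
final class ColumnSplit {
    final List<String> columnFiles;
    final long startRow;
    final long endRow;

    ColumnSplit(List<String> columnFiles, long startRow, long endRow) {
        this.columnFiles = List.copyOf(columnFiles);
        this.startRow = startRow;
        this.endRow = endRow;
    }
}

/** Hypothetical user-replaceable hook that decides how the input is divided. */
interface SplitCreator {
    List<ColumnSplit> createSplits(List<String> columnFiles, long rowCount);
}

/** Cuts the row space into fixed-size ranges; every split spans all columns. */
final class RowRangeSplitCreator implements SplitCreator {
    private final long rowsPerSplit;

    RowRangeSplitCreator(long rowsPerSplit) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        this.rowsPerSplit = rowsPerSplit;
    }

    @Override
    public List<ColumnSplit> createSplits(List<String> columnFiles, long rowCount) {
        List<ColumnSplit> splits = new ArrayList<>();
        for (long start = 0; start < rowCount; start += rowsPerSplit) {
            long end = Math.min(start + rowsPerSplit, rowCount);
            splits.add(new ColumnSplit(columnFiles, start, end));
        }
        return splits;
    }
}

/** Rebuilds rows for one split by reading the same row range from each column. */
final class ColumnSplitReader {
    // In-memory stand-in for the per-column files: file name -> column values.
    static List<List<String>> readRows(ColumnSplit split, Map<String, List<String>> files) {
        List<List<String>> rows = new ArrayList<>();
        for (long r = split.startRow; r < split.endRow; r++) {
            List<String> row = new ArrayList<>();
            for (String file : split.columnFiles) {
                row.add(files.get(file).get((int) r));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

      With five rows and `rowsPerSplit = 2`, `RowRangeSplitCreator` yields three splits covering rows [0,2), [2,4) and [4,5). Each split names both column files, so a reader can zip the columns back into rows. Splitting by row range rather than by file is the point: splitting per file would hand each task only one column.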

        Attachments

        1. replaceable_PigSplit.diff
          41 kB
          Charlie Groves
        2. replaceable_PigSplit_v2.diff
          34 kB
          Charlie Groves
        3. pig_chunker_split.patch
          48 kB
          Charlie Groves
        4. pig_chunker_split_v7.patch
          71 kB
          Charlie Groves
        5. pig_chunker_split_v6.patch
          68 kB
          Charlie Groves
        6. pig_chunker_split_v5.patch
          68 kB
          Charlie Groves
        7. pig_chunker_split_v4.patch
          67 kB
          Charlie Groves
        8. pig_chunker_split_v3.patch
          61 kB
          Charlie Groves
        9. pig_chunker_split_v2.patch
          51 kB
          Charlie Groves

            People

            • Assignee: Charlie Groves (groves)
            • Reporter: Charlie Groves (groves)
            • Votes: 0
            • Watchers: 1
