Pig / PIG-3619

Provide XPath function


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: piggybank
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      XML is often loaded using XMLLoader, with a record boundary tag as one of its parameters. A common use case is to then extract data from those records, and XPath would make such extractions very easy. I'm proposing a patch that adds simple XPath support as a UDF.
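To illustrate what "XPath support as a UDF" boils down to, here is a minimal sketch of the core evaluation step using only the JDK's `javax.xml.xpath` API. It is not the code in xpath.patch: the class name `XPathExtractSketch`, the `extract` helper, and the choice to return null on bad input are hypothetical, and the Pig `EvalFunc` wrapper (tuple unpacking, output schema) is left out.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not the patch's implementation: evaluates one XPath
// expression against one XML record and returns the resulting string value.
public class XPathExtractSketch {
    private static final XPathFactory FACTORY = XPathFactory.newInstance();

    public static String extract(String record, String expression) {
        if (record == null || expression == null) {
            return null; // assumption: null in, null out, as Pig UDFs commonly do
        }
        try {
            XPath xpath = FACTORY.newXPath();
            // evaluate(String, InputSource) parses the record itself and returns
            // the string value of the first matching node ("" if nothing matches).
            return xpath.evaluate(expression, new InputSource(new StringReader(record)));
        } catch (XPathExpressionException e) {
            return null; // assumption: malformed XML or XPath becomes a missing value
        }
    }

    public static void main(String[] args) {
        String record = "<book><author>Jane Doe</author><title>Pig Recipes</title></book>";
        System.out.println(extract(record, "book/author")); // Jane Doe
        System.out.println(extract(record, "book/title"));  // Pig Recipes
    }
}
```

Note that this version reparses the record on every call, so two extractions from the same record cost two parses.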

      Example usage of the XPath UDF:

      extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
      

      The proposed UDF also caches the last XML document it parsed. This improves performance when several consecutive XPath extractions are performed on the same XML document, as in the example above.
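A minimal sketch of the "cache the last document" idea, assuming the cache is keyed on plain string equality of the input record; the patch may decide reuse differently. `CachingXPathSketch`, `extract`, and the `parseCount` counter are hypothetical names for illustration only, built on the JDK's DOM and `javax.xml.xpath` APIs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: parse a record once and reuse the DOM while the
// same record keeps arriving (assumption: reuse is decided by String.equals).
public class CachingXPathSketch {
    private final DocumentBuilder builder;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    private String cachedXml;   // the last record that was parsed
    private Document cachedDoc; // its parsed DOM
    private int parseCount;     // how many real parses happened, for illustration

    public CachingXPathSketch() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public String extract(String xml, String expression) throws Exception {
        if (xml == null || expression == null) {
            return null;
        }
        if (!xml.equals(cachedXml)) {
            // Cache miss: parse first, then update the cache only on success.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            cachedDoc = doc;
            cachedXml = xml;
            parseCount++;
        }
        // Cache hit (or freshly filled cache): evaluate against the stored DOM.
        return xpath.evaluate(expression, cachedDoc);
    }

    public int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) throws Exception {
        CachingXPathSketch udf = new CachingXPathSketch();
        String record = "<book><author>Jane Doe</author><title>Pig Recipes</title></book>";
        System.out.println(udf.extract(record, "book/author")); // Jane Doe
        System.out.println(udf.extract(record, "book/title"));  // Pig Recipes
        System.out.println(udf.getParseCount());                // 1
    }
}
```

Keeping only the last document keeps memory bounded and matches the access pattern of a single GENERATE over one record. One open design question: if each XPath(...) call in a script gets its own UDF instance, the cache would have to be shared (for example, through a static field) for both calls in the example to benefit; this sketch sidesteps that by using one instance.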

        Attachments

        1. xpath.patch
          13 kB
          Saad Patel


            People

            • Assignee:
              saadp Saad Patel
              Reporter:
              saadp Saad Patel
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved: