Pig / PIG-3619

Provide XPath function


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: piggybank
    • Labels: None
    • Patch Info: Patch Available

    Description

      XML is often loaded using XMLLoader, with a record boundary tag as one of its parameters. A common next step is extracting data from those records, and XPath would make such extractions very easy. I'm proposing a patch that adds simple XPath support as a UDF.

      Example usage of the XPath UDF would be:

      extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
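
      As a rough illustration of what each XPath(record, expression) call computes, here is a minimal standalone Java sketch that evaluates one expression against one record string using the JDK's javax.xml.xpath API. The class name XPathSketch, the sample record, and the choice to return the string value of the result are assumptions for illustration. This is not the patch's implementation.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical helper (not the patch's API): evaluates one XPath expression
    // against one XML record and returns the string value of the result.
    // An expression that matches nothing yields the empty string.
    public static String evaluate(String record, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The record is parsed from scratch on every call.
        return xpath.evaluate(expression, new InputSource(new StringReader(record)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String record = "<book><author>Jane Doe</author><title>Pig Basics</title></book>";
        System.out.println(evaluate(record, "book/author")); // Jane Doe
        System.out.println(evaluate(record, "book/title"));  // Pig Basics
    }
}
```

      Because this version re-parses the record for every expression, the two extractions in the FOREACH example above cost two parses of the same document. That repeated cost is what caching the last document is meant to avoid.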
      

      The proposed UDF also caches the last XML document it parsed. This improves performance when several consecutive XPath extractions are run against the same XML document, as in the example above.
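
      A minimal sketch of a single-entry "last document" cache follows, assuming the cache is keyed on the raw record text and that re-parsing happens only when that text changes. The class CachingXPathSketch and its parse counter are hypothetical. The counter exists only to make cache hits observable. The actual patch may detect a repeated document differently.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class: caches the DOM of the most recent record only.
// Not thread-safe (DocumentBuilder is not), so use one instance per task.
public class CachingXPathSketch {
    private final DocumentBuilder builder;
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastRecord;      // text of the most recently parsed record
    private Document lastDocument;  // its parsed DOM
    private int parseCount;         // number of real parses, to observe cache hits

    public CachingXPathSketch() throws ParserConfigurationException {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public String evaluate(String record, String expression)
            throws IOException, SAXException, XPathExpressionException {
        // Re-parse only when the record text differs from the cached one.
        if (lastDocument == null || !record.equals(lastRecord)) {
            Document parsed = builder.parse(new InputSource(new StringReader(record)));
            // Update the cache only after a successful parse.
            lastRecord = record;
            lastDocument = parsed;
            parseCount++;
        }
        return xpath.evaluate(expression, lastDocument);
    }

    public int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) throws Exception {
        CachingXPathSketch udf = new CachingXPathSketch();
        String record = "<book><author>Jane Doe</author><title>Pig Basics</title></book>";
        System.out.println(udf.evaluate(record, "book/author")); // Jane Doe
        System.out.println(udf.evaluate(record, "book/title"));  // Pig Basics
        System.out.println(udf.getParseCount());                 // 1
    }
}
```

      Comparing the full record string on each call is linear in the record size, but it is typically much cheaper than building a DOM. A one-entry cache suits the access pattern above, where all extractions for a record happen back to back.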

      Attachments

        1. xpath.patch (13 kB), Saad Patel


          People

            Assignee: Saad Patel (saadp)
            Reporter: Saad Patel (saadp)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: