  1. Pig
  2. PIG-4355

Piggybank: XPath can't handle a namespace in the xpath, nor can it return more than one match


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: piggybank
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Reviewed

      Description

      If you pass an XPath expression that contains a namespace prefix, the XPath UDF always fails to match.

      It would be better to either silently remove the namespace or provide a parameter that removes it.

      Ignoring namespaces in XPath expressions is desirable because many XML tools include the namespace when you select an XPath. Cutting and pasting such an expression into a Pig script is painful if you have to remove the namespace manually.
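
      As a rough illustration of what ignoring the namespace could look like, the sketch below rewrites each prefixed step into a local-name() test before evaluating it with the JDK's javax.xml.xpath API. This is an assumption-laden example and not the code from the attached patch: the class name NamespaceIgnoreSketch, the helpers ignoreNamespaces and firstMatch, and the sample XML are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative only: class and method names are hypothetical, not taken from the patch.
final class NamespaceIgnoreSketch {

    // Rewrite each "prefix:name" step as "*[local-name()='name']".
    // The (?!:) lookahead leaves axis specifiers such as "child::" untouched.
    // Known limitation: "prefix:name" inside a quoted string literal is rewritten too.
    static String ignoreNamespaces(String xpath) {
        return xpath.replaceAll(
            "[A-Za-z_][\\w.-]*:(?!:)([A-Za-z_][\\w.-]*)",
            "*[local-name()='$1']");
    }

    // Return the string value of the first node matched by the rewritten expression.
    static String firstMatch(String xml, String xpath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                .evaluate(ignoreNamespaces(xpath), doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ns:root xmlns:ns=\"urn:example\">"
            + "<ns:item>a</ns:item><ns:item>b</ns:item></ns:root>";
        System.out.println(ignoreNamespaces("/ns:root/ns:item"));
        System.out.println(firstMatch(xml, "/ns:root/ns:item"));
    }
}
```

      With this kind of rewrite, an expression pasted from an XML tool can be used as-is, at the cost of matching same-named elements from any namespace.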

      Additionally, XPath returns only the first match. It is often desirable to return all matches and flatten them so that multiple records can be processed. An XPathAll UDF would be useful to have.
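
      The difference between a first-match and an all-matches lookup can be sketched with the JDK's javax.xml.xpath API by asking for a node set instead of a string. This is a minimal example under stated assumptions, not the XPathAll implementation from the patch: the class XPathAllSketch, the method allMatches, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: class and method names are hypothetical, not taken from the patch.
final class XPathAllSketch {

    // Evaluate the expression as a node set and collect the text of every match.
    static List<String> allMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><item>a</item><item>b</item></root>";
        System.out.println(allMatches(xml, "/root/item"));
    }
}
```

      In a Pig UDF, each collected value would presumably be wrapped in a tuple inside a bag, so that FLATTEN yields one record per match.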

      A patch is available as a GitHub pull request at
      https://github.com/apache/pig/pull/14

        Attachments

        1. 14.diff
          43 kB
          Jianyong Dai


            People

            • Assignee:
              cavanaug John Cavanaugh
              Reporter:
              cavanaug John Cavanaugh
            • Votes:
              0
              Watchers:
              2
