Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3104

XMLLoader return Pig tuple/map/bag representation of the DOM of XML documents

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.10.0, 0.11
    • 0.18.0
    • internal-udfs, piggybank
    • None

    Description

      I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data easily their first session with Pig.

      -------

      characters = load 'example.xml' using XMLLoader('character');
      describe characters

      {properties:map[], name:chararray, born:datetime, qualification:chararray}

      -------
      <book id="b0836217462" available="true">
      <isbn>
      0836217462
      </isbn>
      <title lang="en">
      Being a Dog Is a Full-Time Job
      </title>
      <author id="CMS">
      <name>
      Charles M Schulz
      </name>
      <born>
      1922-11-26
      </born>
      <dead>
      2000-02-12
      </dead>
      </author>
      <character id="PP">
      <name>
      Peppermint Patty
      </name>
      <born>
      1966-08-22
      </born>
      <qualification>
      bold, brash and tomboyish
      </qualification>
      </character>
      <character id="Snoopy">
      <name>
      Snoopy
      </name>
      <born>
      1950-10-04
      </born>
      <qualification>
      extroverted beagle
      </qualification>
      </character>
      <character id="Schroeder">
      <name>
      Schroeder
      </name>
      <born>
      1951-05-30
      </born>
      <qualification>
      brought classical music to the Peanuts strip
      </qualification>
      </character>
      <character id="Lucy">
      <name>
      Lucy
      </name>
      <born>
      1952-03-03
      </born>
      <qualification>
      bossy, crabby and selfish
      </qualification>
      </character>
      </book>
      </library>

      Attachments

        Activity

          People

            daijy Daniel Dai
            russell.jurney Russell Jurney
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: