Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.10.0, 0.11
-
None
Description
I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data easily their first session with Pig.
-------
characters = load 'example.xml' using XMLLoader('character');
describe characters
-------
<book id="b0836217462" available="true">
<isbn>
0836217462
</isbn>
<title lang="en">
Being a Dog Is a Full-Time Job
</title>
<author id="CMS">
<name>
Charles M Schulz
</name>
<born>
1922-11-26
</born>
<dead>
2000-02-12
</dead>
</author>
<character id="PP">
<name>
Peppermint Patty
</name>
<born>
1966-08-22
</born>
<qualification>
bold, brash and tomboyish
</qualification>
</character>
<character id="Snoopy">
<name>
Snoopy
</name>
<born>
1950-10-04
</born>
<qualification>
extroverted beagle
</qualification>
</character>
<character id="Schroeder">
<name>
Schroeder
</name>
<born>
1951-05-30
</born>
<qualification>
brought classical music to the Peanuts strip
</qualification>
</character>
<character id="Lucy">
<name>
Lucy
</name>
<born>
1952-03-03
</born>
<qualification>
bossy, crabby and selfish
</qualification>
</character>
</book>
</library>