  1. Pig
  2. PIG-1842

Improve Scalability of the XMLLoader for large datasets such as wikipedia

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0, 0.8.0, 0.9.0
    • Fix Version/s: 0.8.1
    • Component/s: impl
    • Labels: None
    • Patch Info: Patch Available

    Description

      The current XMLLoader for Pig does not work well for large datasets such as the Wikipedia dataset. Each mapper reads in the entire XML file, resulting in extremely slow run times.

      Viraj
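
      To illustrate the problem described above, here is a minimal sketch (not the code from the attached patches) of how a reader might confine itself to one byte range of the input instead of scanning the whole file. The function name `records_in_split` and its parameters are hypothetical, and the sketch assumes the record tag carries no attributes and is never nested. It works on an in-memory byte string for brevity; a real loader would seek within a file stream.

```python
from typing import Iterator


def records_in_split(data: bytes, start: int, end: int, tag: str) -> Iterator[bytes]:
    """Yield <tag>...</tag> records whose opening tag begins in [start, end).

    Hypothetical helper for illustration only. A record that begins inside
    the range is read to its closing tag even if that lies past `end`, so
    every record is claimed by exactly one range.
    """
    open_tag = b"<" + tag.encode() + b">"
    close_tag = b"</" + tag.encode() + b">"
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        # Stop once the next record starts in a later range (or none is left).
        if begin == -1 or begin >= end:
            return
        finish = data.find(close_tag, begin)
        if finish == -1:
            return  # unterminated record: nothing complete to emit
        finish += len(close_tag)
        yield data[begin:finish]
        pos = finish


if __name__ == "__main__":
    xml = b"<doc><page>a</page><page>bb</page><page>ccc</page></doc>"
    middle = len(xml) // 2
    # Two "mappers", each responsible for one half of the file.
    print(list(records_in_split(xml, 0, middle, "page")))
    print(list(records_in_split(xml, middle, len(xml), "page")))
```

      With this kind of ownership rule, the work per mapper is roughly proportional to the size of its own range rather than to the size of the whole file.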

      Attachments

        1. PIG-1842_1.patch
          19 kB
          Vivek Padmanabhan
        2. PIG-1842_2.patch
          16 kB
          Vivek Padmanabhan
        3. TEST-org.apache.pig.piggybank.test.storage.TestXMLLoader.txt
          40 kB
          Alan Gates

          People

            Assignee: Vivek Padmanabhan
            Reporter: Viraj Bhat