Pig / PIG-1842

Improve Scalability of the XMLLoader for large datasets such as wikipedia

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0, 0.8.0, 0.9.0
    • Fix Version/s: 0.8.1
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      The current XMLLoader for Pig does not scale to large datasets such as the Wikipedia dataset. Each mapper reads in the entire XML file, resulting in extremely slow run times.

      Viraj
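      To illustrate the scalability problem described above: a loader scales when each mapper scans only its own byte range (split) of the file rather than the whole file. The sketch below is not the PIG-1842 patch and does not use Pig or Hadoop APIs. The class name SplitAwareXmlScan, the method recordsInSplit, and the in-memory byte array are hypothetical stand-ins chosen for illustration. It also assumes the record tag carries no attributes and is never nested inside itself. The rule it shows is that a record belongs to the split in which its opening tag starts, so every record is emitted by exactly one mapper.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the XMLLoader code from the patch.
final class SplitAwareXmlScan {

    /** Returns the index of the first occurrence of pattern at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /**
     * Returns the <tag>...</tag> records whose opening tag starts in [start, end).
     * A record may run past 'end'; the mapper owning the next split skips it
     * because that record's opening tag lies before the next split's start.
     * Assumption: the tag has no attributes and is not nested within itself.
     */
    static List<String> recordsInSplit(byte[] data, int start, int end, String tag) {
        byte[] open = ("<" + tag + ">").getBytes(StandardCharsets.UTF_8);
        byte[] close = ("</" + tag + ">").getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = start;
        while (true) {
            int recordStart = indexOf(data, open, pos);
            if (recordStart < 0 || recordStart >= end) {
                break; // no further record starts inside this split
            }
            int closeStart = indexOf(data, close, recordStart + open.length);
            if (closeStart < 0) {
                break; // unterminated record: stop rather than guess
            }
            int recordEnd = closeStart + close.length;
            records.add(new String(data, recordStart, recordEnd - recordStart,
                    StandardCharsets.UTF_8));
            pos = recordEnd;
        }
        return records;
    }
}
```

      Calling recordsInSplit once per split, with adjacent ranges such as [0, 20) and [20, length), yields each record exactly once across the calls, even when a record straddles the boundary. In a real Hadoop job the byte array would presumably be replaced by a stream positioned at the split's start offset.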

        Attachments

        1. TEST-org.apache.pig.piggybank.test.storage.TestXMLLoader.txt
          40 kB
          Alan Gates
        2. PIG-1842_2.patch
          16 kB
          Vivek Padmanabhan
        3. PIG-1842_1.patch
          19 kB
          Vivek Padmanabhan

            People

            • Assignee: Vivek Padmanabhan (vivekp)
            • Reporter: Viraj Bhat (viraj)
            • Votes: 0
            • Watchers: 1
