  1. Pig
  2. PIG-4242

For indented XML with multiline content (e.g. Wikipedia), XMLLoader cuts off the beginning of every line

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: piggybank
    • Labels: None
    • Patch Info: Patch Available
    • Hadoop Flags: Reviewed

    Description

      XMLLoader finds the position of the first match for the required tag, but then applies that same offset to every following line up to the closing tag. For indented XML with multiline content, such as the Wikipedia XML dump, this drops the first characters of each following line:

      Example input:

          <page>Look, 
      not a thing is missing.</page>

      Current output:

      <page>Look, a thing is missing.</page>

      Expected output:

      <page>Look, not a thing is missing.</page>

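The behaviour described above can be reproduced with a minimal Java sketch. This is not the piggybank XMLLoader source or the attached patch: the class `XmlOffsetSketch` and the methods `extractBuggy` and `extractFixed` are hypothetical names used only for illustration. The sketch assumes one record, already split into lines, and does not trim anything after the closing tag.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the offset problem; not the real XMLLoader code.
final class XmlOffsetSketch {

    // Mimics the reported bug: the column of the opening tag on the first
    // line is reused to trim every following line of the record.
    static String extractBuggy(List<String> lines, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        StringBuilder out = new StringBuilder();
        int offset = -1;
        for (String line : lines) {
            if (offset < 0) {
                offset = line.indexOf(open);
                if (offset < 0) {
                    continue; // still before the record starts
                }
            }
            // Bug: the first line's offset is applied to this line as well.
            out.append(line.substring(Math.min(offset, line.length())));
            if (line.contains(close)) {
                break;
            }
        }
        return out.toString();
    }

    // One possible fix: apply the offset only on the line that contains
    // the opening tag, and copy the following lines unchanged.
    static String extractFixed(List<String> lines, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        StringBuilder out = new StringBuilder();
        boolean inRecord = false;
        for (String line : lines) {
            if (!inRecord) {
                int start = line.indexOf(open);
                if (start < 0) {
                    continue; // still before the record starts
                }
                inRecord = true;
                out.append(line.substring(start));
            } else {
                out.append(line);
            }
            if (line.contains(close)) {
                break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same shape as the example input: first line indented by four spaces.
        List<String> input = Arrays.asList("    <page>Look, ", "not a thing is missing.</page>");
        System.out.println(extractBuggy(input, "page"));
        System.out.println(extractFixed(input, "page"));
    }
}
```

With the example input, the four-space indent of the first line is what makes the buggy variant drop the four characters "not " from the second line.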

      Attachments

        1. XMLLoaderMissingContent.patch
          3 kB
          Geza Radics

          People

            Assignee: holdfenytolvaj Geza Radics
            Reporter: holdfenytolvaj Geza Radics
            Votes: 0
            Watchers: 2
