Pig / PIG-3373

XMLLoader returns non-matching nodes when a tag name spans through the block boundary

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: site
    • Fix Version/s: 0.13.0
    • Component/s: piggybank
    • Patch Info: Patch Available
    • I added a new patch that fixes this bug. It turned out that the bug occurs only when the input file is .bz2 compressed and the non-matching tag spans two file splits in the compressed file. Because the compressed layout is practically unpredictable, it is almost impossible to hand-craft an input that triggers the bug, so I included a random generator that produces such a test case.
      Since I don't like relying on a randomly generated file to expose a bug (it is non-deterministic by definition), I also attached the test file for reference.
      The fix is the same as in the previous patch, but this time the test fails without it.
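
The concern about non-determinism can be reduced by seeding the generator. The following is a minimal Python sketch of that idea, not the generator shipped in the patch: the function names, the seed, and the node layout are all hypothetical. It builds a reproducible file that mixes 'event' and 'eventually' nodes and compresses it with bz2. It does not guarantee that a tag lands on a split boundary, since that depends on the split size used when the file is read.

```python
import bz2
import random


def make_test_xml(num_nodes: int, seed: int) -> bytes:
    """Build a reproducible XML-like file mixing 'event' and 'eventually' nodes."""
    rng = random.Random(seed)  # fixed seed: the same file is produced on every run
    lines = []
    for i in range(num_nodes):
        name = rng.choice(["event", "eventually"])
        # Random hex payload so the compressed output is not trivially small.
        payload = f"{rng.getrandbits(64):016x}"
        lines.append(f'<{name} id="{i}">{payload}</{name}>')
    return ("\n".join(lines) + "\n").encode("utf-8")


def expected_matches(xml: bytes, tag: str) -> int:
    """Count nodes whose name is exactly `tag` (one node per line in this layout)."""
    prefix = f"<{tag} ".encode("utf-8")
    return sum(line.startswith(prefix) for line in xml.splitlines())


xml = make_test_xml(1000, seed=3373)
compressed = bz2.compress(xml)
```

The count from expected_matches gives a ground truth that a loader's output can be compared against, whatever the split boundaries turn out to be.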

    Description

      When a node's start tag spans two blocks, the node is returned even if its tag name does not match the requested tag type.
      Example: for the following input file:

      <event id="3423">
      <ev
      -------- BLOCK BOUNDARY
      entually id="dfasd">

      XMLLoader with tag type 'event' should return only the first node, but it actually returns both.
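
The expected behaviour can be illustrated with a small Python sketch. This is not the XMLLoader code or the attached patch. The function name and the carry-over buffer are assumptions made for illustration, and the sketch ignores '>' characters inside quoted attribute values. A start tag cut by a block boundary is held back until the rest arrives, and the tag name is accepted only if a delimiter follows it.

```python
from typing import Iterable, Iterator

NAME_END = set(" \t\r\n/>")  # characters that may follow a tag name


def matching_start_tags(blocks: Iterable[str], tag: str) -> Iterator[str]:
    """Yield each complete start tag whose name is exactly `tag`.

    `blocks` stands in for consecutive input blocks; a start tag may be
    cut anywhere by a block boundary.
    """
    pending = ""  # unfinished tag text carried over from the previous block
    for block in blocks:
        text = pending + block
        pos = 0
        while True:
            start = text.find("<", pos)
            if start == -1:
                pending = ""
                break
            end = text.find(">", start)
            if end == -1:
                # The tag is cut by the block boundary: keep it for the next block.
                pending = text[start:]
                break
            candidate = text[start:end + 1]
            name_end = 1 + len(tag)
            # Require a delimiter right after the name, so that
            # "<eventually ...>" is not mistaken for "<event ...>".
            if candidate.startswith("<" + tag) and candidate[name_end] in NAME_END:
                yield candidate
            pos = end + 1


blocks = ['<event id="3423">\n<ev', 'entually id="dfasd">\n']
result = list(matching_start_tags(blocks, "event"))
```

With the two blocks from the example above, result contains only the 'event' start tag. The 'eventually' tag is reassembled across the boundary and then rejected.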

      Attachments

        1. test-file-2.xml.bz2
          119 kB
          Ahmed Eldawy
        2. PIG3373.patch
          4 kB
          Ahmed Eldawy
        3. PIG3373_3.patch
          7 kB
          Ahmed Eldawy
        4. PIG3373_2.patch
          3 kB
          Ahmed Eldawy
        5. PIG3373_1.patch
          4 kB
          Ahmed Eldawy
        6. bad-file.xml.bz2
          209 kB
          Ahmed Eldawy

            People

              Assignee: Ahmed Eldawy (aseldawy)
              Reporter: Ahmed Eldawy (aseldawy)
              Votes: 0
              Watchers: 4
