PDFBox / PDFBOX-1512

TextPositionComparator is not compatible with Java 7

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.1, 2.0.0
    • Fix Version/s: 1.8.8, 2.0.0
    • Component/s: Text extraction
    • Labels:
      None
    • Environment:
      Java 7

      Description

      TextPositionComparator causes the following exception when running on Java 7:

      Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!

      I think the problem is with this check:

      if ( yDifference < .1 ||
      (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
      (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))

      as it violates the contract requirement:

      The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.

      Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.

      The sort implementation in Java 7 is stricter than in earlier versions and throws this exception when it detects that a comparator violates the contract.
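
      To make the contract violation concrete, here is a minimal sketch that reuses the overlap check quoted above. The class OverlapComparatorDemo, the Box type, and the sample coordinates are hypothetical and not taken from PDFBox. The sketch also assumes that the comparator returns 0 when the check succeeds; what the real class does inside that branch is not shown in this report.

```java
import java.util.Comparator;

public class OverlapComparatorDemo {

    /** Hypothetical stand-in for a text position: only its vertical extent (top < bottom). */
    static final class Box {
        final float top;
        final float bottom;

        Box(float top, float bottom) {
            this.top = top;
            this.bottom = bottom;
        }
    }

    /**
     * Assumption: two boxes count as "equal" (same line) when the overlap
     * check from the report succeeds; otherwise they are ordered by bottom edge.
     */
    static final Comparator<Box> OVERLAP_ORDER = new Comparator<Box>() {
        @Override
        public int compare(Box pos1, Box pos2) {
            float yDifference = Math.abs(pos1.bottom - pos2.bottom);
            if (yDifference < .1
                    || (pos2.bottom >= pos1.top && pos2.bottom <= pos1.bottom)
                    || (pos1.bottom >= pos2.top && pos1.bottom <= pos2.bottom)) {
                return 0;
            }
            return Float.compare(pos1.bottom, pos2.bottom);
        }
    };

    /** True if compare(x, y) == 0 but x and y compare differently against z. */
    static boolean violatesContract(Box x, Box y, Box z) {
        return OVERLAP_ORDER.compare(x, y) == 0
                && Integer.signum(OVERLAP_ORDER.compare(x, z))
                        != Integer.signum(OVERLAP_ORDER.compare(y, z));
    }

    public static void main(String[] args) {
        Box a = new Box(0f, 10f);   // overlaps b only
        Box b = new Box(8f, 18f);   // overlaps both a and c
        Box c = new Box(16f, 26f);  // overlaps b only

        System.out.println("compare(a, b) = " + OVERLAP_ORDER.compare(a, b));
        System.out.println("compare(b, c) = " + OVERLAP_ORDER.compare(b, c));
        System.out.println("compare(a, c) = " + OVERLAP_ORDER.compare(a, c));
        System.out.println("contract violated: " + violatesContract(a, b, c));
    }
}
```

      In this sketch a and b overlap, and b and c overlap, but a and c do not. So compare(a, b) == 0 while compare(a, c) and compare(b, c) have different signs, which is exactly what the second contract requirement forbids.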

      Attachments

      1. WFI_PDFParser_TextPostionComparator.txt
        3 kB
        SCHAEFER B.S.
      2. TopoOverlap.txt
        0.0 kB
        Maruan Sahyoun
      3. TopoOverlap.pdf
        18 kB
        Maruan Sahyoun
      4. TopoContained.txt
        0.0 kB
        Maruan Sahyoun
      5. TopoContained.pdf
        19 kB
        Maruan Sahyoun
      6. Topo.txt
        0.0 kB
        Maruan Sahyoun
      7. Topo.pdf
        17 kB
        Maruan Sahyoun
      8. TextPositionComparator.java
        3 kB
        Benjamin Papez
      9. quicksort.patch
        8 kB
        Uwe
      10. immo-kurier_arsenal_93x62.pdf
        1.63 MB
        Hannes Erven
      11. illustration-of-inconsistent-sorting.png
        3 kB
        Hannes Erven
      12. FOP-2252.pdf
        205 kB
        Tilman Hausherr


            People

            • Assignee:
              Andreas Lehmkühler
              Reporter:
              Benjamin Papez
            • Votes:
              12
              Watchers:
              23
