FOP-1133

Hyphenation does not play well with preserved linefeed-treatment or white-space-treatment

    Details

    • Type: Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: 0.90
    • Fix Version/s: None
    • Component/s: unqualified
    • Labels:
      None
    • Environment:
      Operating System: other
      Platform: Other
    • External issue ID:
      38264

      Description

      When combining the attributes linefeed-treatment="preserve" and
      hyphenate="true", I get some really strange results.

      In fact, it seems that both attributes are applied to the text in turn, which
      DUPLICATES the text in the output. And both outputs are wrong...

      I attach an example FO and the resulting PDF.

      1. hyphen2.fo
        6 kB
        Franck Schmidlin
      2. test-hyphen.pdf
        3 kB
        Franck Schmidlin
      3. white-space-treatment_preserve_hyphenate_true.fo
        0.9 kB
        Vincent Hennebert

          Activity

          Franck Schmidlin created issue -
          Franck Schmidlin added a comment -

          Attachment hyphen2.fo has been added with description: an FO file that demonstrates the problem.

          Franck Schmidlin added a comment -

          Attachment test-hyphen.pdf has been added with description: the PDF output of the hyphen2.fo

          Simon Pepping added a comment -

          This problem is also present in subversion HEAD, rev. 367760

          Andreas L. Delmelle added a comment -

          Also interesting to note: if one encloses the content of the second block in testcase
          'block_hyphenation_linefeed-preserve.xml' with an fo:inline, then
          LineLayoutManager.findHyphenationPoints() throws a NullPointerException (line 1486), due to an
          Update being added earlier which has null for an inlineLM...

          Looking closer, I'm wondering whether the strange effect of duplication may have something to do with:
          a) a block containing preserved linefeeds generates a Paragraph of Paragraphs
          b) findOptimalBreakingPoints() is called in a loop that iterates /backwards/ over the sub-paragraphs,
          while
          c) findHyphenationPoints() iterates /forwards/ over each sub-paragraph individually

          This opens up the possibility that findHyphenationPoints() adds Updates to the updateList with indices
          that refer to the last sub-paragraph, and those indices are later, in the outer loop, interpreted as
          positions in the first sub-paragraph --or worse, in the super-paragraph?

          Vincent Hennebert added a comment -

          Another problem related to hyphenation and preserved white-space: when
          white-space-treatment is set to "preserve", words are hyphenated correctly but
          the hyphen does not show up.

          Vincent Hennebert added a comment -

          Attachment white-space-treatment_preserve_hyphenate_true.fo has been added with description: Hyphens do not show up when white-space-treatment="preserve"

          Andreas L. Delmelle added a comment -

          FOP-1472 has been marked as a duplicate of this bug.

          Andreas L. Delmelle added a comment -

          In the meantime, managed to track down the source of the problem with linefeed-treatment="preserve".
          There is nothing inherently wrong with the hyphenation loop itself. It goes wrong afterwards, once the hyphenation points have been determined and the updates are processed.

          See LineLayoutManager.findHyphenationPoints(), second main loop. For each Paragraph, the corresponding TextLayoutManager.applyChanges() and .getChangedKnuthElements() are used.
          Checking the implementations for those latter two methods reveals that they do not take into account that they can be called multiple times for the same instance. The former always sets the 'returnedIndex' member to 0, which leads to the duplication if the latter is called twice. Each subparagraph in the main paragraph is replaced by a copy of the main paragraph...

          Now still looking for a solution :/
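
          The duplication described above can be reproduced in a toy model. The sketch below is not FOP code: ToyTextLM, rebuild() and the string "area infos" are hypothetical stand-ins, and the only behaviour taken from the comment is that applyChanges() always resets returnedIndex to 0 while the changed elements are requested once per sub-paragraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not FOP code: one text manager feeds several sub-paragraphs. */
final class DuplicationSketch {

    /** Hypothetical stand-in for a TextLayoutManager holding the text of the whole block. */
    static final class ToyTextLM {
        private final List<String> areaInfos;
        private int returnedIndex;

        ToyTextLM(List<String> areaInfos) {
            this.areaInfos = areaInfos;
        }

        /** Mirrors the reported behaviour: always rewinds to the start. */
        void applyChanges() {
            returnedIndex = 0;
        }

        /** Returns everything from returnedIndex to the end, regardless of which paragraph asked. */
        List<String> getChangedElements() {
            List<String> out = new ArrayList<>(areaInfos.subList(returnedIndex, areaInfos.size()));
            returnedIndex = areaInfos.size();
            return out;
        }
    }

    /** Replaces each sub-paragraph by whatever the manager returns, once per sub-paragraph. */
    static List<List<String>> rebuild(ToyTextLM lm, int paragraphCount) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < paragraphCount; i++) {
            lm.applyChanges();
            result.add(lm.getChangedElements());
        }
        return result;
    }

    public static void main(String[] args) {
        ToyTextLM lm = new ToyTextLM(Arrays.asList("first line", "second line"));
        // Both sub-paragraphs end up holding the full text of the block.
        System.out.println(rebuild(lm, 2));
    }
}
```

          With two sub-paragraphs, each one receives both lines, which matches the "copy of the main paragraph" effect seen in the PDF.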

          Andreas L. Delmelle added a comment -

          Trying to gain more understanding of this issue, and as I see it, the full story wrt linefeed-treatment='preserve' and hyphenate='true' is:

          1) for blocks of text containing preserved linefeeds, the TextLayoutManager actually generates multiple Paragraphs (see TextLM.getNextKnuthElements() -> in case of an explicit break, the 'current' sequence is ended, and a new one is added to the returnList)
          2) the optimal line-breaks are determined by the LineLayoutManager per Paragraph ( see LineLM.createLineBreaks() )
          3) the hyphenation-points are determined for each Paragraph in the same loop ( see LineLM.findOptimalBreakingPoints() )
          4) BUT: the integration of hyphenation-points (applyChanges() and getChangedKnuthElements()) operates on the TextLayoutManager instance as a whole.

          => the entire content generated by the TextLM in question is copied as many times as there are paragraphs/preserved linefeeds in the source

          Mainly TextLM.getChangedKnuthElements() is a bit problematic in this regard: every time this is called, it generates an element-list based on the complete set of AreaInfos for the LM. In LineLM.findHyphenationPoints(), each of the original paragraphs is replaced by that list.

          I already tried to change that method to take into account the position-indices of the first and last element in the parameter oldList. This already gets me somewhat further, but still far from committable...
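
          As a rough illustration of the idea of limiting the result to the positions covered by oldList, here is a minimal sketch. It is an assumption about the general shape of such a change, not the actual patch: changedElements(), firstIndex and lastIndex are hypothetical names, and plain strings stand in for AreaInfos.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not FOP code: return only the slice that belongs to one sub-paragraph. */
final class RangeLimitedSketch {

    /**
     * Hypothetical range-limited variant: firstIndex and lastIndex (inclusive)
     * play the role of the positions of the first and last element of oldList.
     */
    static List<String> changedElements(List<String> allAreaInfos, int firstIndex, int lastIndex) {
        return new ArrayList<>(allAreaInfos.subList(firstIndex, lastIndex + 1));
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("first", "line", "second", "line");
        // Each sub-paragraph asks only for its own portion of the shared list.
        System.out.println(changedElements(all, 0, 1));
        System.out.println(changedElements(all, 2, 3));
    }
}
```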

          Andreas L. Delmelle added a comment -

          Status update:

          The main difficulty seems to be that the principal iteration in LineLM.createLineBreaks() iterates in reverse order. As a result, applyChanges() is called first for the last Paragraph if the TextLM generates multiple paragraphs.
          Now, while we can keep track of the changed position indices and limit both applyChanges() and getChangedKnuthElements() to operate only on the portion corresponding to oldList, by the time the next-to-last paragraph is processed, the changed positions for the last one should again be modified to take into account added/removed areas for the changes to the preceding one.

          I made such changes locally, and this does avoid the duplication. However, keeping track of the bounding indices is turning out to be quite a pain. As soon as the first paragraph has hyphenation points, the positions pointing into the later paragraphs will be wrong...
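
          The index problem can be shown with a shared list and per-paragraph bounds. This is a toy model under stated assumptions, not FOP code: hyphenate() and slice() are hypothetical helpers, and splitting a word into pieces stands in for the areas added when hyphenation points are applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not FOP code: hyphenating an earlier paragraph shifts later bounds. */
final class IndexShiftSketch {

    /** Replaces the entry at 'at' by its pieces and returns how many entries were added. */
    static int hyphenate(List<String> areas, int at, String... pieces) {
        areas.remove(at);
        areas.addAll(at, Arrays.asList(pieces));
        return pieces.length - 1;
    }

    /** Reads the half-open range [start, end) after moving both bounds by 'shift'. */
    static List<String> slice(List<String> areas, int start, int end, int shift) {
        return new ArrayList<>(areas.subList(start + shift, end + shift));
    }

    public static void main(String[] args) {
        // Paragraph 0 covers [0, 2), paragraph 1 covers [2, 4).
        List<String> areas = new ArrayList<>(
                Arrays.asList("hyphenation", "works", "second", "paragraph"));

        int added = hyphenate(areas, 0, "hy", "phen", "ation");

        // The recorded bounds of paragraph 1 are now stale...
        System.out.println(slice(areas, 2, 4, 0));
        // ...and only point at the right entries once the shift is applied.
        System.out.println(slice(areas, 2, 4, added));
    }
}
```

          The unshifted bounds of the second paragraph now cover the tail of the first one, so every change to an earlier paragraph has to be propagated to all recorded positions behind it.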

          Andreas L. Delmelle added a comment -

          Both issues fixed in r1039188:

          • combination of linefeed-preserve and hyphenation failed for the reasons described in earlier comments. After having inverted the main loop in LineLM.createLineBreaks() (see r956271), the fix was to modify TextLM.applyChanges() and TextLM.getChangedKnuthElements() to account for the fact that they can be called multiple times for the same instance.
            Additionally, needed to make sure LineLM.hyphenationPerformed is only set if the last paragraph has been hyphenated. Otherwise, hyphenation would be bypassed for all paragraphs following the first preserved linefeed in a block. After modification, hyphenation is only bypassed in case of a re-entry due to changing page-ipd.
          • combination of white-space-treatment="preserve" and hyphenation failed due to an oversight that has probably been present for a while. See LineLM.addInlineArea(), around line 1515: lastLM was only set in case white-space-treatment is not "preserve". If white-space was preserved, this caused the call to LayoutContext.setFlags() some 70-75 lines further down to set LAST_AREA to false (childLM == lastLM), which in turn caused TextLM to ignore the hyphenation character when building the area.
            Fix was to make sure that lastLM always points to the LM of the last KnuthElement in the sequence to be processed.
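
          The effect of the unset lastLM can be sketched as follows. This is a simplified model, not the FOP implementation: layout() and buildArea() are hypothetical, and the only detail taken from the comment is that the last-area flag is derived from the comparison childLM == lastLM and that the hyphenation character is dropped when the flag is false.

```java
/** Toy model, not FOP code: the hyphen is only emitted for the last area. */
final class LastAreaSketch {

    /** Appends the hyphenation character only when the area is flagged as the last one. */
    static String buildArea(String text, boolean lastArea) {
        return lastArea ? text + "-" : text;
    }

    /** Derives the flag the way the comment describes: childLM == lastLM. */
    static String layout(Object childLM, Object lastLM, String text) {
        boolean lastArea = (childLM == lastLM);
        return buildArea(text, lastArea);
    }

    public static void main(String[] args) {
        Object textLM = new Object();
        // lastLM was never set (white-space preserved): the hyphen is lost.
        System.out.println(layout(textLM, null, "hyphen"));
        // lastLM points to the LM of the last element: the hyphen shows up.
        System.out.println(layout(textLM, textLM, "hyphen"));
    }
}
```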
          Glenn Adams added a comment -

          batch transition to closed; if someone wishes to restore one of these to resolved in order to perform a verification step, then feel free to do so


            People

            • Assignee:
              fop-dev
              Reporter:
              Franck Schmidlin