Lucene - Core / LUCENE-7387

Something wrong with how "File Formats" link is generated in docs/index.html - can cause precommit to fail on some systems

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (7.0), 6.4
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I'm not sure what's going on, but here's what I've figured out while poking at things with Ishan to try and figure out why ant precommit fails for him on a clean checkout of master...

      • on my machine, with a clean checkout, the generated index.html file has lines that look like this...
        <li>
        <a href="core/org/apache/lucene/codecs/lucene62
        /package-summary.html#package.description">File Formats</a>: Guide to the supported index format used by Lucene.  This can be customized by using <a href="core/org/apache/lucene/codecs/package-summary.html#package.description">an alternate codec</a>.</li>
        <li>
        

        ...note there is a newline in the href after lucene62

      • on ishan's machine, with a clean checkout, the same line looks like this...
        <li>
        <a href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html#package.description">File Formats</a>: Guide to the supported index format used by Lucene.  This can be customized by using <a href="core/org/apache/lucene/codecs/package-summary.html#package.description">an alternate codec</a>.</li>
        <li>
        

        ...note that he has a URL-escaped 'NO-BREAK SPACE' (U+00A0) character in the href attribute.

      • on my machine, ant documentation-lint doesn't complain about the newline in the href attribute when checking links.
      • on ishan's machine, ant documentation-lint most certainly complains about the 'NO-BREAK SPACE'...
        ...
        -documentation-lint:
             [echo] checking for broken html...
            [jtidy] Checking for broken html (such as invalid tags)...
           [delete] Deleting directory /home/ishan/code/chatman-lucene-solr/lucene/build/jtidy_tmp
             [echo] Checking for broken links...
             [exec] 
             [exec] Crawl/parse...
             [exec] 
             [exec] Verify...
             [exec] 
             [exec] file:///build/docs/index.html
             [exec]   BROKEN LINK: file:///build/docs/core/org/apache/lucene/codecs/lucene62%0A/package-summary.html
             [exec] 
             [exec] Broken javadocs links were found!
        BUILD FAILED
        
        

      Raising the following questions...

      • How is either a newline or a 'NO-BREAK SPACE' getting introduced into the $defaultCodecPackage variable that index.xsl uses to generate that href attribute?
      • Why doesn't documentation-lint complain that the href has a newline in it on my system?
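
The percent-escape in the lint output above can be decoded directly to see which character it stands for. The following is a minimal sketch using only the Python standard library; it is not part of the Lucene build. It suggests that `%0A` is an escaped line feed (U+000A), i.e. the same newline in escaped form, whereas a NO-BREAK SPACE (U+00A0) would be escaped as `%C2%A0` under UTF-8. That is a reading of the log, not something confirmed in this issue.

```python
from urllib.parse import quote, unquote

# %0A is a single escaped byte, 0x0A, which decodes to LINE FEED (U+000A).
decoded = unquote("lucene62%0A/package-summary.html")
print(repr(decoded))      # 'lucene62\n/package-summary.html'

# NO-BREAK SPACE (U+00A0) is two bytes in UTF-8, so quote() emits two escapes.
print(quote("\u00a0"))    # %C2%A0

# Escaping a literal newline gives back the sequence seen in the lint output.
print(quote("\n"))        # %0A
```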

        Activity

        ichattopadhyaya Ishan Chattopadhyaya added a comment (edited)

        I just downgraded from Ant 1.9.6 (preinstalled with Fedora 23) to 1.9.4, and ant documentation-lint passed. However, this looks like a genuine bug, so the lint shouldn't have passed. I see a newline with 1.9.4 (doc lint passes) and the NO-BREAK SPACE character with 1.9.6 (doc lint fails).

        hossman Hoss Man added a comment (edited)

        The source of the newline is the original newline in Codec.java. The way we're using <containsregex/> to pass through only the line we want, and to replace the entire line with only the codec name, doesn't do anything to remove the newline. Oddly enough, removing $ from the pattern and using flags="s" so that the final . matches (and thus swallows) the line ending doesn't seem to help.

        In this patch I've added a <deletecharacters/> to remove the newline, preceded by an explicit <fixcrlf/> to ensure \n is the only thing we might have at the end of that line, regardless of the platform defaults.
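
The effect described above can be sketched outside Ant. The Python below is only an analogy, not the build.xml code: the sample line, the pattern, and the lowercasing step are assumptions made for illustration. It shows a whole-line regex replacement leaving the line ending in place, followed by the two-step cleanup (normalise line endings to \n, then delete \n) that the patch performs with <fixcrlf/> and <deletecharacters/>.

```python
import re

# Hypothetical stand-in for the relevant line of Codec.java, with its line ending.
SAMPLE_LINE = '    static Codec defaultCodec = LOADER.lookup("Lucene62");\n'

# Illustrative pattern only; the real pattern in build.xml is not quoted in this issue.
PATTERN = re.compile(r'^.*defaultCodec\s*=\s*LOADER\.lookup\("(\w+)"\).*$')


def extract_raw(line: str) -> str:
    """Replace the whole line with the codec name; the trailing newline survives."""
    return PATTERN.sub(r"\1", line)


def strip_line_ending(value: str) -> str:
    """Normalise CRLF/CR to LF, then delete every LF."""
    normalised = value.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "")


raw = extract_raw(SAMPLE_LINE)
codec = strip_line_ending(raw)

# Lowercasing is assumed here because the generated href uses "lucene62".
href = f"core/org/apache/lucene/codecs/{codec.lower()}/package-summary.html#package.description"

print(repr(raw))   # 'Lucene62\n'
print(href)        # core/org/apache/lucene/codecs/lucene62/package-summary.html#package.description
```

In Python, `$` matches just before a trailing newline and `.` does not match it, so the replacement never touches the line ending; the explicit delete step is what produces a clean value.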


        This doesn't explain why Ant 1.9.6 was converting the newline to a non-breaking space (probably something changed in the xslt tag?) but honestly I don't care as long as we fix the root problem.

        My bigger concern is why documentation-lint isn't failing if/when our links have newlines in them like this?
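
As one possible way to catch this class of problem regardless of how a link checker resolves paths, a separate check could flag any href that contains a control character or NO-BREAK SPACE, either literally or percent-escaped. The sketch below is hypothetical: it is not how documentation-lint works, and the class and function names are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class HrefControlCharChecker(HTMLParser):
    """Collects href values containing control characters or NO-BREAK SPACE."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "href" or value is None:
                continue
            # Decode percent-escapes so that %0A is treated like a literal newline.
            decoded = unquote(value)
            if any(ord(ch) < 0x20 or ch == "\u00a0" for ch in decoded):
                self.suspicious.append(value)


def find_suspicious_hrefs(html: str) -> list:
    checker = HrefControlCharChecker()
    checker.feed(html)
    checker.close()
    return checker.suspicious


literal = '<a href="core/org/apache/lucene/codecs/lucene62\n/package-summary.html">File Formats</a>'
escaped = '<a href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html">File Formats</a>'
clean = '<a href="core/org/apache/lucene/codecs/lucene62/package-summary.html">File Formats</a>'

for sample in (literal, escaped, clean):
    print(len(find_suspicious_hrefs(sample)))
```

Ordinary spaces are deliberately not flagged, since some generated anchors may legitimately contain them.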

        jira-bot ASF subversion and git services added a comment

        Commit 38a67e25ae872d921107896e359da5364040ba79 in lucene-solr's branch refs/heads/branch_6x from Chris Hostetter
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=38a67e2 ]

        LUCENE-7387: fix defaultCodec in build.xml to account for the line ending

        this not only fixes the link in the javadoc to be correct, but also gets precommit working with ant 1.9.6

        (cherry picked from commit 280cbfd8fb70376be3d32902baa629baf0b66e00)

        jira-bot ASF subversion and git services added a comment

        Commit 280cbfd8fb70376be3d32902baa629baf0b66e00 in lucene-solr's branch refs/heads/master from Chris Hostetter
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=280cbfd ]

        LUCENE-7387: fix defaultCodec in build.xml to account for the line ending

        this not only fixes the link in the javadoc to be correct, but also gets precommit working with ant 1.9.6


          People

          • Assignee:
            hossman Hoss Man
            Reporter:
            hossman Hoss Man