Solr / SOLR-12746

Ref Guide HTML output should adhere to more standard HTML5



    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 7.6, 8.0
    • Component/s: documentation
    • Labels: None


      The default HTML produced by Jekyll/Asciidoctor wraps our content in a lot of extra <div> tags, which break it up into very small chunks. This is acceptable to a casual website reader as far as it goes, but any Reader view in a browser, or any other content extraction system that uses a similar "readability" scoring algorithm, is going to either miss a lot of content or fail to display the page entirely.

      To see what I mean, take a page like https://lucene.apache.org/solr/guide/7_4/language-analysis.html and enable Reader View in your browser (I used Firefox; Steve Rowe told me offline Safari would not even offer the option on the page for him). You will notice a lot of missing content. It's almost like someone selected sentences at random.

      Asciidoctor has a long-standing issue to provide better, more semantically oriented HTML5 output, but it has not been resolved yet: https://github.com/asciidoctor/asciidoctor/issues/242

      Asciidoctor does provide a way to override the default output templates by providing your own in Slim, HAML, ERB or any other template language supported by Tilt (none of which I know yet). There are some samples available via the Asciidoctor project which we can borrow, but it is not yet known which parts of the output are causing the worst of the problems. This issue is to explore how to fix the output and improve this part of the HTML reading experience.
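
      One possible starting point for finding which wrappers dominate the output is to tally the <div> elements on a rendered page by class. The sketch below is only an illustration: the `DivTally` class, the `tally_divs` helper and the `SAMPLE` markup are hypothetical, and the sample is hand-written to resemble Asciidoctor's default section and paragraph wrappers rather than copied from a real Ref Guide page.

```python
from collections import Counter
from html.parser import HTMLParser


class DivTally(HTMLParser):
    """Count <div> wrappers by class and track the deepest <div> nesting."""

    def __init__(self):
        super().__init__()
        self.by_class = Counter()  # class attribute value -> number of <div>s
        self.depth = 0             # current <div> nesting depth
        self.max_depth = 0         # deepest <div> nesting seen so far
        self.paragraphs = 0        # number of <p> elements

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            css = dict(attrs).get("class") or "(no class)"
            self.by_class[css] += 1
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        elif tag == "p":
            self.paragraphs += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def tally_divs(html):
    """Parse an HTML string and return the populated DivTally."""
    parser = DivTally()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical sample, written by hand to look like default Asciidoctor output.
SAMPLE = """
<div class="sect1">
<h2 id="example">Example section</h2>
<div class="sectionbody">
<div class="paragraph"><p>First paragraph.</p></div>
<div class="paragraph"><p>Second paragraph.</p></div>
</div>
</div>
"""

if __name__ == "__main__":
    result = tally_divs(SAMPLE)
    for css, count in result.by_class.most_common():
        print(f"{count:4d}  div.{css}")
    print("paragraphs:", result.paragraphs)
    print("max div depth:", result.max_depth)
```

      Running the same tally over a saved copy of a real page, such as the language-analysis page mentioned above, would show which wrapper classes are most frequent and so which templates might be worth overriding first. It does not reproduce any browser's Reader view scoring.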




            Assignee: Cassandra Targett (ctargett)
            Reporter: Cassandra Targett (ctargett)