Maven Doxia / DOXIA-386

Snippet Macro: referenced UTF-8 file is not read as UTF-8, so the generated page contains garbage

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.2
    • Fix Version/s: 1.6
    • Component/s: Core
    • Labels:
      None
    • Environment:
      windows7 zh_CN

      Description

              <plugin>
                <artifactId>maven-site-plugin</artifactId>
                <version>2.1</version>
                <configuration>
                  <locales>zh_CN</locales>
                  <inputEncoding>UTF-8</inputEncoding>
                  <outputEncoding>UTF-8</outputEncoding>
                </configuration>
              </plugin>
      

      my sample apt file:

      %{snippet|file=target/site/reference/html/sample.html|verbatim=false}
      

      sample.html:

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml"><head><title>&#20013;&#25991;</title></head><body></body></html>
      

      org.apache.maven.doxia.macro.snippet.SnippetReader

      readLines:

       reader = new BufferedReader(new InputStreamReader(source.openStream()));
      

      readLines uses InputStreamReader(InputStream in), which decodes with the platform default charset.
      Change it to:

       InputStreamReader(InputStream in, Charset cs)
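
A minimal sketch of the difference, not the actual SnippetReader code: the class name CharsetAwareLines and its readLines helper are made up for illustration. It decodes the same UTF-8 bytes once with an explicit UTF-8 charset and once with ISO-8859-1, which stands in for a non-UTF-8 platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Doxia.
class CharsetAwareLines {

    // Reads all lines from the stream, decoding bytes with the given charset
    // instead of relying on the platform default.
    static List<String> readLines(InputStream in, Charset cs) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String expected = "<title>\u4e2d\u6587</title>";
        byte[] utf8 = expected.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the file was written in round-trips the text.
        String good = readLines(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8).get(0);
        // Decoding with a different charset yields garbage characters.
        String bad = readLines(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1).get(0);

        System.out.println(good.equals(expected)); // prints true
        System.out.println(bad.equals(expected));  // prints false
    }
}
```
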
      
      1. DOXIA-386.patch
        3 kB
        Michael Osipov

        Issue Links

          Activity

          Transition               Time In Source Status   Execution Times   Last Executer     Last Execution Date
          Open -> In Progress      1320d 13h 3m            1                 Michael Osipov    19/Nov/13 15:35
          In Progress -> Closed    7m 24s                  1                 Michael Osipov    19/Nov/13 15:42
          Mark Thomas made changes -
          Workflow jira [ 12956984 ] Default workflow, editable Closed status [ 12993999 ]
          Mark Thomas made changes -
          Link This issue relates to MSKINS-85 [ MSKINS-85 ]
          Mark Thomas made changes -
          Project Import Sun Apr 05 23:17:25 UTC 2015 [ 1428275845026 ]
          Mark Thomas made changes -
          Workflow jira [ 12719811 ] Default workflow, editable Closed status [ 12748468 ]
          Mark Thomas made changes -
          Link This issue relates to MSKINS-85 [ MSKINS-85 ]
          Mark Thomas made changes -
          Project Import Sun Apr 05 09:30:24 UTC 2015 [ 1428226224715 ]
          Michael Osipov made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          Michael Osipov added a comment -

          Fixed with r1543585.
          Documented with r1543586.

          Michael Osipov made changes -
          Fix Version/s 1.6 [ 19820 ]
          Michael Osipov made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Michael Osipov made changes -
          Attachment DOXIA-386.patch [ 64520 ]
          Michael Osipov added a comment -

          A patch which fixes the encoding issue with one glitch: You need to specify the encoding directly in the snippet.
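
As a rough sketch of what "specify the encoding directly in the snippet" could mean on the macro side: an optional parameter is looked up and, if absent, the platform default is used. The parameter name "encoding", the class SnippetEncoding, and the method resolveCharset are assumptions made for this illustration and are not taken from the patch.

```java
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper for illustration; not the code from DOXIA-386.patch.
class SnippetEncoding {

    // Assumed parameter name; the comment above only says "the encoding".
    static final String ENCODING_PARAM = "encoding";

    // Returns the charset named by the macro parameter, or the platform
    // default when the parameter is missing or blank.
    // Charset.forName throws an unchecked exception for unknown names.
    static Charset resolveCharset(Map<String, String> params) {
        String name = params.get(ENCODING_PARAM);
        if (name == null || name.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(name.trim());
    }

    public static void main(String[] args) {
        // Parameter present: the named charset is used.
        System.out.println(resolveCharset(Collections.singletonMap(ENCODING_PARAM, "UTF-8")));
        // Parameter absent: falls back to whatever the platform default is.
        System.out.println(resolveCharset(Collections.<String, String>emptyMap()));
    }
}
```

The fallback keeps existing snippets working unchanged, which matches the caveat in the comment: files in another encoding are only read correctly when the snippet states it.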

          Michael Osipov made changes -
          Assignee Michael Osipov [ michael-o ]
          Michael Osipov made changes -
          Link This issue relates to MSKINS-85 [ MSKINS-85 ]
          Michael Osipov added a comment - - edited

          After an investigation, I have found this spot: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.maven.doxia/doxia-site-renderer/1.4/org/apache/maven/doxia/siterenderer/DefaultSiteRenderer.java#406
          So we would need to pass the encoding all the way down, which would require a lot of changes. I have a lighter patch for that: I simply added an encoding parameter to the snippet macro, which works as desired, though I dislike what the sink does. It turns every character above 7 bits into an entity reference, but all characters get through.

          Should I upload the patch and then apply after approval?
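
To illustrate the sink behaviour described above, here is a small sketch that replaces every code point above 7 bits with a decimal numeric character reference. It is not Doxia's sink code; the class NonAsciiEscaper and the method escapeNonAscii are hypothetical names used only for this example.

```java
// Hypothetical illustration of the described behaviour; not Doxia's sink.
class NonAsciiEscaper {

    // Replaces each code point above 0x7F with a decimal numeric character
    // reference (&#NNNN;) and copies 7-bit characters through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+4E2D and U+6587 are the two characters from the sample.html title.
        System.out.println(escapeNonAscii("<title>\u4e2d\u6587</title>"));
        // prints <title>&#20013;&#25991;</title>
    }
}
```

The output is pure ASCII, so it survives any output encoding, which is why all characters still get through even though the markup is harder to read.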

          Michael Osipov added a comment -

          Just stumbled upon this with MSKINS-85. Lukas, why can't we add an encoding parameter to the snippet macro, which one could set with a Velocity variable? This would be an easy fix.

          Robert Scholte made changes -
          Field: Description (wiki markup such as {code} and {noformat} added; text unchanged)
          Lukas Theussl added a comment -

          The problem is that there is no direct way to pass the encoding to the parser. This needs a more general solution.

          pinghe created issue -

            People

            • Assignee:
              Michael Osipov
              Reporter:
              pinghe
            • Votes:
              1
              Watchers:
              1
