Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.1.2
    • Fix Version/s: 10.8.1.2
    • Component/s: Documentation
    • Labels:
      None

      Description

      The Hudson job that builds the documentation sometimes fails. See these threads on derby-dev:

      http://mail-archives.apache.org/mod_mbox/db-derby-dev/201101.mbox/%3C129452085.1211295467008379.JavaMail.hudson@aegis%3E
      http://mail-archives.apache.org/mod_mbox/db-derby-dev/201102.mbox/%3C1799570032.14811296675968251.JavaMail.hudson@aegis%3E

      The failure happens when processing the Brazilian Portuguese translation of the reference manual.

      I'm able to reproduce the failure by building the pt_BR.html ant target with locale set to C. It builds fine if the locale is en_US.UTF-8.

      1. dita-build-instructions.diff
        5 kB
        Knut Anders Hatlen
      2. encoding.diff
        2 kB
        Knut Anders Hatlen
      3. antoutput.log.gz
        89 kB
        Knut Anders Hatlen

        Activity

        Knut Anders Hatlen added a comment -

        Thanks, Kim. Committed revision 1067138. The changes should show up on the web site in a couple of hours.

        Kim Haase added a comment -

        Removing the LANG variable step from the build instructions makes sense to me. I really appreciate your diagnosing and fixing this problem, Knut!

        Knut Anders Hatlen added a comment -

        In case we decide that setting the LANG variable can now be removed from the build instructions, I'm attaching the patch dita-build-instructions.diff which makes this change to the website.

        Knut Anders Hatlen added a comment -

        Committed revision 1066789. Hopefully this will make the Hudson build run cleanly also when it's running on ubuntu1.

        Since this seems to have fixed the garbling of non-ASCII characters too, perhaps we should remove the requirement to set the LANG environment variable from the build instructions? The discussion about the problems with building in the C locale, and about adding the UTF-8 requirement to the build instructions, is recorded in DERBY-4547 and DERBY-4556. I think that if the non-ASCII characters show up fine now, there are no other known problems with building in locales that have a different default file encoding.

        Knut Anders Hatlen added a comment - edited

        The problem seems to be caused by the copy and move ant tasks that take a filterset or a filterchain. If no encoding is specified in the build script, ant assumes the files are encoded using the platform's default encoding when it does the filtering. This causes the UTF-8 BOMs to be garbled when the files are copied in a non-UTF-8 locale.

        See http://ant.apache.org/manual/Tasks/copy.html#encoding for a description of the problem.

        The attached patch specifies UTF-8 encoding for the copy and move tasks that do filtering, which makes it build again in my environment.

        As a side effect, this also seems to fix the garbling of non-ascii characters that we saw even before the build started failing. So now I'm able to build the Japanese manuals in the C locale and get output that looks Japanese to me (the question marks are gone).
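To illustrate the mechanism described above, here is a minimal Python sketch of a filtering copy that decodes the input, substitutes a token, and re-encodes the result. It is not Ant's code: the function name `filtered_copy`, the `@YEAR@` token, and the use of `errors="replace"` to stand in for the JVM's substitution of undecodable bytes are assumptions made purely for illustration.

```python
import codecs


def filtered_copy(data: bytes, encoding: str, token: str, value: str) -> bytes:
    """Decode, replace a token, and re-encode, roughly as a filtering copy does."""
    # Assumption: undecodable bytes are replaced rather than rejected,
    # which is what produces question marks instead of an error.
    text = data.decode(encoding, errors="replace")
    text = text.replace(token, value)
    return text.encode(encoding, errors="replace")


# A file that starts with a UTF-8 BOM and contains one non-ASCII character.
# The @YEAR@ token is hypothetical.
source = codecs.BOM_UTF8 + "<!-- (c) @YEAR@ --> à".encode("utf-8")

good = filtered_copy(source, "utf-8", "@YEAR@", "2011")
bad = filtered_copy(source, "ascii", "@YEAR@", "2011")

print(good.startswith(codecs.BOM_UTF8))  # True: the BOM survives
print(bad)  # b'???<!-- (c) 2011 --> ??'
```

With the encoding given as UTF-8, the BOM and the 'à' pass through unchanged. With ASCII (roughly what a C locale would supply as the default), the three BOM bytes and the two bytes of 'à' each become a question mark, which is consistent with the '??' seen in the generated docs.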

        Knut Anders Hatlen added a comment -

        The Hudson build apparently sets the locale as specified in the build instructions:

        https://hudson.apache.org/hudson/job/Derby-docs/75/console
        + export LANG=en_US.utf8
        + LANG=en_US.utf8
        + export LC_ALL=en_US.utf8
        + LC_ALL=en_US.utf8

        But maybe the en_US.utf8 locale isn't installed on that slave node? At least, the generated docs show question marks instead of non-ascii characters. For example, this file generated by the build shows '??' instead of 'à': https://hudson.apache.org/hudson/job/Derby-docs/ws/trunk/out/devguide/tdevdvlpcollation.html (this file will be replaced the next time the job is triggered, so the link may not show the problem if you look at it later).

        Knut Anders Hatlen added a comment -

        I did a binary search through the svn history and found that the docs build started failing in the C locale after this commit:

        ------------------------------------------------------------------------
        r1023023 | rhillegas | 2010-10-15 19:13:54 +0200 (Fri, 15 Oct 2010) | 1 line

        DERBY-4851: parameterize copyright years in the user docs so that the master release script can replace them with the current year when we build release distributions.
        ------------------------------------------------------------------------

        However, this commit didn't change any of the byte order marks (BOMs) that the current build is complaining about. And even before this commit, when building with a non-UTF-8 locale, all non-ASCII characters were displayed as question marks, so it didn't really work before that either. (Which is probably why http://db.apache.org/derby/manuals/dita.html says that we should use a UTF-8 locale when building the docs.)

        Knut Anders Hatlen added a comment -

        One of the dita files in the pt_BR translation (src/pt_BR/ref/rrefjta18596.dita) starts with a UTF-8 BOM. If I remove the BOM, the manual builds just fine. I see there are some dita files with BOMs in the ja_JP translation too, so we may have the same problem there.
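One way to check which source files start with a UTF-8 BOM is sketched below in Python. The helper name `files_with_utf8_bom` and the `*.dita` glob are assumptions for illustration; this is not how the files in this issue were actually found.

```python
import codecs
from pathlib import Path


def files_with_utf8_bom(root, pattern="*.dita"):
    """Return the sorted paths under root whose first bytes are a UTF-8 BOM."""
    hits = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        # Read only as many bytes as the BOM is long (three for UTF-8).
        with path.open("rb") as handle:
            if handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
                hits.append(path)
    return sorted(hits)
```

Calling `files_with_utf8_bom("src/pt_BR")` from the docs source tree would then list the candidates to inspect.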

        Knut Anders Hatlen added a comment -

        Attaching the output from one of the failed Hudson builds.


          People

          • Assignee:
            Knut Anders Hatlen
            Reporter:
            Knut Anders Hatlen
          • Votes:
            0
            Watchers:
            0
