Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-814

javacc on Win32 (cygwin) creates wrong line endings - fix them with 'ant replace'

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: general/build
    • Labels:
      None
    • Environment:

      Windows, Cygwin

    • Lucene Fields:
      Patch Available

      Description

      "ant javacc" in Windows/Cygwin generates files with wrong line endings (\r or \r\n instead of *Nix's \n).
      I managed to get rid of those using perl -p -e 's/(\r\n|\n|\r)/\n/g'
      Some useful info on line ending issues is in http://en.wikipedia.org/wiki/Newline

      After wasting some time to get rid of those, I modified javacc-QueryParser build.xml task to take care of that.
      So now QueryParser files created with "ant javacc" are fixed (if required) to have \n as line ends.

      Should probably do that also for the other javacc targets: javacc-HTMLParser and javacc-StandardAnalyzer

      1. 814.javacc.fixcrlf.common-build.patch
        0.5 kB
        Doron Cohen
      2. 814.javacc.fixcrlf.common-build.patch
        0.9 kB
        Doron Cohen
      3. 814.javacc.fixcrlf.common-build.patch
        0.8 kB
        Doron Cohen
      4. 814.javacc.fixcrlf.common-build.patch
        0.7 kB
        Doron Cohen
      5. 814.javacc.line.ends.patch
        1 kB
        Doron Cohen
      6. LUCENE-814.txt
        0.9 kB
        Steven Parkes
      7. LUCENE-814.txt
        2 kB
        Steven Parkes

        Activity

        Hide
        hossman Hoss Man added a comment -

        a few comments in no particular order...

        1) there is a core ant task called <fixCRLF> that should be able to do what your macro does

        2) this may be related to the fact that svn:eol-style for QueryParser.jj is set to native, yet cygwin likes files to be unix style – this causes problems with bash scripts in cygwin, so depending on how hte javacc task works (since it requires you to specify a javacchome i'm guessing it does a system exex) under cygwin it may be a similar issue (ie: fixable by changing the svn:eol-style to be explicit)

        Show
        hossman Hoss Man added a comment - a few comments in no particular order... 1) there is a core ant task called <fixCRLF> that should be able to do what your macro does 2) this may be related to the fact that svn:eol-style for QueryParser.jj is set to native, yet cygwin likes files to be unix style – this causes problems with bash scripts in cygwin, so depending on how hte javacc task works (since it requires you to specify a javacchome i'm guessing it does a system exex) under cygwin it may be a similar issue (ie: fixable by changing the svn:eol-style to be explicit)
        Hide
        michaelbusch Michael Busch added a comment -

        Doron,

        I tried your patch and it works fine for me. The generated files don't have wrong line endings anymore. Haven't tried Hoss' suggestions yet.

        Show
        michaelbusch Michael Busch added a comment - Doron, I tried your patch and it works fine for me. The generated files don't have wrong line endings anymore. Haven't tried Hoss' suggestions yet.
        Hide
        doronc Doron Cohen added a comment -

        Hoss is right about both:

        1) "ant fixcrlf" is much simpler to use,and does the job. I wish I have discovered that task before fighting with the replace tasks.

        2) QueryParser.jj has svn:eol-style native and hence when extracted under DOS, has Windows eols. There, "ant javacc" which runs as a Win app, would create Windows eols, all dandy. But if Cygwin is used for checkout, jj has Unix eols, however here "ant javacc" still runs as Win app, and the result file is a mix of eol styles.

        I think there is no fixed (non native) setting for jj that would sattisfy all situations. Also, I think this is why Eclipse svn plugin (subclipse) sometimes shows a file as entirely modified (comparing to repository head) - may be related to checkout with cygwin while Eclipse runs as Win app.

        So I see three options:
        1 - leave as is (do not fix)
        2 - add target to build.xml to allow easily fixing eol (what one can do with sed)
        3 - fix eol-style to be the same as that of the jj file (worth the effort?)

        Preferences?

        Show
        doronc Doron Cohen added a comment - Hoss is right about both: 1) "ant fixcrlf" is much simpler to use,and does the job. I wish I have discovered that task before fighting with the replace tasks. 2) QueryParser.jj has svn:eol-style native and hence when extracted under DOS, has Windows eols. There, "ant javacc" which runs as a Win app, would create Windows eols, all dandy. But if Cygwin is used for checkout, jj has Unix eols, however here "ant javacc" still runs as Win app, and the result file is a mix of eol styles. I think there is no fixed (non native) setting for jj that would sattisfy all situations. Also, I think this is why Eclipse svn plugin (subclipse) sometimes shows a file as entirely modified (comparing to repository head) - may be related to checkout with cygwin while Eclipse runs as Win app. So I see three options: 1 - leave as is (do not fix) 2 - add target to build.xml to allow easily fixing eol (what one can do with sed) 3 - fix eol-style to be the same as that of the jj file (worth the effort?) Preferences?
        Hide
        hossman Hoss Man added a comment -

        i think the right solution is to leave the .jj and .java files with svn:eol-style=native, then modify any target which regenerates a .java file from a.jj file to use <fixCRLF> without specifying either the "eol" or "cr" attributes ... the default of <fixCRLF> is to convert the file to the native format ... so it should synx up.

        ...but i haven't tested this (no access to windows/cygwin) so i'm not positive it works.

        if that doesn't work, the safest thing to do might be to set svn:eol-style=CRLF (or any other non variable value) for all of hte generated but commited .java files, and then modify the ant task that generates them to call <fixCRLF eol="CRLF"> on those files.

        (incidently: are you sure that if you do everything in cygwin – both the checkout and running "ant javacc" – you don't wind up with unix style line endings on all files?)

        Show
        hossman Hoss Man added a comment - i think the right solution is to leave the .jj and .java files with svn:eol-style=native, then modify any target which regenerates a .java file from a.jj file to use <fixCRLF> without specifying either the "eol" or "cr" attributes ... the default of <fixCRLF> is to convert the file to the native format ... so it should synx up. ...but i haven't tested this (no access to windows/cygwin) so i'm not positive it works. if that doesn't work, the safest thing to do might be to set svn:eol-style=CRLF (or any other non variable value) for all of hte generated but commited .java files, and then modify the ant task that generates them to call <fixCRLF eol="CRLF"> on those files. (incidently: are you sure that if you do everything in cygwin – both the checkout and running "ant javacc" – you don't wind up with unix style line endings on all files?)
        Hide
        steven_parkes Steven Parkes added a comment -

        Patch that uses fixcrlf and adds a couple of extra gen'd files.

        Show
        steven_parkes Steven Parkes added a comment - Patch that uses fixcrlf and adds a couple of extra gen'd files.
        Hide
        steven_parkes Steven Parkes added a comment -

        I've attached a patch that has the fixcrlf tasks (I wanted to check them under cygwin, to see if they work with mixed crlf/lf files, not just all crlf or all lf.) This seems to give the correct results on both windows/cygwin and linux.

        To Chris's comment: even if you do everything in cygwin, you'll get mixed output files because cygwin doesn't have any impact on java. Cygwin doesn't include java in anyway: you use the normal windows binaries. cygwin bash can call java, javac, ant, etc., but the windows java binaries still don't know anything about cygwin's crlf/lf mapping stuff.

        And just to make things more confusing, it is possible to install cygwin in "dos mode' where it (supposedly) uses crlf rather lf. I have no experience with this. I did go so far as to try to install it this way to test but failed miserably: the two installations fought with each other, so I gave up. I haven't heard of anyone running cygwin in crlf mode anyway.

        Show
        steven_parkes Steven Parkes added a comment - I've attached a patch that has the fixcrlf tasks (I wanted to check them under cygwin, to see if they work with mixed crlf/lf files, not just all crlf or all lf.) This seems to give the correct results on both windows/cygwin and linux. To Chris's comment: even if you do everything in cygwin, you'll get mixed output files because cygwin doesn't have any impact on java. Cygwin doesn't include java in anyway: you use the normal windows binaries. cygwin bash can call java, javac, ant, etc., but the windows java binaries still don't know anything about cygwin's crlf/lf mapping stuff. And just to make things more confusing, it is possible to install cygwin in "dos mode' where it (supposedly) uses crlf rather lf. I have no experience with this. I did go so far as to try to install it this way to test but failed miserably: the two installations fought with each other, so I gave up. I haven't heard of anyone running cygwin in crlf mode anyway.
        Hide
        hossman Hoss Man added a comment -

        > To Chris's comment: even if you do everything in cygwin, you'll get mixed output files
        > because cygwin doesn't have any impact on java. Cygwin doesn't include java in anyway:

        ...hmmm... that's unfortunate, I figured cygwin would influence line.separator.

        actually ... couldn't we set the line.separator sysproperty explicitly before executing the javacc tasks? ... but how would we know what to set it too? i guess we'd have to inspect the source file to see what svn thinks the "native" eol is. that's probably just as much work as fixcrlf.

        Show
        hossman Hoss Man added a comment - > To Chris's comment: even if you do everything in cygwin, you'll get mixed output files > because cygwin doesn't have any impact on java. Cygwin doesn't include java in anyway: ...hmmm... that's unfortunate, I figured cygwin would influence line.separator. actually ... couldn't we set the line.separator sysproperty explicitly before executing the javacc tasks? ... but how would we know what to set it too? i guess we'd have to inspect the source file to see what svn thinks the "native" eol is. that's probably just as much work as fixcrlf.
        Hide
        doronc Doron Cohen added a comment -

        Steven, thanks, this works for me too.

        I usually run ant from Emacs (Win) rather from cygwin, so after Chris's comment I thought that this was the cause. But I checked again - in this regard running from Emacs and from Cygwin shell behaves the same.

        So I think this is the right approach (as Chris suggested: not specifying crlf style, falling to native style).

        One thing with the Java files created by javaCC - QueryParserConstants.java can also be created by the JavaCC parser. Truth is I don't know enough about JavaCC to tell which other files it can generate. So I moved that logic to the macro in common-build, so that crlf are fixed for all the Java files that were modified by (current run of) JavaCC.

        Attaching 814.javacc.fixcrlf.common-build.patch with this approach.

        Show
        doronc Doron Cohen added a comment - Steven, thanks, this works for me too. I usually run ant from Emacs (Win) rather from cygwin, so after Chris's comment I thought that this was the cause. But I checked again - in this regard running from Emacs and from Cygwin shell behaves the same. So I think this is the right approach (as Chris suggested: not specifying crlf style, falling to native style). One thing with the Java files created by javaCC - QueryParserConstants.java can also be created by the JavaCC parser. Truth is I don't know enough about JavaCC to tell which other files it can generate. So I moved that logic to the macro in common-build, so that crlf are fixed for all the Java files that were modified by (current run of) JavaCC. Attaching 814.javacc.fixcrlf.common-build.patch with this approach.
        Hide
        steven_parkes Steven Parkes added a comment -

        That's cool. Didn't know about the modified task.

        Using it does want to create a cache.properties file and it leaves it out there. This version uses other than the default name (in case that name is used anywhere else?) and also removes it when it's done. It also deletes it before it begins. Is that a good thing? Don't know what would happen if it had an old version ahead of time.

        Show
        steven_parkes Steven Parkes added a comment - That's cool. Didn't know about the modified task. Using it does want to create a cache.properties file and it leaves it out there. This version uses other than the default name (in case that name is used anywhere else?) and also removes it when it's done. It also deletes it before it begins. Is that a good thing? Don't know what would happen if it had an old version ahead of time.
        Hide
        doronc Doron Cohen added a comment -

        That's right about the properties file left there.

        I tried the patch and I think the selector is not working as documented - it seems that "modified" does not work correctly with specifying a cache file - it takes all the files. To see this run "ant -debug javacc" (or "ant -debug javacc-QueryParser").

        I didn't find a bug on this in ant's bugzila, but I'm sure I read something about that somewhere... upgrading ant from 1.6.5 to 1.7.0 did not fix this.

        So it may be safer to record the system time before calling javacc and then selecting files dated after that time? - updating 814.javacc.fixcrlf.common-build.patch to do it this way.

        Show
        doronc Doron Cohen added a comment - That's right about the properties file left there. I tried the patch and I think the selector is not working as documented - it seems that "modified" does not work correctly with specifying a cache file - it takes all the files. To see this run "ant -debug javacc" (or "ant -debug javacc-QueryParser"). I didn't find a bug on this in ant's bugzila, but I'm sure I read something about that somewhere... upgrading ant from 1.6.5 to 1.7.0 did not fix this. So it may be safer to record the system time before calling javacc and then selecting files dated after that time? - updating 814.javacc.fixcrlf.common-build.patch to do it this way.
        Hide
        hossman Hoss Man added a comment -

        Doron: if you are seeing problems with the modified selector, it may be because of the modification time granularity issue on windows (which is discussed in the <date> selector, but not the modified selector)

        personally, i don't think it's the end of the world to enumerate hte files we expect javacc to create in the build.xml, but if you want to be craftier then that, i would suggest using the <containsregexp> selector to look for "Generated By:JavaCC" instead of one based on the <date> .. that way people who have the misfortune of editing some file right as the task is running won't wonder why all of their line endings are different.

        Show
        hossman Hoss Man added a comment - Doron: if you are seeing problems with the modified selector, it may be because of the modification time granularity issue on windows (which is discussed in the <date> selector, but not the modified selector) personally, i don't think it's the end of the world to enumerate hte files we expect javacc to create in the build.xml, but if you want to be craftier then that, i would suggest using the <containsregexp> selector to look for "Generated By:JavaCC" instead of one based on the <date> .. that way people who have the misfortune of editing some file right as the task is running won't wonder why all of their line endings are different.
        Hide
        doronc Doron Cohen added a comment -

        The modified selector worked ok with the default cache file, so I thught the problem was with using a different cache file.

        I also suspected the modification time granularity, but adding a sleep between the first and second date selector did not overcome that. Actually, I thought that this granularity issue would cause the oposite problem - changed files to appear as if they did not change, thus added the delay.

        I also considered the regexp "Genereated By JavaCC" selector, but then found out that there are more files (7) containing this string than files that are currently generated by JavaCC task (4) - removing all those files, running javacc-QueryParser did not recreate all of them.

        grep -ni "generated by:.*javacc" *.java
        CharStream.java:1:/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
        ParseException.java:1:/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
        QueryParser.java:1:/* Generated By:JavaCC: Do not edit this line. QueryParser.java */
        QueryParserConstants.java:1:/* Generated By:JavaCC: Do not edit this line. QueryParserConstants.java */
        QueryParserTokenManager.java:1:/* Generated By:JavaCC: Do not edit this line. QueryParserTokenManager.java */
        Token.java:1:/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
        TokenMgrError.java:1:/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */

        So it seemed unreliable to rely on the presense of this string.

        Show
        doronc Doron Cohen added a comment - The modified selector worked ok with the default cache file, so I thught the problem was with using a different cache file. I also suspected the modification time granularity, but adding a sleep between the first and second date selector did not overcome that. Actually, I thought that this granularity issue would cause the oposite problem - changed files to appear as if they did not change, thus added the delay. I also considered the regexp "Genereated By JavaCC" selector, but then found out that there are more files (7) containing this string than files that are currently generated by JavaCC task (4) - removing all those files, running javacc-QueryParser did not recreate all of them. grep -ni "generated by:.*javacc" *.java CharStream.java:1:/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */ ParseException.java:1:/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */ QueryParser.java:1:/* Generated By:JavaCC: Do not edit this line. QueryParser.java */ QueryParserConstants.java:1:/* Generated By:JavaCC: Do not edit this line. QueryParserConstants.java */ QueryParserTokenManager.java:1:/* Generated By:JavaCC: Do not edit this line. QueryParserTokenManager.java */ Token.java:1:/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */ TokenMgrError.java:1:/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */ So it seemed unreliable to rely on the presense of this string.
        Hide
        hossman Hoss Man added a comment -

        Hmmm... if there are files that say "Generated By:JavaCC" which JavaCC does not generate, then perhaps we should remove that line from those files (making the regex a feasible solution)

        It leaves me wondering: why do those files say they were generated by JavaCC if they aren't? ... were they only generated with JavaCC 3.0? are they no longer needed using 3.2? do we depend on them even though they are (evidently) falling out of sync as QueryParser.jj evolves and QueryParser.java changes?

        Show
        hossman Hoss Man added a comment - Hmmm... if there are files that say "Generated By:JavaCC" which JavaCC does not generate, then perhaps we should remove that line from those files (making the regex a feasible solution) It leaves me wondering: why do those files say they were generated by JavaCC if they aren't? ... were they only generated with JavaCC 3.0? are they no longer needed using 3.2? do we depend on them even though they are (evidently) falling out of sync as QueryParser.jj evolves and QueryParser.java changes?
        Hide
        doronc Doron Cohen added a comment -

        I need to refine this claim - if I delete 4 of these 7 files - CharStream.java, ParseException.java, Token.java, TokenMgrError.java - and run "ant javacc-QueryParser" - those files are not created. But if I then modify QueryParser.jj or delete QueryParser.java, all missing files are created (though 3 of them remain unmodified if they exist). So when you run "and javacc", the "clean-javacc" target guarantees that any missing/outdated file out of these 7 files is created/updated.

        By JavaCC's FAQ "What files does JavaCC produce?" http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm#tth_sEc2.1 - some of the files are generated only if they do not exist, allowing to edit these ("boiler plate") files without losing the edits by mistake.

        So it would be safest to require both - regexp and the modification time.
        The updated patch do both.

        Show
        doronc Doron Cohen added a comment - I need to refine this claim - if I delete 4 of these 7 files - CharStream.java, ParseException.java, Token.java, TokenMgrError.java - and run "ant javacc-QueryParser" - those files are not created. But if I then modify QueryParser.jj or delete QueryParser.java, all missing files are created (though 3 of them remain unmodified if they exist). So when you run "and javacc", the "clean-javacc" target guarantees that any missing/outdated file out of these 7 files is created/updated. By JavaCC's FAQ "What files does JavaCC produce?" http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm#tth_sEc2.1 - some of the files are generated only if they do not exist, allowing to edit these ("boiler plate") files without losing the edits by mistake. So it would be safest to require both - regexp and the modification time. The updated patch do both.
        Hide
        steven_parkes Steven Parkes added a comment -

        What's the reasoning behind the 100ms sleep? If it's to handle FAT file systems, I thought their resolution was 2s, so 100ms isn't guaranteed to be enough?

        I guess it depends on whether you know what javacc is doing. This article
        http://msdn2.microsoft.com/en-us/library/ms724290.aspx
        http://tinyurl.com/3xp6ou
        says create time is good to 10ms but that write time is only good to 2s. If javacc is rewriting, this might not work but if it always deletes and creates, it sounds like it would be. (I don't have a FAT system around to test this on).

        I'm beginning to wonder if this isn't overkill. We already list a bunch of files (a list I'm not longer sure is complete) for the clean-javacc task.

        Show
        steven_parkes Steven Parkes added a comment - What's the reasoning behind the 100ms sleep? If it's to handle FAT file systems, I thought their resolution was 2s, so 100ms isn't guaranteed to be enough? I guess it depends on whether you know what javacc is doing. This article http://msdn2.microsoft.com/en-us/library/ms724290.aspx http://tinyurl.com/3xp6ou says create time is good to 10ms but that write time is only good to 2s. If javacc is rewriting, this might not work but if it always deletes and creates, it sounds like it would be. (I don't have a FAT system around to test this on). I'm beginning to wonder if this isn't overkill. We already list a bunch of files (a list I'm not longer sure is complete) for the clean-javacc task.
        Hide
        doronc Doron Cohen added a comment -

        The sleep aims at systems with rough clock resolution (5ms). Sleep 5ms would probably suffice, but sleep 100ms seemed safer.

        Preferring cautious (don't fix where not desired/required) over completness (developers running this on Fat32 (if such exist)) seems ok to me.

        Show
        doronc Doron Cohen added a comment - The sleep aims at systems with rough clock resolution (5ms). Sleep 5ms would probably suffice, but sleep 100ms seemed safer. Preferring cautious (don't fix where not desired/required) over completness (developers running this on Fat32 (if such exist)) seems ok to me.
        Hide
        steven_parkes Steven Parkes added a comment -

        Seems like a reasonable trade off to me.

        Show
        steven_parkes Steven Parkes added a comment - Seems like a reasonable trade off to me.
        Hide
        hossman Hoss Man added a comment -

        i'm still not sure why modification time matters ... there's no risk in running <fixCRLF> on a file that's already "ok" right?

        on a related note: if javacc will recreate those files as needed, is there anyreason not to use the same regex matcher in the clear-javacc target? (yes it would delete more files then it does currently, but those files will be created and then we don't have a magic list of files that might get out of date)

        Show
        hossman Hoss Man added a comment - i'm still not sure why modification time matters ... there's no risk in running <fixCRLF> on a file that's already "ok" right? on a related note: if javacc will recreate those files as needed, is there anyreason not to use the same regex matcher in the clear-javacc target? (yes it would delete more files then it does currently, but those files will be created and then we don't have a magic list of files that might get out of date)
        Hide
        doronc Doron Cohen added a comment -

        It was nice if javaCC had a 'native-eol' option. It doesn't, and the modification time is used in attempt to imitate the availability of such an option.

        But I think you are right, there should be no harm in fixing crlf for all the files containing the "generated" string - I'll go and commit it without the date check.

        For the javacc-clean comment - having it deleting all the files containing this string will almost work - except for ParseException in standardTokenizer - where a modified version is maintained. So we could modify javacc-clean to delete all Java files containing that "generated" string, except for ParseException in standardTokenizer. But if we want to do this, it rather be in a separate issue. (Also, I noticed that contrib/surround has no javacc-clean, and contrib/misc/o.a.l/queryParser/precedence has no javacc target.)

        Show
        doronc Doron Cohen added a comment - It was nice if javaCC had a 'native-eol' option. It doesn't, and the modification time is used in attempt to imitate the availability of such an option. But I think you are right, there should be no harm in fixing crlf for all the files containing the "generated" string - I'll go and commit it without the date check. For the javacc-clean comment - having it deleting all the files containing this string will almost work - except for ParseException in standardTokenizer - where a modified version is maintained. So we could modify javacc-clean to delete all Java files containing that "generated" string, except for ParseException in standardTokenizer. But if we want to do this, it rather be in a separate issue. (Also, I noticed that contrib/surround has no javacc-clean, and contrib/misc/o.a.l/queryParser/precedence has no javacc target.)
        Hide
        doronc Doron Cohen added a comment -

        Committed.
        Steven and Chris thanks for your help!

        Show
        doronc Doron Cohen added a comment - Committed. Steven and Chris thanks for your help!

          People

          • Assignee:
            doronc Doron Cohen
            Reporter:
            doronc Doron Cohen
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development