|
[
Permlink
| « Hide
]
Rory Winston added a comment - 25/Feb/08 06:21 PM
Hmmm. This is a tough one. With no year information in the date string, SimpleDateFormat clears the internal Calendar and sets the YEAR field to 1970, the epoch (the default). Normally, this will have no impact on date parsing, except when the year in question is a leap year, as 1970 wasn't one. Thus, SDF thinks that Feb 29th should be riolled over to March 1st in lenient mode, and in strict mode (the default for Net parsing), it simply returns null. Of course, the real culprit is the horrible Java date/time package implementation. Using something like Joda-time for this works as expected.
Something like this would fix it (I've tried it), but I think it's nasty:
if (short date validation fails) Any comments? I'm tempted to mark this as WONTFIX. The ftp directory listing output on Unix systems generally corresponds to that produced by "ls -l".
The man page for ls (e.g. on FreeBSD) says: If the modification time of the file is more than 6 months in the past or So if the year is missing, why not set it according to the rule above? >Any comments? I'm tempted to mark this as WONTFIX.
Unfortunately it will not work well, because of lack of crossover year consideration. Ex: If a file was created in Dec-2007 and current date was Jan-2008, the missing year would be 2007 (not current year). I guess this issue isn't caused by a JDK implementation. A fix for this would be greatly appreciated.
Rory: If not in the Commons Net, where should this be addressed in your opinion? It is a JDK implementation issue. That's the whole point of the analysis.
If the fie was created in Dec 2007, and it is now Jan 2008, then the year will be omitted on Unix systems.
Since Dec is less that 6 months behind Jan, and > 6 months ahead, the omitted year must be 2007. Pseudocode: If year is not present That should work. I do think this will only happen in the case of Feb 29th , in which case we dont need to check the year, as it will always be the current year.
Agreed Feb 29 cannot be the previous year as it is more than 6 months from the end of the year.
It could potentially be the next year - if the system clock has been turned back, or the file is future-dated. However the most common case is that Feb 29 will be the current year. Something like this would work, if the initial recent date format parsing fails:
if (recentDateFormat != null) { pp = new ParsePosition(0); int year = Calendar.getInstance().get(Calendar.YEAR); String timeStampStrPlusYear = timestampStr + " " + year; SimpleDateFormat hackFormatter = new SimpleDateFormat(recentDateFormat.toPattern() + " yyyy"); hackFormatter.setLenient(false); parsed = hackFormatter.parse(timeStampStrPlusYear, pp); } if (parsed != null && pp.getIndex() == timestampStr.length() + 5) { working.setTime(parsed); } Although that is really hacky. Ok....I've added a "fix" on the 2.0 dev branch for testing.....does anyone feel brave enough to check it out, build it, and try a test to see if it works on their system?
$ svn checkout http://svn.apache.org/repos/asf/commons/proper/net/branches/NET_2_0 Rory, thank you for the quick follow-up.
In my environment, I got it to work when I also provided DateFormatSymbols to the constructor to make sure the formatting is done in the same locale. Please see below for proposed changes: Index: FTPTimestampParserImpl.java =================================================================== --- FTPTimestampParserImpl.java (revision 631446) +++ FTPTimestampParserImpl.java (working copy) @@ -104,8 +104,9 @@ pp = new ParsePosition(0); int year = Calendar.getInstance().get(Calendar.YEAR); String timeStampStrPlusYear = timestampStr + " " + year; - SimpleDateFormat hackFormatter = new SimpleDateFormat(recentDateFormat.toPattern() + " yyyy"); + SimpleDateFormat hackFormatter = new SimpleDateFormat(recentDateFormat.toPattern() + " yyyy", recentDateFormat.getDateFormatSymbols()); hackFormatter.setLenient(false); + hackFormatter.setTimeZone(recentDateFormat.getTimeZone()); parsed = hackFormatter.parse(timeStampStrPlusYear, pp); } if (parsed != null && pp.getIndex() == timestampStr.length() + 5) { Thanks HONMA - came across this when our production system refused to retrieve any files for today. This fix works a treat.
A combination of this issue and this one https://issues.apache.org/bugzilla/show_bug.cgi?id=41724
I hope these fixes in both projects will soon reach a release. I'm not in a position to ship the a 3rd party library from subversion as part of our companies product, and this bug is critical for me.
I need a workaround for this, unfortunately my attempts are getting more and more hacky, my original attempt involved using the deprecated API to create my own subclass of UnixFTPEntryParser, and then I overrode parseTimestamp() to simply return null as for my purposes I do not need the timestamp. Unfortunately the code in UnixFTPEntryParser.parseFTPEntry(...) calls super.parseTimestamp(...) which makes this approach fail. I'm also quite perturbed that the API to plug in your own parser has been deprecated and no alternative method to provide your own parser has been implemented, it's naive to say the least to assume that the commons-net implementation will support all possible list formats in existence. I've figured out the the real problem is setLenient (false). A lenient SDF parses Feb 29 correctly. I wonder if is there any reason to use non-lenient SDF? What if just remove setLenient(false)?
Attachted is a somewhat crude but effective work-around: "Feb 29" is replaced by "Feb 28" and after parsing the day is incremented.
Please consider the following.
1. This bug is not specific to leap year. This will also occur if a file has a timestamp for the non-existent hour during a DST switch. 2. If you setLenient(True), Feb 29 will be returned as March 1. The current implementation will then see March 1 (on Feb 29th) as the previous year. > On March 9, 2008 I expect all files created between 2:00am and 2:59am to have a similar issue.
Actually, these timestamps won't have a problem since they existed in 1970. However, trying to parse "Apr 26 02:01" with SimpleDateFormat (using format "MMM d HH:mm") in non-lenient mode does in fact result in the same ParseException being thrown since that was the DST cutover in 1970. Guys
Thanks for all the comments and suggestions. I think that I may go for a combination of my suggested fix, Honma's enhancements, and a GregorianCalendar::isLeapYear() check that somebody else suggested. In regards to the question about setLenient(), its actually important that we use strict parsing, otherwise dates that the date parser regards as invalid may be rolled over to the nearest valid date, which we don't want. I'll check out the scenario that Terrance has raised before committing a final fix. I cant reproduce the problem with Apr 26 02:01 - parses fine for me.
I'll add the isLeapYear() check, as I think the only instance where this is a problem is on a leap year day. Seems to me that a lot of the problems have arisen because the array item that is returned for invalid dates is null.
This might possibly still happen in some cases, so may I suggest that the code at least returns a valid entry? At least then the file name and most attributes would be available. The attached JUnit test case shows that this problem will reoccur on any date/time that doesn't exist in the default year 1970. Specifically, April 26 at 2am (the time at which DST was put into affect in 1970) is unparseable by SimpleDateFormat:
java.text.ParseException: Unparseable date: "Apr 26 02:00" This issue may be specific to the default timezone of your system so the test case explicitly calls sets the time zone to "US/Central" which is my default and exhibits the ParseException failure. It looks like the bug occurs only under jdk 1.5 and above.
It does not appeir under jdk 1.3.1 and 1.4.2. I tested it with SUN jdk. The fix will be released as 1.4.2, I hope, not just added to the 2.0 trunk?
I have added the fix on both the 1.5.0 and 2.0 branches. I have also removed the specific leap year check, as Brian's test case shows that other parsing ambiguities can occur.
Current code (FTPTimestampParserImpl r633380) does not work.
For example: FTPTimestampParserImpl parser = new FTPTimestampParserImpl(); generates 01-Mar-2008 20:04:00 I think the code needs to add the year (current or +/- 1 year) to the string to be parsed. Hi Sebb
This works fine for me on net-1.5 and net-2.0. Can you step into the code and make sure that it is hitting the point where it adds the year element (which it should be doing already)? Just found that the test works on Java 1.5 but fails with Java 1.4 (and probably 1.3)
Presumably something has changed in the Java 1.5 run-time library. Since net 1.5 is targetted at Java 1.3 the problem needs to be fixed. There also clearly need to be test cases for Feb 29. public void testParseFeb29() throws Exception { FTPTimestampParserImpl parser = new FTPTimestampParserImpl(); Calendar cal; cal=parser.parseTimestamp("feb 29 20:04"); int dom = cal.get(Calendar.DAY_OF_MONTH); int mon = cal.get(Calendar.MONTH); if (dom != 29 || mon != Calendar.FEBRUARY){ mon++; fail("failed to parse Feb 29; gave mon="+mon+" dom="+dom); } } This is related to
Note that FreeBSD shows past and future dates without the year if the date is within 6 months of the current date, for example: -rw-r--r-- 1 user Domain Users 0 Sep 9 2007 200709091234.tmp -rw-r--r-- 1 user Domain Users 0 Sep 10 12:34 200709101234.tmp -rw-r--r-- 1 user Domain Users 0 Sep 7 12:34 200809071234.tmp -rw-r--r-- 1 user Domain Users 0 Sep 8 2008 200809081234.tmp Sun Mar 9 15:05:09 EDT 2008 # date when listing was obtained The file names show the actual timestamp used to "touch" the files. Looks like the interval that is used is 26 weeks. Patch which appears to solve the Feb 29 problem on Java 1.4.
Can also be applied to NET_2.0 branch if desired. The patch changes the processing so the short form is always parsed with the current year appended. So the short form format string(s) could potentially be changed to include the year. I applied the patch, and it works fine for Feb.29.
But I see a problem when it is Dec.31 on the client computer, server is in a different time zone and shows a date as Jan.01 – in this case the result date is rolled back a whole year. Here is a test case, that can be added to FTPTimestampParserImplTest.java, and fails with both Commons.Net 1.4.1 as well as HEAD as well as with the patch applied: public void testParseJan01() throws Exception { GregorianCalendar now = new GregorianCalendar(2007, Calendar.DECEMBER, 31, 12, 0); checkShortParse("2007-12-31",now,now); // should always work GregorianCalendar target = (GregorianCalendar) now.clone(); target.add(Calendar.DAY_OF_YEAR, +1); checkShortParse("2008-1-1",now,target); } I think this is the kind of situation that was originally meant to address with the setLenienFutureDates() method - See The right solution, IMHO, is to allow short dates +- 6 months in the future or past, which seems to be in line with what FreeBSD is doing as well as what Linux is doing. Note that allowing a short date in the future might require adding "current year + 1" to the current Feb29 hack to fix a case where a future date is a leap year on Feb.29. Patch to add testcase for Jan01 to FTPTimestampParserImplTest.java
Sorry for not knowing how to format the testcase properly in a Jira comment. Ok, so if we add the +/- 6 month date parsing logic, do you think we should make this the default? Or an optional operating mode (like the current lenient mode)? Or could we just drop the lenient mode if we add the +/- 6 month parsing?
I'd suggest that in terms of API, when a programmer knows the exact behavior of the server, he should be able to get exact timestamps even if they are short and/or in the future. Right now we know that FreeBSD does +-6-month short dates; Linux RHEL4 does +0/-6-month short dates.
So what about a new API like The old setLenientFutureDates(boolean) method could be deprecated, and translated into this: I'm not completely sure what would be the best default value for the new settings. Given that lenientFutureDates was false by default, we should probably always expect dates in the past only by default. On the other hand, given that a "one day in future" date is much more likely due to time zone differences than a "365 days in the past" short date, a setting of +1 day / -364 days might be a more appropriate default. As a matter of fact, we cannot ever guarantee correct parsing when the user has not specified by API what the setting should be. So the goal of a good default setting could either be "avoid invalid dates even if a parse error occurs" (not expected by the average user, IMHO, as the Feb.29 case shows); or, "always fall back to a reasonable date as it would be read by the user, even if it might be incorrect". Given that the original RFC for FTP says "The output of DIR is designed for human readers and might not be machine readable", I think the default should be guided by what a human user would expect from the listing. That being said, I think that when it's March 15 and I'm reading "March 16" I'd expect a future date; reading "March 20" I'd expect a past date; so I'm in favor of a "+1 day / -364 days" strategy by default. Which is, admittedly, breaking the previous behavior which was lenientFuture == false meanding a +0 / -365 strategy. Don't you guys think that the piece of code below:
if (lenientFutureDates) { // add a day to "now" so that "slop" doesn't cause a date // slightly in the future to roll back a full year. (Bug 35181) now.add(Calendar.DATE, 1); } if (working.after(now)) { working.add(Calendar.YEAR, -1); } should also be executed when the hackFormatter is used? Rather than only when the recentDateFormat is used? Applied patch of 12/Mar/08 01:19 PM to trunk:
http://svn.apache.org/viewvc?rev=645336&view=rev This fixes the Feb 29 test errors. As far as I can see also addresses Mauricio's comment of 24/Mar/08. There remains the problem of future dates. I think the original bug (Feb 29 parsing) reported in this issue has now been solved, so this issue can be closed.
I've created two new JIRA issues to deal with the other issues raised in the discussion above:
I'm also fine with closing this. There's only FTPTimestampParserImpl line 124 which I do not quite understand:
<code> This code branch only gets executed when the hackFormatter parsed OK, so it must have read the year already and setting the year yet again should be unnecessary? I think you're correct - looks like that line is a left-over from the previous code, where the hackFormatter parsing was done later.
Fixed in: I've done another fix for the leap year problem. This should work regardless which is the correct year for the 29. Feb (this year or last year).
See attached file "FTPTimestampParserImpl.patch" Index: FTPTimestampParserImpl.java private SimpleDateFormat defaultDateFormat; /** Date parsed = null;
/** @@ -179,6 +212,9 @@ /** String defaultFormatString = config.getDefaultDateFormatStr(); If this is fixed, you should close this issue.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||