Issue Details (XML | Word | Printable)

Key: NET-99
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Unassigned
Reporter: henrik
Votes: 1
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Commons Net

[net] MVSFTPEntryParser.java only halfway implemented

Created: 27/Mar/06 05:15 AM   Updated: 06/Dec/06 09:45 AM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Java Source File Licensed for inclusion in ASF works FTPParseTestFramework.java 2006-09-01 11:33 AM henrik sorensen 5 kB
Java Source File Licensed for inclusion in ASF works FTPParseTestFramework.java 2006-08-30 08:42 PM henrik sorensen 5 kB
Java Source File Licensed for inclusion in ASF works MVSFTPEntryParser.java 2006-09-01 11:33 AM henrik sorensen 20 kB
Java Source File Licensed for inclusion in ASF works MVSFTPEntryParser.java 2006-08-28 07:28 PM henrik sorensen 18 kB
Java Source File MVSFTPEntryParser.java 2006-04-22 01:34 AM henrik 13 kB
Java Source File MVSFTPEntryParser.java 2006-04-10 05:40 AM Steve Cohen 9 kB
Java Source File MVSFTPEntryParser.java 2006-04-10 04:13 AM Steve Cohen 9 kB
Java Source File MVSFTPEntryParser.java 2006-04-07 07:17 PM henrik 14 kB
Java Source File MVSFTPEntryParser.java 2006-03-27 05:16 AM henrik 7 kB
Java Source File Licensed for inclusion in ASF works MVSFTPEntryParserTest.java 2006-09-01 11:33 AM henrik sorensen 8 kB
Java Source File Licensed for inclusion in ASF works MVSFTPEntryParserTest.java 2006-08-30 08:42 PM henrik sorensen 6 kB
Java Source File Licensed for inclusion in ASF works RegexFTPFileEntryParserImpl.java 2006-08-28 07:28 PM henrik sorensen 5 kB
Environment:
Operating System: other
Platform: Other

Bugzilla Id: 39109
Resolution Date: 06/Dec/06 09:45 AM


 Description  « Hide
I have been chasing some bugs when using the ant ftp task, and some problems
stem from the parsing of the result of the ftp list command.

The file structure of MVS is not a hierarchical filesystem, but it has a
directory concept known as partition organized datasets. These can be compared
to a single level directory.

The current parser does not take these into account.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
henrik added a comment - 27/Mar/06 05:16 AM
Created an attachment (id=17989)
MVSFTPEntryParser.java replacement

for review, note there is a lot of debug info, and further note, I am not
really a java programmer but are willing to learn, so just shoot on whatever
is moving ...


henrik added a comment - 27/Mar/06 01:04 PM
added [net] to summary, and keyword PatchAvailable

Steve Cohen added a comment - 28/Mar/06 10:22 AM
You are quite correct. I remember thinking the same thing when this code was
submitted. However, it filled the needs of the original author and was better
than nothing.

None of the committers has access to an MVS system or any knowledge of it, as
you apparently do. And so, the bottom line is, we humbly request you, in the
spirit of Open Source, to contribute a fully implemented parser, since you have
the expertise. We'll be happy to commit such a contribution if you submit it,
and help you get it done if you aren't familiar with the process.


henrik added a comment - 29/Mar/06 04:53 AM
thanks for taking this into consideration.
I am more than willing to get this committed, so what is the process ?

Must the copyright be assigned to Apache.org, and is so what are the paper work
needed for this ?

For the code ...
Since I do not use the Regex for the file name parsing, which xxxxFtpParserImpl
should be extended ?


Steve Cohen added a comment - 31/Mar/06 11:48 AM
To answer your question, I guess your choices are

1. Redo your code to use Regular Expressions and inherit from
ConfigurableFTPFileEntryParserImpl which would be my recommendation.

2. If really don't want to use regex or it's impossible to write the parser
this way (i can't tell from your patch), then you should probably inherit from
FTPFileEntryParserImpl

As far as copyright goes, just start with what's in MVSFTPEntryParser.java. You
can change every line of code after that if you like.

However, it looks like you still have some work to do. Your code, I see, still
has TODOs in it. If you're going to change the parser implementation, let's get
as complete an implementation as you can.

Out of curiosity, would you mind also explaining for our benefit what these ZOS
and PDS things are? Comments might be a good place to do that. This OS is much
different from many of the others we handle.

Finally, don't forget the JUnit tests. Please do not submit your patch until
all existing tests pass. And please add more tests that exercise the new
functionalities you are adding.


henrik added a comment - 06/Apr/06 02:36 AM
this is my current comments in my working version of MVSFTPEntryParser.java.

try to read it and let know if it is understandable.

/*

  • Very brief and imcomplete description of the zOS/MVS-filesystem.
  • (Note: "zOS" is the operating system on the mainframe,
  • and is the new name for MVS)
  • The filesystem on the mainframe does not have hierarchal structure as
  • for example the unix filesystem.
  • For a more comprehensive description, please refer to the IBM manuals
  • @LINK:www.ibm....
  • Dataset names
  • =============
  • A dataset name consist of a number of qualifiers separated by '.',
  • each qualifier can be at most 8 characters, and the total length
  • of a dataset can be max 44 characters including the dots.
  • Dataset organisation
  • ====================
  • A dataset represents a piece of storage allocated on one or more disks.
  • The structure of the storage is described with the field dataset
    organinsation (DSORG).
  • There are a number of dataset organisations, but only two are usable for FTP
    transfer.
  • DSORG:
  • PS: sequential, or flat file
  • PO: partitioned dataset
  • PO-E: extended partitioned dataset
  • The PS file is just a flat file, as you would find it on the unix
  • file system.
  • The PO and PO-E files, can be compared to a single level directory
    structure.
  • A PO file consist of a number of dataset members, or files if you
  • will. It is possible to CD into the file, and to retrieve the
  • individual members.
  • Dataset record format
  • =====================
  • The physical layout of the dataset is described on the dataset itself.
  • There are a number of record formats (RECFM), but just a few is relavant for
  • the FTP transfer.
  • Any one beginning with either F or V can safely used by FTP transfer.
  • All others should only be used with great care.
  • F means a fixed number of records per allocated storage, and V means a
    variable
  • number of records.
  • Other notes
  • ===========
  • The file system supports automatically backup and retrieval of datasets. If
    a
  • file is backed up, the ftp LIST command will return:
  • ARCIVE Not Direct Access Device KJ.IOP998.ERROR.PL.UNITTEST
  • Implementation notes
  • ====================
  • Only datasets that have dsorg PS, PO or PO-E and have recfm
  • beginning with F or V, is fully parsed.
  • The following fields in FTPFile is used:
  • FTPFile.Rawlisting: Always set.
  • FTPFile.Type: DIRECTORY_TYPE or FILE_TYPE or UNKNOWN or null
  • FTPFile.Name: name or null
  • FTPFile.Timestamp: create time or null
  • */

/* Format of ZOS/MVS file list:

  • 0 1 2 3 4 5 6 7 8 9
  • Volume Unit Referred Ext Used Recfm Lrecl BlkSz Dsorg Dsname
  • B10142 3390 2006/03/20 2 31 F 80 80 PS MDI.OKL.WORK
  • ARCIVE Not Direct Access Device KJ.IOP998.ERROR.PL.UNITTEST
  • B1N231 3390 2006/03/20 1 15 VB 256 27998 PO PLU
  • B1N231 3390 2006/03/20 1 15 VB 256 27998 PO-E PLB
  • */
    /* -----------------------------------

  • [0] Volume
  • [1] Unit
  • [2] Referred
  • [3] Ext: number of extents
  • [4] Used
  • [5] Recfm: Record format
  • [6] Lrecl: Logical record length
  • [7] BlkSz: Block size
  • [8] Dsorg: Dataset organisation. Many exists but only support: PS, PO, PO-E
  • [9] Dsname: Dataset name
  • Note: When volume is ARCIVE, it means the dataset is stored somewhere in
  • a tape archive. These entries is currently not supported by this
  • parser. A null value is returned.
  • */
    // dsorg last two tokens describe file:
    // 'PS': sequential file
    // 'PO': partioned dataset PDS
    // 'PO-E': PDS Library
    // ' ': unknown, probably archived.

/*

  • Format of a PDS:
  • 0 1 2 3 4 5 6 7 8
  • Name VV.MM Created Changed Size Init Mod Id
  • TBSHELF 01.03 2002/09/12 2002/10/11 09:37 11 11 0 KIL001
  • TBTOOL 01.12 2002/09/12 2004/11/26 19:54 51 28 0 KIL001
  • */
    /*

  • [0] Name
  • [1] VV.MM: Version . modification
  • [2] Created: yyyy / MM / dd
  • [3,4] Changed: yyyy / MM / dd HH:mm
  • [5] Size: number of lines
  • [6] Init: number of lines when first created
  • [7] Mod: number of modified lines a last save
  • [8] Id: User id for last update
  • */


henrik added a comment - 07/Apr/06 07:17 PM
Created an attachment (id=18038)
MVSFTPEntryParser.java replacement v2.

this is a much cleaned up version of the MVSFTP parser.
TODO:

  • make it configurable
  • figure out how to report errors

Steve Cohen added a comment - 09/Apr/06 12:16 AM
Thanks very much for your comments on this OS. They help me to finally
understand what it is we are working with here.

OK, here's what I think we'll do. I just noticed upthread your "I am not a java
programmer" which explains why you're not using the regex thing, and a few other
things I'd wondered about that a java programmer might know about but you may
not. I wouldn't expect you to write JUnit tests for example as a non-java
programmer.

Let me have a whack at what you have done. Now that I understand the code, I
can probably find the best way to fit it into our system. Then I'll send it
back to you for testing, and we'll work collaboratively. Does that sound okay
to you?


Steve Cohen added a comment - 09/Apr/06 06:44 AM
henrik - one thing I notice in your code compared with the previous - in the
"FILELIST" mode, according to the guy who contributed the previous code, the
"date" field is a "Date Last Accessed" and therefore not suitable for the
timestamp field of FTPFile which assumes "Date Modified" semantics (for example
the FTP tasks of Ant will use this for "newer" comparisons. Do you know if he
was right?

Steve Cohen added a comment - 10/Apr/06 04:13 AM
Created an attachment (id=18048)
Candidate replacement 3 - from Steve Cohen

henrik - please have a look at this file. I believe it does basically the same
thing as yours albeit in a much different way. We use
CompositeFileEntryParser, which delegates to two internal classes, one for the
"Memberlist" and one for the "FileList". Anything else is rejected. It uses
regexes, which, once you get over the hump, add clarity to the process.

I don't want to commit this until you have had a chance to test it in your
environment. Also, this breaks existing JUnit tests which, in my opinion,
weren't very robust anyway. If you understand JUnit, please have a look and
see what you can do. If you don't understand JUnit, could you please attach
lists of good and bad listings that we can work into a test?

As per earlier note, this FileList parser doesn't do anything with the date,
which I believe has "last access" semantics, not "last modified" semantics,
which is what we need for the date to be of value.

Please send me your comments, and thanks for your contributions, without which
mine would not have been possible.


Steve Cohen added a comment - 10/Apr/06 05:40 AM
Created an attachment (id=18049)
Candidate replacement 4 - from Steve Cohen

Yet another attachment. I realized I could do some testing, and the previous
patch failed. This one does better.

henrik - please note that the way you were doing it, and the way my previous
attachment was doing it violated the agreed-upon semantics of these parsers -
when an entry cannot be parsed, null must be returned, NOT an FTPFile with only
the rawListing field filled in. We had this argument a few years back and
while neither way is perfect, it was felt that this way had the fewest
problems. It is better to be consistent here, than to try to give the caller
as much information as possible, which will undoubtedly trip someone up
somewhere.


henrik added a comment - 16/Apr/06 03:08 AM
  • steve,

thanks for spring cleaning of my code. I will soon try it, and let you know
about the results.
Consider me a complete 'junit' illiterate. What do I need to do ?

Exactly how is the MvsFilelist and MvsMemberlist classes choosen ?
By trial and error, aka the first parser that does not return null ?
The problem is that there are at least one more list type beside the two
mentioned, and more might come.
The header that is removed in the preparse method is IMO the most reliable way
to detect which kind of list is coming.

In the MvsMemberlistFTPEntryParser, you have
try

{ file.setTimestamp(super.parseTimestamp(datestr)); }

catch (ParseException e)

{ return null; // this is a parsing failure too. }

if the date information cannot parse, just set the field to null. There are a
number of reasons as to why the date cannot parse. It can be blank, or the FTP
is setup to simply not to return this information.

Further, in MvsFilelistFTPEntryParser, two comments.
First the original observation regarding 'referred date' is correct.
It is a shame the referred date is returned by the ftp LIST command, since it is
pretty useless, at least for FTP, so it should be kept as null.

Second, in parseFTPEntry, don't be cheap with the if statements:
use
if ("PS".equals(dsorg)) file.setType(FTPFile.FILE_TYPE);
else if ("PO".equals(dsorg) || "PO-E".equals(dsorg))
file.setType(FTPFile.DIRECTORY_TYPE);
else return null;

it is a matter of style of course, but I doubt this extra if statement costs
anything.

Lastly.
Detection of zOS server
-----------------------
Strictly speaking to the check when to use the MVS parser at all, it is not
enough to check the result of the SYST ftp command.
Even when the SYST ftp command returns
'MVS is the operating system of this server. FTP Server is running on z/OS.'
the server supports two dataset modes.
The classical unix style and the mentioned zOS filesystem.
The correct way to check which is active, is to check the first character of the
PWD ftp command. If it is "/" then use a standard unix parser, and if "'",
(single quote) then use the MVS parser.
I am clueless as to where to put this information, but it could probably be done
with the getDefaultConfiguration method.
Btw, I would prefer to change the name MVS to new name zOS, but this is probably
a larger change. Maybe keep the existing parser using the MVS name and mark it
deprecated, and create a new class using the ZOS name?

The null issue
--------------
If the convention is to return null for unparsed entries then so be it. But how
the caller can use this null entry is a mystery to me. At least with the
rawlisting filled, the caller have some chance of at least providing some hints
to the expected null-pointer exceptions.


Steve Cohen added a comment - 18/Apr/06 10:00 AM
(In reply to comment #12)

My comments are inline.

> - steve,
>
> thanks for spring cleaning of my code. I will soon try it, and let you know
> about the results.
> Consider me a complete 'junit' illiterate. What do I need to do ?
>
> Exactly how is the MvsFilelist and MvsMemberlist classes choosen ?
> By trial and error, aka the first parser that does not return null ?

Yes, that is correct.

> The problem is that there are at least one more list type beside the two
> mentioned, and more might come.

Can we get parsers for those? Is that what you mean by "might come"?

> The header that is removed in the preparse method is IMO the most reliable way
> to detect which kind of list is coming.

You may be right. I thought about that and thought that this way was most
faithful to the "Composite" parser concept, but I haven't really made a final
decision on that yet.

>
> In the MvsMemberlistFTPEntryParser, you have
> try
> { > file.setTimestamp(super.parseTimestamp(datestr)); > }
> catch (ParseException e)
> { > return null; // this is a parsing failure too. > }
> if the date information cannot parse, just set the field to null. There are a
> number of reasons as to why the date cannot parse. It can be blank, or the FTP
> is setup to simply not to return this information.

Why would the date SOMETIMES be blank. A system like that violates the usual
assumptions that these systems behave in a deterministic way. (I can understand
if it's a switch that is sometimes turned off. In your experience, is that very
common?)

>
> Further, in MvsFilelistFTPEntryParser, two comments.
> First the original observation regarding 'referred date' is correct.
> It is a shame the referred date is returned by the ftp LIST command, since it is
> pretty useless, at least for FTP, so it should be kept as null.
>
> Second, in parseFTPEntry, don't be cheap with the if statements:
> use
> if ("PS".equals(dsorg)) file.setType(FTPFile.FILE_TYPE);
> else if ("PO".equals(dsorg) || "PO-E".equals(dsorg))
> file.setType(FTPFile.DIRECTORY_TYPE);
> else return null;
>
> it is a matter of style of course, but I doubt this extra if statement costs
> anything.

You're right.

>
> Lastly.
> Detection of zOS server
> -----------------------
> Strictly speaking to the check when to use the MVS parser at all, it is not
> enough to check the result of the SYST ftp command.
> Even when the SYST ftp command returns
> 'MVS is the operating system of this server. FTP Server is running on z/OS.'
> the server supports two dataset modes.
> The classical unix style and the mentioned zOS filesystem.
> The correct way to check which is active, is to check the first character of the
> PWD ftp command. If it is "/" then use a standard unix parser, and if "'",
> (single quote) then use the MVS parser.

This sounds like useful information. We could extend the composite pattern and
put a UNIX parser into the list. This is more like the original uses of the
Composite parser, in which Windows fTP servers are sometimes set up to list in
unix format.

> I am clueless as to where to put this information, but it could probably be done
> with the getDefaultConfiguration method.
> Btw, I would prefer to change the name MVS to new name zOS, but this is probably
> a larger change. Maybe keep the existing parser using the MVS name and mark it
> deprecated, and create a new class using the ZOS name?

That is a possibility.
>
> The null issue
> --------------
> If the convention is to return null for unparsed entries then so be it. But how
> the caller can use this null entry is a mystery to me. At least with the
> rawlisting filled, the caller have some chance of at least providing some hints
> to the expected null-pointer exceptions.
>


henrik added a comment - 22/Apr/06 01:34 AM
Created an attachment (id=18148)
Candidate replacement 5 - from Henrik Sorensen

Using the regex from the candidate repl 4, I have a version that now works, and
which also supports the unix file as well.

I choose to use preParse to detect which type of list is comming.

The problem with candiate 4, was the two REGEX is actually not mutually
exclusive.

I had to add a new method to RegexFTPEntryParserImpl:
public boolean changeRegex(String regex)
{
try

{ _matcher_ = new Perl5Matcher(); pattern = new Perl5Compiler().compile(regex); }

catch (MalformedPatternException e)

{ throw new IllegalArgumentException ( "Unparseable regex supplied: " + regex); }

return pattern!=null;
}

and invoke the method from the constructor:
public RegexFTPFileEntryParserImpl(String regex)

{ super(); changeRegex(regex); }

Looking forward to your comments.


henrik added a comment - 22/Apr/06 01:49 AM
(In reply to comment #13)
> (In reply to comment #12)
>
> My comments are inline.
mine too
> > Exactly how is the MvsFilelist and MvsMemberlist classes choosen ?
> > By trial and error, aka the first parser that does not return null ?
>
> Yes, that is correct.
but I cannot see how this can possible work.
If for example the first entry on the file list is one of the unsupported types,
the MvsFilelist parser returns null, and the MvsMemberlist is invoked?!?
I have uploaded a candidate replacement 5, that uses preparse to detect which
list is coming.
>
> > The problem is that there are at least one more list type beside the two
> > mentioned, and more might come.
>
> Can we get parsers for those?
well, the datasets are not relevant for FTP transfer, so I think not.

>Is that what you mean by "might come"?
well IBM is still developing the mainframe, and a new version is just around the
corner
> >
> > In the MvsMemberlistFTPEntryParser, you have
...
> > number of reasons as to why the date cannot parse. It can be blank, or the FTP
> > is setup to simply not to return this information.
>
> Why would the date SOMETIMES be blank.
sometimes blank for the same member means always blank for this member.
Why it is like this is a longer story, but the short version is that it is
possible to created the member without any other information than the member name.

> A system like that violates the usual
> assumptions that these systems behave in a deterministic way. (I can understand
> if it's a switch that is sometimes turned off. In your experience, is that very
> common?)
not very common, but still possible.

> > The null issue
> > --------------
> > If the convention is to return null for unparsed entries then so be it. But how
> > the caller can use this null entry is a mystery to me.
you didn't comment on this.

In the ant FTP.java in package org.apache.tools.ant.taskdefs.optional.net, does
not handle the null entries at all. I have a fix working by me, that I will try
to push to the ant guys.


henrik added a comment - 27/Apr/06 04:02 AM
It seems these two bugs will be solved with the suggested replacement as well

ASF Bugzilla COM-2341
ASF Bugzilla COM-2338

except the unix regex must to extended with an 'e' link type.


henrik sorensen added a comment - 28/Aug/06 07:28 PM
Latest version of the MVS Ftp parser.
This version supports parsing of the following dataset organisations:
PS: sequential files
PO: partitioned files (single level direcory)
PO-E:partitioned files (single level direcory)

and the following record formats beginning with
V: Variable length records
F: Fixed length records

The files system is described in details within the source code.

Further a parser for the JES subsystem level one has been added. (JES: Job entry Subsystem (scheduler))


henrik sorensen added a comment - 30/Aug/06 08:42 PM
With these two changes the JUNIT tests now completes.
I had to hack a bit in the FTPParseTestFramework.java, to make it work with multiple good samples testsets.

henrik sorensen added a comment - 01/Sep/06 11:33 AM
The MVSFTPEntryParser now supports JES Interface Level 1 and 2.
The JUNIT test cases have been updated accordingly.

Have fun
Henrik


Rory Winston added a comment - 06/Dec/06 09:45 AM
Fixed in 2.0