Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
3.0.0-alpha-2
-
None
Description
This starter component for the pipeline fetches the HTML content at the specified URL and transforms it into XHTML or, at least, into a well-formed XML document.
The original document can then be processed in the pipeline in various ways:
* following links;
* implementing crawlers;
* easily transforming the original document into various other formats;
* etc.
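As a rough illustration of the idea (not the component's actual implementation), the sketch below turns loosely written HTML into a well-formed XML string and then follows the "following links" use case by collecting `href` values. It uses only the Python standard library; the names `WellFormedBuilder`, `html_to_xml` and `extract_links` are hypothetical, and the sketch assumes the HTML has already been fetched from the URL. It closes unclosed elements but does not validate element or attribute names, so it is a simplification of what a real HTML cleaner does.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content or an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class WellFormedBuilder(HTMLParser):
    """Hypothetical helper: builds an element tree from tag-soup HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their own name as value.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        if self.stack:
            element = ET.SubElement(self.stack[-1], tag, attrib)
        elif self.root is None:
            element = self.root = ET.Element(tag, attrib)
        else:
            return  # ignore markup that appears after the root was closed
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything inside it;
        # stray end tags with no matching open element are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Return a well-formed XML serialization of the given HTML text."""
    builder = WellFormedBuilder()
    builder.feed(html_text)
    builder.close()
    if builder.root is None:
        raise ValueError("no element found in input")
    return ET.tostring(builder.root, encoding="unicode")


def extract_links(xml_text):
    """Collect the href of every <a> element in the well-formed document."""
    root = ET.fromstring(xml_text)
    return [a.get("href") for a in root.iter("a") if a.get("href")]


if __name__ == "__main__":
    sample = '<html><body><p>Hello<br><a href="/next">next</a><p>Unclosed</body></html>'
    xml_text = html_to_xml(sample)
    print(xml_text)
    print(extract_links(xml_text))
```

Once the document is well-formed, any standard XML tool can consume it, which is the point of placing such a component at the start of the pipeline.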
To illustrate the need for this component with a test case: last week I faced an unusual problem while building a simple service that takes the URL of an HTML page as input and transforms the page, through the Optimus XSLT (http://microformatique.com/optimus - http://code.google.com/p/mf-optimus/source/browse/#svn/trunk/xsl), into an XML document that contains the original document's Microformats in a format that is easier to parse.