Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0-alpha-2
    • Fix Version/s: 1.1
    • Component/s: Sink API
    • Labels:
      None

      Description

      If the idea of extensibility and interchangeable input/output formats is to be more than a nice dream, the Sink API needs a thorough specification (e.g. by means of more javadoc at Sink) because that's where everything meets. It should define

      1. what rules parsers must obey when generating events and
      2. what events a sink needs to be prepared to handle

      Currently, all of this is left to assumptions. Some example issues that need to be clarified:

      • What characters may constitute an anchor reported by anchor()? Arbitrary, ASCII-only, ...?
      • What format applies to the name parameter of link()? How are internal and external links to be distinguished (DOXIA-208)?
      • What character chunks are reported by text()? Longest consecutive sequence, line-by-line, arbitrary, ... (DOXIA-222)?
      • What exactly is a figure's source as reported by figureGraphics()? Relative/absolute path, relative to which directory? What about file extensions (DOXIA-99)?
      • What order of events is "reasonable" (DOXIA-132)? May parsers report table body and caption in a specific or arbitrary order? Must the document head always be reported before body or may it be postponed?
      • Is closing a sink twice acceptable or an error?

          Activity

          Benjamin Bentmann added a comment -

          A thought with regard to event order/nesting: What about defining a Sink Object Model in terms of an XSD? This tree structure should not be expressed by a programmatic API but would merely serve as a reference for parser validation. I.e. a CanonicalSink would output the XML document

          <sink>
            <head>
              <title>
                <text>foo</text>
              </title>
            </head>
          </sink>
          

          for the event sequence head(), title(), text("foo"), _title(), _head(). This tree could then be passed through a validating XML parser (conveniently wrapped in a ValidatingSink, superseding the WellformednessCheckingSink) to check that the parser obeys the ordering/nesting rules defined by the XSD. Of course, the XSD would also serve the purpose of documenting the intended usage of the Sink API to implementors.
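The event-to-tree mapping proposed above can be sketched as follows. This is only an illustration under stated assumptions: the class CanonicalSinkSketch is hypothetical, it covers just the five events from the example, and it does not implement the real Doxia Sink interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical mini sink: records a few Sink-style events as an XML tree. */
class CanonicalSinkSketch {
    private final StringBuilder xml = new StringBuilder("<sink>");
    private final Deque<String> open = new ArrayDeque<>();

    void head()   { start("head"); }
    void _head()  { end("head"); }
    void title()  { start("title"); }
    void _title() { end("title"); }

    void text(String s) {
        xml.append("<text>").append(escape(s)).append("</text>");
    }

    private void start(String name) {
        open.push(name);
        xml.append('<').append(name).append('>');
    }

    private void end(String name) {
        // Reject mismatched end events so the output stays well-formed.
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("unexpected end event: " + name);
        }
        open.pop();
        xml.append("</").append(name).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns the recorded tree; fails if an element is still open. */
    String toXml() {
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed element: " + open.peek());
        }
        return xml + "</sink>";
    }

    public static void main(String[] args) {
        CanonicalSinkSketch sink = new CanonicalSinkSketch();
        sink.head(); sink.title(); sink.text("foo"); sink._title(); sink._head();
        System.out.println(sink.toXml());
    }
}
```

The resulting string could then be handed to a schema validator (for example via the standard javax.xml.validation package) once an XSD for the sink model exists; no such schema is part of this sketch.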

          Lukas Theussl added a comment -

          This sounds like a good idea. I'll think about it and put a proposal into confluence.

          Lukas Theussl added a comment -

          Some preliminary comments to the points you raised above, just to open the discussion:

          What characters may constitute an anchor reported by anchor()?

          An anchor in HTML should be a valid HTML ID token, so IMO we could apply the same rules to a general sink. See the javadoc for HtmlTools.encodeId().

          What format applies to the name parameter of link()?

          As I said at DOXIA-208, internal links should start with "#", again a la html.

          What character chunks are reported by text()?

          Longest consecutive sequence IMO. No pretty printing, no modification whatsoever.

          What exactly is a figure's source as reported by figureGraphics()?

          A figure source is just a link, so it can be relative to the current document or absolute. The figure extension is now required as noted at DOXIA-99.

          What order of events is "reasonable" (DOXIA-132)?

          We should define a 'canonical' order of events that should be followed by all parsers. IMO the order emitted by SinkTestDocument could serve as a definition. What is 'reasonable' is of course subjective, eg IMO a figureCaption can come before or after figureGraphics, but a definedTerm in a definitionList should come before the definition.

          Is closing a sink twice acceptable or an error?

          Can you elaborate why this is relevant? IMO closing a sink a second time should just do nothing, as it basically just closes the underlying Writer.

          Benjamin Bentmann added a comment -

          An anchor in HTML should be a valid HTML ID token, so IMO we could apply the same rules to a general sink.

          Given HTML's popularity I generally agree in adopting its rules for the Sink API to keep the learning curve low.

          See the javadoc for HtmlTools.encodeId().

          My question is just which side of the game (parser or sink) is responsible for doing this conversion. For instance, looking at AptParser and XhtmlBaseSink, we currently seem to allow parsers to call Sink.anchor() with an arbitrary string and require the sinks to normalize this string according to their needs/restrictions. I am fine with this approach; all I want is one or two lines in the javadoc of Sink.anchor() that explicitly express this freedom for parsers, such that sink implementors are aware of their job to handle arbitrary inputs and don't erroneously assume something like an HTML ID token coming in.

          Side note:
          The javadoc for the method HtmlTools.encodeId() mentions the pattern [A-Za-z][A-Za-z0-9:_.-]* for its output. To me, this looks like the term "letter" is meant to refer to ASCII characters in this context. However, the employed method Character.isLetter() will classify characters according to the Unicode data file. For instance, the characters "ä" and "ß" are letters in the Unicode sense. encodeId() will pass these through to its output, violating the ASCII-only pattern stated in its javadoc.
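The mismatch described in this side note can be reproduced with the JDK alone. The snippet below does not call HtmlTools.encodeId(); it only compares Character.isLetter() with the ASCII pattern quoted above, and the class name IdPatternCheck is made up for the illustration.

```java
import java.util.regex.Pattern;

class IdPatternCheck {
    // Pattern as quoted from the encodeId() javadoc in the comment above.
    static final Pattern ASCII_ID = Pattern.compile("[A-Za-z][A-Za-z0-9:_.-]*");

    /** True if the JDK classifies the character as a letter (Unicode sense). */
    static boolean isUnicodeLetter(char c) {
        return Character.isLetter(c);
    }

    /** True if the whole string matches the ASCII-only ID pattern. */
    static boolean matchesAsciiId(String s) {
        return ASCII_ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeLetter('\u00e4'));   // U+00E4 is "ä"
        System.out.println(isUnicodeLetter('\u00df'));   // U+00DF is "ß"
        System.out.println(matchesAsciiId("\u00e4bc"));  // starts with "ä"
        System.out.println(matchesAsciiId("abc"));
    }
}
```

Both characters count as letters for Character.isLetter(), yet a string starting with "ä" does not match the documented pattern, which is the inconsistency pointed out here.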

          internal links should start with "#", again a la html.

          In other words:

          • AptParser must convert "foo.pdf" into "#foo.pdf" (assuming its current link interpretation) before passing the link to the sink
          • XhtmlBaseSink must check neither for protocols nor for "./" nor for ".html#" in the link string, but only for a leading "#"
            Right?
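A minimal sketch of the rule as restated in the two bullets above, assuming the "leading #" convention is adopted. The class LinkRule and both method names are hypothetical and not part of Doxia.

```java
class LinkRule {
    /** Sink side: under the proposed rule a link is internal exactly when it starts with '#'. */
    static boolean isInternal(String name) {
        return name != null && name.startsWith("#");
    }

    /** Parser side: turn a bare anchor name into an internal link before calling link(). */
    static String toInternal(String anchor) {
        return anchor.startsWith("#") ? anchor : "#" + anchor;
    }

    public static void main(String[] args) {
        System.out.println(toInternal("foo.pdf"));
        System.out.println(isInternal("#foo.pdf"));
        System.out.println(isInternal("./foo.html#bar"));
    }
}
```

With this split, the sink needs no knowledge of protocols or file extensions; anything without a leading "#" is simply treated as external.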

          No pretty printing, no modification whatsoever.

          Certain input formats (e.g. XML-based) allow for ignorable/collapsible whitespace. Removing/normalizing this should be the responsibility of the parser such that the sink only sees real content. I.e. pretty printing from the input document should be removed by the parser.

          Also worth clarifying: Line terminators. Should sinks be required to handle all possible variants, or should parsers be required to normalize these to "\n" as done in the XML spec?
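If the second option were chosen (parsers normalize, as XML processors do for CRLF and lone CR), the parser-side step could look roughly like this. The helper class LineEndings is an assumption for illustration, not existing Doxia code.

```java
class LineEndings {
    /** Maps CRLF and lone CR to a single LF, leaving existing LF untouched. */
    static String normalize(String text) {
        // Replace CRLF first so that it does not turn into two line feeds.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        // Print with visible escapes to show that only LF remains.
        System.out.println(normalize(mixed).replace("\n", "\\n"));
    }
}
```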

          IMO a figureCaption can come before or after figureGraphics

          That's right, the same for the table caption. Again, if this freedom is allowed for parsers, it should be documented somewhere to let sink implementors know that they must be prepared to handle any possible sequence.

          Can you elaborate why this is relevant?

          Because an API is a contract: there should be no guessing, no assumptions, no grey zones, and the responsibilities of all parties should be clear. Imagine somebody wants to implement a sink. When it comes to the close() method, may he throw an IllegalStateException on the second call? The API neither allows nor prohibits this, so there is room for guessing, room for different implementations and room for breaking the interchangeability of sinks. I never wanted to come up with a new Doxia; all I am seeking is a specification that allows people to clearly identify which component (parser/sink) is misbehaving in case of a wrong output document.

          IMO closing a sink a second time should just do nothing, as it basically just closes the underlying Writer.

          +1, but again, this should simply be explicitly written down for everybody as a reference.
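The "second close does nothing" behaviour agreed on here can be sketched as below. IdempotentCloseSink is a hypothetical class that only wraps a Writer; the counter exists purely so the behaviour can be observed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class IdempotentCloseSink {
    private final Writer writer;
    private boolean closed;
    private int writerCloses; // how often the underlying writer was actually closed

    IdempotentCloseSink(Writer writer) {
        this.writer = writer;
    }

    /** Closes the underlying writer once; later calls are silently ignored. */
    void close() {
        if (closed) {
            return;
        }
        closed = true;
        writerCloses++;
        try {
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int writerCloses() {
        return writerCloses;
    }

    public static void main(String[] args) {
        IdempotentCloseSink sink = new IdempotentCloseSink(new StringWriter());
        sink.close();
        sink.close(); // no exception, no second close of the writer
        System.out.println(sink.writerCloses());
    }
}
```

Whichever behaviour is chosen, the point of the discussion stands: it has to be stated in the javadoc so that implementors do not have to guess.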

          Lukas Theussl added a comment -

          AptParser must convert "foo.pdf" into "#foo.pdf"

          Yes. The current behavior of the apt parser is the historic origin of the mess we have right now...

          XhtmlBaseSink must neither check for protocols nor for "./" nor for ".html#"

          Right. If the parser already emits a valid html link, the sink doesn't need to check anymore.

          Also worth clarifying: Line terminators.

          See DOXIA-59

          I am working on updated javadocs to the Sink interface which I will commit as soon as svn is working again...

          Benjamin Bentmann added a comment -

          See DOXIA-59

          Alright, we just need to have specs on both sides: What terminators may a parser report to a sink and what line terminator should a sink issue to the output document?

          I am working on updated javadocs to the Sink interface

          Thanks Lukas, I really appreciate your efforts on this. I can't fight the impression that some Maven developers don't consider the importance of well-defined/-documented APIs, so it's a pleasure to see that there are exceptions!

          Lukas Theussl added a comment -

          Committed in r652137. Please review...

          Benjamin Bentmann added a comment -

          Wow, great job! Only a few possible additions:

          • A document can have more than one author, can't it? If so, the docs for author() should read "Start an author element. ... to identify an author ...", i.e. express the possibility of multiple elements.
          • Regarding section titles:
            • Must they be the first child element of the section?
            • May they be omitted?
            • Will they automatically/implicitly define an anchor (I guess not, it should be left to parsers, but that might be good to document)?
          • As for section(int, SinkAttributes): People used to RFCs will recognize the term "should" as a recommendation, not as a requirement. Is it really acceptable to nest a lower-level section in a higher-level one?
          Lukas Theussl added a comment -

          I committed some further clarifications. I was actually under the impression that the author element had to be unique, but looking at some example code it seems easy to generalize, so I modified the docs. Section titles may be omitted; if they exist they must be the first child elements, and they should (sic!) never be implicit anchors. I have reviewed the use of "should" everywhere, please check if it is clearer.

          Waiting for the next round of review...

          Benjamin Bentmann added a comment -

          Fine that you pointed out the optional nulls for the sink attributes, too! Just one concern:

          Section titles [...] should (sic!) never be implicit anchors.

          IMHO, this should have been the opposite of the current javadocs, i.e. "must not" add anchors on the sink side, "may" do so on the parser side.

          Rationale:
          I believe it's a valuable design choice if the syntax and semantics of an output document are completely separated:

          • a parser defines the semantics (text, links, formatting) by means of sink events, i.e. it defines what elements constitute the document
          • a sink defines the syntax for a particular output format, i.e. it defines how the document will be encoded

          If a parser did not emit an anchor within a section title, I don't see any argument why a sink should be allowed to add one. This would only lead to inconsistent behavior of the output documents: one document might have a link and another one might not, surprise surprise. The parser is (usually) processing an input file from the user, so if the parser didn't get the user's intention to output an anchor, why should the sink think differently all of a sudden?

          Parser: Dear sink, my master wants to output a section title.
          Sink: Dear parser, I don't know your master but I know he meant to output an anchor, too.
          

          Some people would call this sink a fortune teller.

          Regarding the parser: If an input format defines (say as a matter of convenience) that something like

          * Section
          

          defines an implicit anchor and is equivalent to

          * {Section}
          

          a parser simply needs to be allowed to issue both a section title and an (implicit) anchor in case of the first input snippet. I mean, restricting the parsers is equivalent to restricting the input formats which seems plain wrong if Doxia wants to be open-minded.

          I'm not too familiar with Doxia so my understanding might be wrong. If you feel there's more to discuss, we can simply switch over to doxia-dev where Jason and Vincent can hopefully clarify things, too.

          Lukas Theussl added a comment -

          Thanks for the graphic illustration

          However, I most definitely disagree with your conclusion. Curiously, I had to defend my point several times already, so let me just direct you to some issues: DOXIA-152, DOXIA-138 (lower part of the discussion). In short: a parser doesn't know yet where its output will go; some feature that might be acceptable for one Sink may lead to errors in others. Only a Sink knows what output is legal for its format; a Parser should therefore never insert anything that was not explicitly there in the original input format. Otherwise you would not be able to produce e.g. a PDF and an HTML document from the same set of source documents.

          restricting the parsers is equivalent to restricting the input format

          I consider it a fundamental design flaw if an input format defines implicit anchors for section titles. We have modified the original apt format (as documented in the doxia-apt.apt document on the doxia site) for these reasons.

          Benjamin Bentmann added a comment -

          Thanks for the graphic illustration

          Sometimes I just can't resist my brain dumps, sorry

          a parser doesn't know yet where its output will go,

          Yep, exactly my motivation for this issue: Since a parser can't and shouldn't know the various sinks, it must at least know the contract of their common interface that every sink obeys. If you can't set up such a common denominator among the sinks, it's all lost with interchangeable output formats.

          some feature that might be acceptable for one Sink may lead to errors in others

          Of course the output formats created by sinks will have different requirements/restrictions, but every sink should
          a) either fully support an event that is defined as part of the Sink API
          b) or at least gracefully ignore an event it can't handle
          such that users get a (best-effort) output regardless of the selected sink. It is the responsibility of the sink implementor to shield parsers from the details of its realized output format. IMHO, a sink should never ever fail with an exception if the input event is valid according to the Sink API.

          Only a Sink knows what output is legal for its format, a Parser should therefore never insert anything that was not explicitly there in the original input format.

          Anchor events are part of the Sink API, so a parser has, to my understanding, always the right to push this event into a sink, regardless of whether the event is driven by explicit user input or by implicit convention. It is the sink's responsibility to handle this defined event, whether it supports anchors or not.

          Regarding the issue of unique anchor names: This is merely another aspect that needs to be added to the javadoc of the Sink API. If you define that anchor names must be unique within a document then

          1. a conforming parser is responsible for providing this uniqueness
          2. a sink has all right to fail if a non-conforming parser outputs two anchor events with the same name
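As a rough illustration of point 2, assuming the javadoc did require unique anchor names: a sink could track the names it has seen and reject a repeat. UniqueAnchorSink and the choice of IllegalArgumentException are assumptions made for this sketch; the discussion does not prescribe either.

```java
import java.util.HashSet;
import java.util.Set;

class UniqueAnchorSink {
    private final Set<String> seen = new HashSet<>();

    /** Accepts an anchor name once; a second event with the same name is rejected. */
    void anchor(String name) {
        // Set.add returns false when the name was already present.
        if (!seen.add(name)) {
            throw new IllegalArgumentException("duplicate anchor: " + name);
        }
    }

    public static void main(String[] args) {
        UniqueAnchorSink sink = new UniqueAnchorSink();
        sink.anchor("intro");
        sink.anchor("usage");
        try {
            sink.anchor("intro");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```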

          I consider it a fundamental design flaw if an input format defines implicit anchors for section titles.

          I am fine with your arguments against implicit anchors. However, then I still don't understand why sinks are allowed to output implicit anchors for sections. If we consider such anchors as problematic, nobody should be allowed to create them. An implicit anchor is an implicit anchor, regardless of whether the parser or the sink created it, isn't it?

          For example, if we consider the SiteRenderingSink to be one of those specialized sinks that may output implicit anchors to the XHTML pages, people could start using these auto-links to cross-reference those sections from external documents (of the same site). Now this is dangerous because as soon as the user wants to output his nicely linked HTML website into a PDF book, he will find all the auto-links not working anymore because the PdfSink doesn't create implicit anchors like the SiteRenderingSink.

          We have modified the original apt format

          From SVN logs I see this was created after the last deployment of the doxia site (2007-11-06). If it doesn't cause any harm to the overall site, it would be cool to have this doc online. For example, the APT Reference still reads "Section titles are implicitly defined anchors." which does not apply to the version of Doxia used by the Site Plugin, IIRC.

          Show
          Benjamin Bentmann added a comment - Thanks for the graphic illustration Sometimes I just can't resist my brain dumps, sorry a parser doesn't know yet where it's output will go, Yep, exactly my motivation for this issue: Since a parser can't and shouldn't know the various sinks, he must at least know the contract of their common interface that every sink obeys. If you can't setup such a common denominator among the sinks, it's all lost with interchangable output formats. some feature that might be acceptable for one Sink may lead to errors in others Of course the output formats created by sinks will have different requirements/restrictions, but every sink should a) either fully support an event that is defined as part of the Sink API b) or at least gracefully ignore an event it can't handle such that users get a (best-effort) output regardless of the selected sink. It is the responsibility of the sink implementor to shield parsers from the details of its realized output format. IMHO, a sink should never ever fail with an exception if the input event is valid according to the Sink API. Only a Sink knows what output is legal for its format, a Parser should therefore never insert anything that was not explicitly there in the original input format. Anchor events are part of the Sink API, so a parser has to my understanding always the right to push this event into a sink, regardless whether the event is driven by explicit user input or by implicit convention. It is the sink's responsibility to handle this defined event, whether it support anchors or not. Regarding the issue of unique anchor names: This is merely another aspect that needs to be added to the javadoc of the Sink API. 
If you define that anchor names must be unique within a document then a conforming parser is responsible for providing this uniqueness a sink has all right to fail if a non-conforming parser outputs two anchor events with the same name I consider it a fundamental design flaw if an input format defines implicit anchors for section titles. I am fine with your arguments against implicit anchors. However, then I still don't understand why sinks are allowed to output implicit anchors for sections. If we consider such anchors as problematic, nobody should be allowed to create them. An implicit anchor is an implicit anchor, regardless whether the parser of the sink created it, isn't it? For example, if we consider the SiteRenderingSite to be one of those specialized sinks that may output implicit anchors to the XHTML pages, people could start using these auto-links to cross-reference to those sections from external documents (of the same site). Now this a dangerous because as soon as the users wants to output his nicely linked HTML website into a PDF book, he will find all the auto-links not working anymore because the PdfSink doesn't create implicit anchors like the SiteRenderingSink. We have modified the original apt format From SVN logs I see this was created after the last deployment of the doxia site (2007-11-06). If it doesn't cause any harm to the overall site, it would be cool to have this doc online. For example, the APT Reference still reads "Section titles are implicitly defined anchors." which does not apply to the version of Doxia used by the Site Plugin, IIRC.
          Lukas Theussl added a comment -

          Anchor events are part of the Sink API, so a parser has to my understanding always the right to push this event into a sink

          Not if there is no anchor in the parsed source document. Just because anchors are valid sink events doesn't mean a parser can emit one wherever it deems convenient.

          regardless whether the event is driven by explicit user input or by implicit convention.

          I disagree on the latter. A Doxia parser is a translator, not an interpreter: if you want anchors for your section titles, provide them explicitly.
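
          The "translator, not interpreter" rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy markup (a leading "* " for a section title, a line of the form "{name}" for an explicit anchor) and the `TranslatingParser` class are hypothetical and are not Doxia's APT parser or its Sink API. Events are recorded as plain strings so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser, NOT Doxia's APT parser or Sink API.
class TranslatingParser {

    // Toy markup (assumption): "* Title" is a section title,
    // "{name}" on its own line is an explicit anchor, anything else is text.
    static List<String> parse(List<String> lines) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("* ")) {
                // A title produces a title event only; no anchor is invented for it.
                events.add("sectionTitle:" + line.substring(2));
            } else if (line.length() > 2 && line.startsWith("{") && line.endsWith("}")) {
                // An anchor event is emitted only because the source asked for one.
                events.add("anchor:" + line.substring(1, line.length() - 1));
            } else {
                events.add("text:" + line);
            }
        }
        return events;
    }
}
```

          In this sketch a section title never produces an anchor event by itself; an author who wants one writes the explicit anchor line next to the title.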

          I still don't understand why sinks are allowed to output implicit anchors

          Because there is no hard reason why they shouldn't. While there is such a reason to forbid it for parsers (because they don't know the output format), I don't see why it should in principle be forbidden for sinks. My personal opinion is that implicit anchors should never be generated, by either parser or sink, and I think I made that clear in the javadocs, but after all, automatically generated anchors are still a useful and widely used feature for one single output format (HTML).

          If we consider such anchors as problematic, nobody should be allowed to create them

          The problem is not the existence of the implicit anchor, but its translation into different output formats. If you are only interested in an HTML web site for your project, I see no reason why you shouldn't be allowed to write a sink that automatically generates those anchors for you. Of course you will be in trouble the day you want to create a PDF from your docs. You will either have to adjust your input documents, or use an adapted PDF sink as well. So you could have adapted your input docs in the first place...
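
          A sink of the kind described here could be sketched as a decorator. This is only an illustration under stated assumptions: `MiniSink` is a hypothetical, stripped-down event interface and not the real Doxia Sink API, and the naming rule in `toAnchorName` (runs of non-alphanumeric characters become an underscore) is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down event interface; NOT the real Doxia Sink API.
interface MiniSink {
    void sectionTitle(String title);
    void anchor(String name);
    void text(String text);
}

// Records events as strings so the behaviour is easy to inspect.
class RecordingSink implements MiniSink {
    final List<String> events = new ArrayList<>();
    public void sectionTitle(String title) { events.add("sectionTitle:" + title); }
    public void anchor(String name) { events.add("anchor:" + name); }
    public void text(String text) { events.add("text:" + text); }
}

// Output-side decorator: adds an anchor in front of every section title.
class ImplicitAnchorSink implements MiniSink {
    private final MiniSink delegate;

    ImplicitAnchorSink(MiniSink delegate) { this.delegate = delegate; }

    // Assumed naming rule: each run of non-alphanumeric characters becomes '_'.
    static String toAnchorName(String title) {
        return title.trim().replaceAll("[^A-Za-z0-9]+", "_");
    }

    public void sectionTitle(String title) {
        delegate.anchor(toAnchorName(title)); // the implicit anchor is created here
        delegate.sectionTitle(title);
    }

    public void anchor(String name) { delegate.anchor(name); } // explicit anchors pass through
    public void text(String text) { delegate.text(text); }
}
```

          Because the anchor is created on the output side, the same parser events sent to a sink without this wrapper contain no such anchor, which is exactly why links relying on it stop working in another output format.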

          it would be cool to have this doc online

          The docs are for doxia-beta-1 which is not released yet, so we can't publish them.

          Benjamin Bentmann added a comment -

          Just because anchors are valid sink events doesn't mean a parser can emit one wherever it deems convenient.

          Yes of course, a parser should not emit events at random. What I did not clearly express is my understanding that a parser adapts a certain input format for usage with Doxia, just like a sink realizes some output format. Now if the format (which is in general external and unrelated to Doxia) specifies that a single syntactical construct like a section title is to be interpreted as a title with an implicit anchor, a parser which wants to feed this format into Doxia simply can't follow the format specification because sending the anchor event is prohibited, i.e. information from the input document is lost. That's the only thing that puzzled me a little, wondering if it's really necessary/desired. I'm fine if Doxia says "you ugly input format, don't use implicit anchors", it's just some kind of pushing best practices, I can fairly well understand that.

          I don't see why it should in principle be forbidden for sinks.

          Alright, as long as the implicit anchors generated by such a sink do not interfere with the explicit anchors defined by the user (e.g. name clash).
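
          One way to avoid such a name clash is sketched below. The `AnchorNames` helper and its numeric-suffix scheme are assumptions made for illustration; nothing like it is stated to exist in Doxia.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Doxia: tracks anchor names used in one document.
class AnchorNames {
    private final Set<String> used = new HashSet<>();

    // Register an explicit, user-defined anchor.
    // Returns false if the name is already taken.
    boolean registerExplicit(String name) {
        return used.add(name);
    }

    // Return a name for a generated anchor that does not collide with
    // any name seen so far, appending _2, _3, ... when needed.
    String uniqueImplicit(String base) {
        String candidate = base;
        int n = 1;
        while (!used.add(candidate)) {
            candidate = base + "_" + (++n);
        }
        return candidate;
    }
}
```

          This only protects against names that have already been seen. An explicit anchor that appears later in the event stream could still collide with an earlier generated one, so a sink using this idea would need to know the explicit names up front, for example from a first pass over the document.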

          If you are only interested in a html web site for your project, I see no reason why you shouldn't be allowed to write a sink that automatically generates those anchors for you.

          If you are only interested in APT sources for your project, I see no reason why you shouldn't be allowed to write a parser that automatically generates those anchors for you.

          Just for the fun of the words, it wasn't meant seriously

          so we can't publish them.

          I see, at least I know where to look for them.

          To come to an end, I might not fully understand all your arguments, but that's mostly because I'm not familiar enough with Doxia's architecture. If I look back to where this issue started, I can only repeat that you did a good job, and I feel this issue is ready to be closed. Thanks, Lukas!

          Lukas Theussl added a comment -

          Thank you for the thread! I hope this can serve as a reference for future doubts...

          I have opened DOXIA-238 and DOXIA-239 for some of your intermediary comments; please feel free to file anything else I might have overlooked.


            People

            • Assignee:
              Lukas Theussl
              Reporter:
              Benjamin Bentmann
            • Votes:
              0
              Watchers:
              0
