Xerces2-J / XERCESJ-1558

GSoC: Implement the StAX XMLStreamWriter

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.11.0
    • Fix Version/s: None
    • Component/s: StAX
    • Labels:

      Description

      Xerces does not yet have an XMLStreamWriter (the interface in StAX which writes/serializes an XML document). I think a basic implementation of an XMLStreamWriter (one that can write to a java.io.OutputStream and java.io.Writer) would be a good GSoC project.
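For readers new to the API, here is a sketch of what a client of an XMLStreamWriter does. It uses only the standard javax.xml.stream interfaces, with whatever implementation XMLOutputFactory.newInstance() locates. On a plain JDK that is the JDK's bundled writer, not a Xerces one. The class and method names are invented for illustration. A Xerces implementation would need to serve both the java.io.Writer and the java.io.OutputStream entry points exercised here.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamWriterDemo {

    // Writes a tiny document through any XMLStreamWriter implementation.
    static void writeGreeting(XMLStreamWriter w) throws XMLStreamException {
        w.writeStartDocument();              // defaults: version 1.0, encoding utf-8
        w.writeStartElement("greeting");
        w.writeAttribute("lang", "en");
        w.writeCharacters("Hello, StAX");
        w.writeEndElement();                 // name comes from the writer's own state
        w.writeEndDocument();
        w.flush();
        w.close();                           // per the Javadoc, must not close the underlying output
    }

    // Target 1: a java.io.Writer
    static String toWriter() {
        try {
            StringWriter sw = new StringWriter();
            writeGreeting(XMLOutputFactory.newInstance().createXMLStreamWriter(sw));
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Target 2: a java.io.OutputStream with an explicit encoding
    static String toOutputStream() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeGreeting(XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, "UTF-8"));
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toWriter());
        System.out.println(toOutputStream());
    }
}
```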

        Activity

        Alim Ul Gias added a comment -

        I am a Master's student from Bangladesh. I learned about this project through GSoC 2014 and have already submitted my proposal for developing a StAX XMLStreamWriter for Xerces. I would really like to contribute to this project. I think I have good knowledge of Java, and I have also worked with JAXB. I have already gone through the thread and was wondering whether there are more materials I should look at to prepare myself. This would certainly help me if my proposal gets accepted.

        One more thing: can anyone please tell me which mailing list I should subscribe to in order to send mail to the Xerces-J group?

        Mihil Ranathunga added a comment - - edited

        Hi Michael,

        I would like to contribute to Xerces as my GSoC 2014 project, but I don't have a thorough idea of this project's deliverables, which I need in order to write a formal proposal.

        1. What is the mailing list I should join to discuss doing this project for 2014?
        2. Is there an IRC channel?
        3. I saw you mentioned earlier that we should get familiar with the existing serializers in Xerces and Xalan to get some ideas. Could you point me to where to look in the Xerces source code?
        4. Are there any existing XMLStreamWriter implementations for the StAX API? What is the relationship between this project's deliverables and the javax.xml.stream XMLStreamWriter interface? Should we create a concrete implementation of this interface?

        http://www.ibm.com/developerworks/library/x-stax3/index.html#N10195 is not working. Could you tell me what was in this article so that I can search for it on the web?

        UPDATE: Sorry, I only now saw that some of my questions have already been answered in the thread above.

        Michael Glavassevich made changes -
        Labels: gsoc gsoc2013 mentor changed to gsoc gsoc2014 mentor
        Michael Glavassevich added a comment -

        Hello, you are welcome to submit a proposal if you are interested in this project.

        lqjack added a comment -

        Hi Michael,
        I'm a student and I'm interested in this task for GSoC 2013.
        Can I take the task now?

        Shameera Rathnayaka added a comment -

        Hi Michael,

        According to the discussion above, the scope would be to implement only the writer part, wouldn't it? When implementing the StAX API over JSON, I used Java stack and queue data structures and a state-transition mechanism to track the state of the XMLStreamWriter. I think we can do the same here. Looking at the XMLOutputFactory API, I think implementing createXMLStreamWriter(OutputStream stream), createXMLStreamWriter(OutputStream stream, String encoding) and createXMLStreamWriter(Writer stream) would match the scope of GSoC. WDYT?

        Is there any pre-designed architecture for the implementation? Does Xerces have a StAX XMLStreamReader implementation?
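One way to keep the three factory methods listed above consistent is to funnel both OutputStream variants into the Writer variant. The sketch below is hypothetical: ReducingFactory and its create methods are invented names, the UTF-8 default is an assumption, and the final step borrows the JDK's own writer only so that the example runs. A real implementation would also have to carry the chosen encoding through to the XML declaration, which this sketch ignores.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: all three factory overloads reduce to the Writer-based one.
public class ReducingFactory {

    // The single "real" entry point. A Xerces implementation would construct its own
    // writer here; this sketch delegates to the JDK's only so that it runs.
    static XMLStreamWriter create(Writer out) throws XMLStreamException {
        return XMLOutputFactory.newInstance().createXMLStreamWriter(out);
    }

    // OutputStream + explicit encoding: wrap the byte stream in a character Writer.
    static XMLStreamWriter create(OutputStream out, String encoding) throws XMLStreamException {
        try {
            return create(new OutputStreamWriter(out, encoding));
        } catch (UnsupportedEncodingException e) {
            throw new XMLStreamException("Unsupported encoding: " + encoding, e);
        }
    }

    // OutputStream without an encoding: this sketch assumes UTF-8.
    static XMLStreamWriter create(OutputStream out) throws XMLStreamException {
        return create(out, "UTF-8");
    }

    static String demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            XMLStreamWriter w = create(bytes);
            w.writeStartElement("r");
            w.writeCharacters("ok");
            w.writeEndElement();
            w.flush(); // writes cached data through to the underlying output
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```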

        Michael Glavassevich added a comment - - edited

        Hi Shameera,

        There's been quite a bit of discussion about the details of the project on this JIRA issue. Do you have specific questions?

        Given your previous experience with implementing the StAX APIs over JSON, I would guess you're already quite familiar with the interfaces and how they behave as an API.

        Thanks.

        Shameera Rathnayaka added a comment -

        Hi devs,

        As part of my GSoC 2012 project for Apache Axis2 [AXIS2-5362], I implemented an XMLStreamReader and an XMLStreamWriter that provide an XML infoset while processing a JSON stream internally. Since this project is in line with that experience, I am interested in it.

        The workload depends on how much internal work we need to do. As I see it, we need to implement every single method of the XMLStreamWriter interface, and we also need to do some background research on existing StAX support. Maybe we can extend the scope to include an XMLEventWriter, as you suggested above.

        Can you provide more details about this, or shall we discuss how to improve this project?

        Thanks,
        Shameera.

        Michael Glavassevich added a comment -

        Hello Arek, I can't speak for why certain students decided not to write a proposal. I imagine that there are probably several thousand potential project ideas that are published across all of the organizations that participate in GSoC. Only a fraction of those will get formal proposals from students and only some of those students will get accepted for GSoC. There are plenty of project ideas from last year that are still available this year. That's not a reflection on ideas being better than others or the priorities of the community. It just means that there are a lot of choices, far more than there are participants for GSoC.

        Regarding the XMLStreamWriter, in general it doesn't verify that the XML is well-formed but it is required to check some error conditions (that are documented on the methods of the API). If you're looking for reading material, this developerWorks article [1] is a good place to start. If you have specific questions about the project and want to discuss it in more depth I would encourage you to join the project mailing list and post your thoughts and questions there.
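As an example of such a documented check, the Javadoc for writeStartElement(namespaceURI, localName) says it throws XMLStreamException if the namespace URI has not been bound to a prefix and javax.xml.stream.isRepairingNamespaces has not been set to true. The class below is a hypothetical sketch of just that check (UnboundUriSketch, tryWrite and the generated "ns1" prefix are invented for illustration). It keeps a single flat scope, closes each start tag immediately, and does not track which bindings were actually declared.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLStreamException;

public class UnboundUriSketch {
    private final boolean repairing;
    private final Map<String, String> uriToPrefix = new HashMap<>(); // one flat scope, for brevity
    private final StringBuilder out = new StringBuilder();
    private int generated = 0;

    UnboundUriSketch(boolean repairing) { this.repairing = repairing; }

    // Modeled on XMLStreamWriter.setPrefix: records a binding, writes nothing.
    void setPrefix(String prefix, String uri) { uriToPrefix.put(uri, prefix); }

    // Modeled on writeStartElement(namespaceURI, localName). The start tag is closed
    // immediately for brevity, so attributes are not supported here.
    void writeStartElement(String namespaceURI, String localName) throws XMLStreamException {
        String prefix = uriToPrefix.get(namespaceURI);
        String declaration = "";
        if (prefix == null) {
            if (!repairing) {
                // The documented error condition: unbound URI and no repairing.
                throw new XMLStreamException(
                    "Namespace URI " + namespaceURI + " is not bound to a prefix");
            }
            prefix = "ns" + (++generated);            // repairing: invent a prefix...
            uriToPrefix.put(namespaceURI, prefix);
            declaration = " xmlns:" + prefix + "=\"" + namespaceURI + "\""; // ...and declare it
        }
        out.append('<').append(prefix).append(':').append(localName).append(declaration).append('>');
    }

    static String tryWrite(boolean repairing, boolean bindFirst) {
        UnboundUriSketch w = new UnboundUriSketch(repairing);
        if (bindFirst) w.setPrefix("e", "urn:example");
        try {
            w.writeStartElement("urn:example", "item");
            return w.out.toString();
        } catch (XMLStreamException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryWrite(false, false)); // error
        System.out.println(tryWrite(false, true));  // <e:item> (caller must declare it)
        System.out.println(tryWrite(true, false));  // <ns1:item xmlns:ns1="urn:example">
    }
}
```

In non-repairing mode a prefix bound with setPrefix is used but not declared, mirroring the API, where the caller is expected to call writeNamespace.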

        Thanks.

        [1] http://www.ibm.com/developerworks/library/x-stax3/index.html#N10195

        Arek Pień added a comment -

        I see that a few people were interested in this project for GSoC 2012, but nobody implemented it. I am wondering why nobody even tried: did people find better ideas, or is this project not a priority for the Xerces/Apache community, so that it didn't get a GSoC slot?

        I am interested in this project for the current edition of Google Summer of Code.
        This implementation of XMLStreamWriter doesn't have to verify that the XML being written is well-formed, right?
        I would be grateful for any extra materials to study.

        Michael Glavassevich added a comment -

        Yes, it is still available.

        Arek Pień added a comment -

        Hello, is this project still available to take up for Google Summer of Code 2013?

        Michael Glavassevich added a comment -

        Hi Paritosh, this project was discussed with multiple people last year but no work has been done on it yet. It's still available for anyone who's interested in working on it.

        Thanks.

        Paritosh Aggarwal added a comment -

        Hi, was any work done on this project? I would like to pick it up and start working on it, provided no patch has been submitted yet.

        Michael Glavassevich made changes -
        Labels: gsoc gsoc2012 mentor changed to gsoc gsoc2013 mentor
        Michael Glavassevich added a comment -

        Yes, that's right. You should send your questions to the mailing list.

        Venkatesh Jujjavarapu added a comment -

        Hi Michael,

        Thanks, I am now subscribed. So now I should send my questions to j-dev@xerces.apache.org, right?

        Michael Glavassevich added a comment -

        Hi Venkatesh,

        Send an e-mail to j-dev-subscribe@xerces.apache.org to subscribe to the development mailing list. Once you're subscribed the mailing list address is j-dev@xerces.apache.org.

        Thanks.

        Venkatesh Jujjavarapu added a comment -

        Hi Michael,

        There are many lists, and I am supposed to join the Xerces Java developers list, but I am unable to join it. Could you tell me how to join the list so that we can discuss the project in more depth? Thanks in advance.

        Michael Glavassevich added a comment -

        Hi Venkatesh, I've mentioned other things to consider in an earlier post on this JIRA issue and on the mailing lists [1]. If you have more questions, I would invite you to join the development mailing list [2] to discuss the project in more depth.

        Thanks.

        [1] http://xerces.markmail.org/thread/f6kwks4had5k2odl
        [2] http://xerces.apache.org/mail.html#xerces-j-dev

        Venkatesh Jujjavarapu added a comment -

        Hi Michael,

        From the first article, I gather that we have to take care of managing namespaces and their prefixes,

        learn more about starting and closing elements properly,

        do something about the close() method so that it also closes the underlying output,

        and manage attributes and characters. If I have missed anything, please point me to that area.
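On the close() point, the XMLStreamWriter Javadoc says close() frees any resources associated with the writer but must not close the underlying output stream, so closing that stream stays the caller's job. The sketch below illustrates that contract. CloseContractSketch, SketchWriter, TrackingWriter and writeRaw are invented for illustration, and flushing inside close() is an assumption of the sketch rather than something the Javadoc spells out.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class CloseContractSketch {

    // Test double: records whether close() ever reached the underlying Writer.
    static class TrackingWriter extends StringWriter {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical stand-in for the flush()/close() part of an XMLStreamWriter.
    static class SketchWriter {
        private Writer out;
        private final StringBuilder pending = new StringBuilder();

        SketchWriter(Writer out) { this.out = out; }

        void writeRaw(String markup) {            // hypothetical helper, not a StAX method
            if (out == null) throw new IllegalStateException("writer is closed");
            pending.append(markup);
        }

        void flush() throws IOException {
            if (out == null) return;
            out.write(pending.toString());
            pending.setLength(0);
            out.flush();
        }

        void close() throws IOException {
            flush();      // assumption: push cached data out before releasing it
            out = null;   // free our reference, but do NOT call out.close()
        }
    }

    static boolean underlyingLeftOpen() {
        try {
            TrackingWriter target = new TrackingWriter();
            SketchWriter w = new SketchWriter(target);
            w.writeRaw("<done/>");
            w.close();
            target.write(" trailer");             // the caller still owns the Writer
            return !target.closed && target.toString().equals("<done/> trailer");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("underlying writer left open: " + underlyingLeftOpen());
    }
}
```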

        Venkatesh Jujjavarapu added a comment -

        Hi Michael,

        I am a graduate student interested in working on this task for GSoC 2012.

        I don't really know anything about Xerces yet, but I am familiar with Java and understand XML, and I don't think this task is too complex. I am very interested in taking it up. Thanks for the articles above. Could you suggest any other material I could study so that I can include a detailed, well-organized implementation timeline in my proposal? Thanks in advance.

        Michael Glavassevich added a comment - - edited

        Hi Alexander,

        > 1) Is it still relevant?

        It's a feature of StAX that Xerces doesn't support yet, if that's what you're asking. We've been gradually getting help from the community on building a StAX implementation. All of the serialization parts still need work.

        > 2) As I understand it, such an XMLStreamWriter implementation could be well localized (in a couple of classes or a few more), because I just need to implement a class matching the [1] specification. Of course, there may be some helper classes. Am I right?

        At the very least there needs to be an XMLOutputFactory (the thing that creates XMLStreamWriters) and an XMLStreamWriter implementation, so that's two classes, plus any other helper classes you need to support a StAX serializer.
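The factory is also where configuration such as namespace repairing lives, and every writer it creates inherits it. The sketch below sets the standard XMLOutputFactory.IS_REPAIRING_NAMESPACES property on whatever implementation newInstance() finds (the JDK's by default); the RepairingDemo class is invented for illustration. With repairing enabled, the writer is expected to declare the p prefix itself; without it, writing the namespace declaration is the caller's responsibility.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RepairingDemo {

    static String write(boolean repairing) {
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            // The factory, not the individual writer, carries this setting.
            factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, repairing);
            StringWriter sw = new StringWriter();
            XMLStreamWriter w = factory.createXMLStreamWriter(sw);
            w.writeStartElement("p", "item", "urn:example"); // prefix, local name, namespace URI
            w.writeCharacters("x");
            w.writeEndElement();
            w.flush();
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("repairing:     " + write(true));
        System.out.println("non-repairing: " + write(false));
    }
}
```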

        > 3) I have downloaded the Xerces and Xalan sources and taken a quick look at them, but I haven't really understood the part about existing serializers. Is it about something under the org.apache.xml.serializer package? Should I use something from there, or should I just use the provided Writer or OutputStream?

        Yes, org.apache.xml.serializer.* is the base Xalan serializer. You could use this as the base for the XMLStreamWriter (e.g. SerializerFactory and SerializationHandler) if you choose.

        > 4) At first glance there isn't a lot of work (this description is really simplified):

        There's more than that, for instance:

        • the creation and initialization of the serializer
        • managing and recycling of resources which have been returned in close()
        • handling of flush() when the XMLStreamWriter is in various states
        • error handling / reporting; message files
        • build.xml updates, e.g. to include a META-INF/services/javax.xml.stream.XMLOutputFactory file
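The "recycling of resources which have been returned in close()" item could be approached with a small pool kept by the factory: close() hands the writer back, and the next create call reuses it instead of allocating a new one. This sketch is hypothetical. PooledFactorySketch, PooledWriter, reset() and writeRaw() are invented names, the code is single-threaded, and whether Xerces should pool at all (and how it would handle thread safety) is a design decision the sketch does not settle.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

public class PooledFactorySketch {
    private final Deque<PooledWriter> pool = new ArrayDeque<>();
    private int allocations = 0; // writers actually constructed

    // Hypothetical factory method: reuse a released writer when one is available.
    PooledWriter createWriter(Writer out) {
        PooledWriter w = pool.poll();
        if (w == null) {
            w = new PooledWriter(this);
            allocations++;
        }
        w.reset(out);
        return w;
    }

    int allocations() { return allocations; }

    static final class PooledWriter {
        private final PooledFactorySketch owner;
        private final StringBuilder buffer = new StringBuilder(256); // the recycled resource
        private Writer out;

        PooledWriter(PooledFactorySketch owner) { this.owner = owner; }

        void reset(Writer newOut) {
            out = newOut;
            buffer.setLength(0);
        }

        void writeRaw(String markup) { buffer.append(markup); } // hypothetical helper

        void close() {
            if (out == null) return;          // closing twice is a no-op
            try {
                out.write(buffer.toString());
                out.flush();                  // flush, but leave the caller's Writer open
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            out = null;
            owner.pool.push(this);            // return this writer for reuse
        }
    }

    static int allocationsForTwoDocuments(StringWriter first, StringWriter second) {
        PooledFactorySketch factory = new PooledFactorySketch();
        PooledWriter w1 = factory.createWriter(first);
        w1.writeRaw("<a/>");
        w1.close();
        PooledWriter w2 = factory.createWriter(second); // the same object as w1, recycled
        w2.writeRaw("<b/>");
        w2.close();
        return factory.allocations();
    }

    public static void main(String[] args) {
        StringWriter a = new StringWriter();
        StringWriter b = new StringWriter();
        System.out.println(allocationsForTwoDocuments(a, b) + " allocation(s): " + a + " " + b);
    }
}
```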

        Depending on your interest, the scope of the project could be expanded to include more parts of the StAX serialization API (e.g. XMLEventWriter).
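For comparison with the cursor-style XMLStreamWriter, an XMLEventWriter consumes event objects. The sketch below uses the standard XMLEventFactory and XMLOutputFactory.createXMLEventWriter(Writer) with whatever implementation is available (the JDK's by default); the EventWriterDemo class is invented for illustration. One plausible design, offered here as an assumption rather than anything this issue specifies, is to build the event writer as a thin adapter that maps each event onto the matching XMLStreamWriter call.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class EventWriterDemo {

    static String write() {
        try {
            XMLEventFactory events = XMLEventFactory.newInstance();
            StringWriter sw = new StringWriter();
            XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(sw);
            // Each add() corresponds to one cursor call on an XMLStreamWriter.
            w.add(events.createStartElement("", "", "note"));   // prefix, namespace URI, local name
            w.add(events.createAttribute("id", "7"));
            w.add(events.createCharacters("hi"));
            w.add(events.createEndElement("", "", "note"));
            w.flush();
            w.close();
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```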

        Thanks.

        Alexander Likhanov added a comment -

        Hi!
        I'm a student and I'm interested in this task for GSoC 2012.
        I'm new to Xerces, but I'm familiar with Java, and I understand that XML is a very important part of the Java world.
        I think this task is a good place to start; Xerces doesn't look as "monstrous" as some complex Java libraries and apps.
        Thanks for the links above; they give me a brief overview of the task.

        I have some questions about task:

        1) Is it still relevant?

        2) As I understand it, such an XMLStreamWriter implementation could be well localized (in a couple of classes or a few more), because I just need to implement a class matching the [1] specification. Of course, there may be some helper classes. Am I right?

        3) I have downloaded the Xerces and Xalan sources and taken a quick look at them, but I haven't really understood the part about existing serializers. Is it about something under the org.apache.xml.serializer package? Should I use something from there, or should I just use the provided Writer or OutputStream?

        4) At first glance there isn't a lot of work (this description is really simplified):

        • Learn more about namespaces and prefixes, their different cases, and the "isRepairingNamespaces" property from the first table in [1]
        • Implement something to control the tag hierarchy (starting and properly closing tags); I think something like a stack would be appropriate.
        • Take care of proper escaping of characters and attributes
        • Good unit test coverage, of course. Maybe some documentation
          Did I miss anything important?
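The two mechanical items in this list, a stack of open element names and escaping of text and attribute values, can be sketched as follows. This is a hypothetical illustration (ElementStackSketch is not a Xerces class). It handles only unprefixed names and ignores namespaces, the XML declaration, whitespace in attribute values and characters the output encoding cannot represent. Because the real API's writeEndElement() takes no arguments, the writer itself has to remember which elements are open.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ElementStackSketch {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>(); // names of unclosed elements
    private boolean startTagOpen = false;                  // still able to accept attributes

    void writeStartElement(String name) {
        closeStartTag();
        out.append('<').append(name);
        open.push(name);
        startTagOpen = true;
    }

    void writeAttribute(String name, String value) {
        if (!startTagOpen) throw new IllegalStateException("no open start tag");
        out.append(' ').append(name).append("=\"").append(escape(value, true)).append('"');
    }

    void writeCharacters(String text) {
        closeStartTag();
        out.append(escape(text, false));
    }

    // Like XMLStreamWriter.writeEndElement(): the element name comes from the stack.
    void writeEndElement() {
        if (open.isEmpty()) throw new IllegalStateException("no element to end");
        closeStartTag();
        out.append("</").append(open.pop()).append('>');
    }

    // Like writeEndDocument(): closes whatever elements are still open.
    void writeEndDocument() {
        while (!open.isEmpty()) writeEndElement();
    }

    private void closeStartTag() {
        if (startTagOpen) {
            out.append('>');
            startTagOpen = false;
        }
    }

    static String escape(String s, boolean inAttribute) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break; // only required in "]]>"; always escaped here
                case '"':
                    if (inAttribute) sb.append("&quot;"); else sb.append(c);
                    break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    String result() { return out.toString(); }

    static String demo() {
        ElementStackSketch w = new ElementStackSketch();
        w.writeStartElement("order");
        w.writeAttribute("note", "5 < 6 & \"ok\"");
        w.writeStartElement("item");
        w.writeCharacters("fish & chips");
        w.writeEndElement();       // closes <item>
        w.writeStartElement("empty");
        w.writeEndDocument();      // closes <empty>, then <order>
        return w.result();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running main prints `<order note="5 &lt; 6 &amp; &quot;ok&quot;"><item>fish &amp; chips</item><empty></empty></order>`.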

        Thanks in advance.

        P.S. If this is not the right place to ask, please point me to a more appropriate one.

        [1] - http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamWriter.html

        Dishara Wijewardana added a comment -

        Hi Michael,
        I looked into this article and it gives a good overview of the StAX API. I also found another useful tutorial [1] from Sun, which gives a good, detailed description at the API level.
        [1] - http://java.sun.com/webservices/reference/tutorials/jaxp/html/stax.html#bnbff

        Michael Glavassevich added a comment -

        For students considering this project, in addition to familiarizing yourself with the XMLOutputFactory/XMLStreamWriter APIs it would probably be a good idea to have a look at the existing serializers in Xerces and Xalan to get a feel for the implementation of such a component.

        Here's a good introductory article [1] on the XMLStreamWriter.

        [1] http://www.ibm.com/developerworks/library/x-stax3/index.html#N10195

        Michael Glavassevich created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael Glavassevich
          • Votes:
            2
            Watchers:
            12
