  1. James Server
  2. JAMES-4062

Experiment flexmark for HTML text extraction

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: JMAP
    • Labels: None

    Description

      The JMAP code currently relies on homegrown rendering code plugged onto an HTML parser.

      Though the code mostly works, it is not core to ASF James, and we regularly miss some formatting options; https://issues.apache.org/jira/browse/JAMES-4061 is a good example.

      An alternative could be to rely on a battle-tested, general-purpose library, e.g. https://github.com/vsch/flexmark-java and its flexmark-html2md-converter module, as suggested privately by Wojtek.

      Such a library would likely handle the corner cases without us having to think about them.

      We could also offer a JVM option to switch between it and the current jsoup implementation, which would stay the default while we experiment with the flexmark option.
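
      The switch could look roughly like the sketch below. Everything named here is an assumption made for illustration: the `HtmlTextExtractor` interface, the `HtmlTextExtractorSelector` class and the `james.jmap.html.extractor` property name do not come from the James code base. The flexmark-backed extractor passed in would presumably wrap `FlexmarkHtmlConverter` from flexmark-html2md-converter, which produces Markdown rather than strictly plain text.

```java
import java.util.Locale;

/** Hypothetical abstraction over "HTML in, text out"; not an existing James type. */
@FunctionalInterface
interface HtmlTextExtractor {
    String toPlainText(String html);
}

/** Hypothetical helper that picks an extractor from a JVM option. */
final class HtmlTextExtractorSelector {
    // Assumed property name, chosen for this sketch only.
    static final String PROPERTY = "james.jmap.html.extractor";

    private HtmlTextExtractorSelector() {
    }

    /**
     * Returns the flexmark extractor only when it is explicitly requested.
     * A missing or unrecognised value keeps the jsoup default.
     */
    static HtmlTextExtractor select(String value, HtmlTextExtractor jsoup, HtmlTextExtractor flexmark) {
        if (value != null && value.trim().toLowerCase(Locale.ROOT).equals("flexmark")) {
            return flexmark;
        }
        return jsoup;
    }

    /** Reads the choice from the JVM system properties, typically once at startup. */
    static HtmlTextExtractor fromSystemProperty(HtmlTextExtractor jsoup, HtmlTextExtractor flexmark) {
        return select(System.getProperty(PROPERTY), jsoup, flexmark);
    }
}
```

      Under these assumptions, starting the JVM with `-Djames.jmap.html.extractor=flexmark` would opt in to the experiment, while leaving the option unset keeps the existing jsoup path untouched.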

          People

            aduprat Antoine Duprat
            btellier Benoit Tellier